Earlier this week I had a discussion about the running time of an algorithm used to test a method. This got me thinking about one of my favourite areas of computer science – computational complexity. We can use complexity theory to split problems into classes. Understanding some of these classes can help us to recognise when we should seek an exact solution and when it would be more appropriate to seek an approximate solution.
Commonly we consider a decision problem; a problem which can be answered with a “yes” or a “no”. This may seem restrictive at first but problems that have more general answers can often be rephrased.
In order to compare the complexity of decision problems, we need to be able to classify them. There are many different classes, but several common ones are: P, NP, NP-Complete (NPC) and NP-Hard (NPH).
P is the set of all decision problems which can be solved in polynomial time. That is, as the size of the problem increases, the time it takes to solve the problem increases with a rate proportional to some power of the size. The definition of size varies from problem to problem. As an example, if we are to sort an array of strings into alphabetical order, then the size of the array defines the size of the problem.
The next class is NP. Decision problems that lie in NP can have their solution verified in polynomial time. Note that this definition places no restriction on how long it takes to solve the problem. This class therefore includes the decision problems whose solutions can be found using algorithms that see an exponential growth in running time as the size of the problem increases.
For a problem, p, to lie in the NPC class, it must satisfy two conditions: p must belong to NP and we must be able to transform any problem in NP into an instance of p.
A problem, p, is NPH if there is an NPC problem that can be transformed into p. By definition, all NPC problems are NPH as all NPC problems can be transformed into instances of one another. However, it should be noted that this class is not restricted to just decision problems. This paper from last year shows that various Nintendo games sit within NPH!
The Problem with Hard Problems
There is a big question in computer science as to whether the set of problems in P is equal to the set of problems in NP. It is widely assumed that these two classes are not equal. This has important consequences as NP problems are not bound by the time it takes to solve them.
As shown earlier, NPH problems are not necessarily decision problems. This means that an NPH problem could be hiding right in front of you. Why does this matter? Because all NPH problems are at least as hard as the problems in NPC. Without realising it, you could be writing an algorithm whose running time is doomed to grow exponentially.
You might think that it would be obvious if the running time grows exponentially or that it would be picked up during testing. That depends on whether your test data is large enough. Exponential growth curves start off shallow. If your test data doesn’t reach the boundary at which computation time starts to become intractable, you might not notice. That is until you hit large, live data.
Spotting Hard Problems
So how do we spot whether a problem is hard? We check whether we can transform an NP-Complete problem into an instance of it. Of course, this is often easier said than done. In reality there isn’t time to flick through a catalogue of hundreds of problem definitions to see whether the problem at hand is similar.
Having said that, a knowledge of at least a few NP-Complete problems can be useful. If you’re working on a problem and your algorithm seems to be taking a while to complete, take a check against some of the well-known NPC problems. If you are able to find a mapping between one of the NPC problems and your problem, then your problem is hard!
Conversely, if you can transform your problem to a problem in P, then it’s likely there is an existing algorithm to solve it. We can also use problems in P as sanity checks that the problem being solved won’t blow up once we push to live. You are probably doing this subconsciously already, as many common algorithms run in polynomial time.
Working with Hard Problems
If you recognise that your problem is hard, you should stop seeking an exact solution. It’s unlikely that an instance of useful size will be solvable in a short time.
Heuristics have been developed to find approximate solutions to many NPC problems. Read around, one of them may be accurate enough for your use case. Take, for example, the game of battleships. Here is a proof of its NP-Completeness and here is a great article by DataGenetics that demonstrates a heuristic for approximating the solution. The simplex method is another well known algorithm which can be used on problems that can be expressed as a linear program.
Another approach would be to simplify your problem so that it is no longer in NPC. This may be achievable by restricting the types of input or by parameterising the problem in some way. Alternatively, it may be possible to split your problem into several smaller problems which can be solved more quickly – divide and conquer! These approaches may mean sacrificing some of the information in the solution. The consequences of modifying the problem space should always be considered.
Problems come in all shapes and sizes. Being able to categorise problems can help to identify whether the task should be to solve the problem exactly or to seek a reasonable approximation.
“Computers and Intractability: A Guide to the Theory of NP-completeness“ is as good a starting point as any for learning about NPC problems. The book by Johnson and Garey lists a number of NPC problems and dives into more detail than I have space for in this post.
When you next sit down to solve a problem, take a moment to ask: Is this problem solvable? Otherwise, you could be starting a battle you cannot win!