
Let's use the Traveling Salesman Problem as the example, unless you think there's a simpler, more understandable example.

My understanding of the P=NP question is that, given the optimal solution of a difficult problem, it's easy to check the answer, but very difficult to find the solution.

With the Traveling Salesman Problem, given the shortest route, it's just as hard to confirm that it is the shortest route, because you have to calculate every route to ensure that the solution is optimal.

That doesn't make sense. So what am I missing? I imagine lots of other people encounter a similar error in their understanding as they learn about this.

Tom Mercer
  • An easier example of an NP-complete problem would be Sudoku.

    It's trivially easy to check whether a completed Sudoku is correct. It is not trivial to figure out the solution, though.

    – Joe Feb 11 '20 at 13:05
  • @TomMercer Sudoku is NP-complete if the grids are arbitrarily sized. I would imagine that if your recursive Sudoku solver had to handle the numbers 1-16 with 16 rows and 16 columns in 4x4 boxes (or 1-25 in 5x5 boxes), the time it takes to solve would increase far more than the time it takes to check the solution. – TheHans255 Feb 11 '20 at 18:31
  • @TomMercer: The generalized sudoku problem is the graph colouring problem. Imagine that each number in the puzzle is a node in a graph of a certain colour, and the edges are added such that each node is connected to all the nodes that cannot be the same number/colour. The sudoku problem then is "given a partial colouring of the graph, determine all the colours of all the nodes". This problem is famously difficult in general; you are right that for the special case of newspaper-quality sudoku puzzles it is pretty easy. – Eric Lippert Feb 11 '20 at 22:16
  • Of course newspaper-quality sudoku puzzles are designed so that they can be solved quickly by human brains with neurons that work at the speed of a horse-drawn carriage and can do several math operations per minute, so it should not be surprising that throwing them at hardware that does billions of operations per second where the speed of electron propagation is the limiting factor does not cause difficulties. Consider the problem of building a sudoku puzzle that is hard for a computer to solve quickly; can you get any insight into what those puzzles look like? – Eric Lippert Feb 11 '20 at 22:22
  • You ask in your first sentence for a more understandable example. The canonical example is "satisfaction". That is, given a set of quantities x1, x2, x3 ... each of which is either true or false, and a set of statements using those quantities combined with NOT, AND and OR, is there an assignment of values to each quantity that makes every statement true? That is, that "satisfies" each statement. Plainly if you have the values that work then you can quickly certify that the answer is yes, but it is not clear how to find the values. – Eric Lippert Feb 11 '20 at 22:25
  • What makes TSP NP-complete is the fact that every TSP problem has an equivalent SAT problem, and vice-versa. That is, if there is a fast solution to SAT then there also is for TSP, and similarly vice-versa. Seeing the connection between these seemingly disparate problems is not easy for the beginner. – Eric Lippert Feb 11 '20 at 22:26
  • @EricLippert: Strictly speaking, your argument only shows that the generalized sudoku problem is equivalent to a special case of a problem that's difficult in general. (Though as it happens, the generalized sudoku problem is known to be NP-complete.) – ruakh Feb 12 '20 at 05:53
  • You keep saying tsp to mean traveling salesman problem so I figured you would work out from context that sat is similarly the Boolean satisfaction problem that I discussed in my earlier comment. Evidently I was wrong and apologize for the confusion. Sat is the Boolean satisfaction problem. – Eric Lippert Feb 12 '20 at 06:03
  • Noted. I was not intending to give an exegesis of the details of the graph colouring problem in my comment, but rather to illustrate that the sudoku problem is not as easy as it might seem. – Eric Lippert Feb 12 '20 at 06:06
  • @TomMercer I think a misunderstanding you might have about difficulty concerns time and growth. Sure, it's reasonably "easy" and "fast" for us (or a computer) to solve a standard Sudoku with 3x3 boxes. But what actually makes a problem hard is the rate at which the difficulty grows as the problem size increases. That difference is what sets P apart from the problems in NP we believe are harder. For Sudoku, the brute-force growth is (after googling quickly) O(N^(N^2)), which means the exponent itself grows with N. Polynomial time would be something like O(n^2 + n), where the exponents are fixed constants. – Joe Feb 12 '20 at 09:55
  • That person has the same misunderstanding, though they asked the question a little differently. The answer is that it's vital for CS/complexity types to explicitly say "decision" and not just call it the TSP. The TSP is an optimization problem, not a decision problem. That's the root of my and that asker's misunderstanding. – Tom Mercer Feb 13 '20 at 17:03
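The comments above say that checking a filled-in Sudoku is easy even when solving it is not. A minimal Python sketch of such a checker for the generalized $N \times N$ puzzle (with $N = n^2$ and $n \times n$ boxes) follows; the function name and the list-of-lists grid format are assumptions for illustration. It only verifies a completed grid and does no search at all.

```python
import math
from typing import List


def is_valid_sudoku_solution(grid: List[List[int]]) -> bool:
    """Check a completely filled N x N Sudoku grid, where N = n * n.

    Every row, every column and every n x n box must contain each of
    1..N exactly once. Only verification happens here, no search.
    """
    size = len(grid)
    box = math.isqrt(size)
    # The grid must be square and its side length must be a perfect square.
    if size == 0 or box * box != size or any(len(row) != size for row in grid):
        return False
    target = set(range(1, size + 1))
    for i in range(size):
        if set(grid[i]) != target:                        # row i
            return False
        if {grid[r][i] for r in range(size)} != target:   # column i
            return False
    for top in range(0, size, box):
        for left in range(0, size, box):
            cells = {grid[r][c]
                     for r in range(top, top + box)
                     for c in range(left, left + box)}
            if cells != target:                           # one n x n box
                return False
    return True


if __name__ == "__main__":
    solved = [[1, 2, 3, 4],
              [3, 4, 1, 2],
              [2, 1, 4, 3],
              [4, 3, 2, 1]]
    print(is_valid_sudoku_solution(solved))  # True
```

Each row, column and box is scanned once, so the check costs about $3N^2$ cell visits, which is polynomial in the grid size, while the naive search Joe mentions grows far faster.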

5 Answers


Your version of the TSP is actually NP-hard, exactly for the reasons you state. It is hard to check that it is the correct solution. The version of the TSP that is NP-complete is the decision version of the problem (quoting Wikipedia):

The decision version of the TSP (where given a length L, the task is to decide whether the graph has a tour of at most L) belongs to the class of NP-complete problems.

In other words, instead of asking "What is the shortest possible route through the TSP graph?", we're asking "Is there a route through the TSP graph that fits within my budget?".
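For the decision version quoted above, a YES answer can be backed by a certificate: an actual tour. The following Python sketch (the function name and the distance-matrix format are assumptions) shows why checking such a certificate is cheap: it confirms the tour visits every city exactly once and that its total length, including the leg back to the start, is at most L.

```python
from typing import Sequence


def verify_tsp_certificate(dist: Sequence[Sequence[float]],
                           tour: Sequence[int],
                           budget: float) -> bool:
    """Check a certificate for "is there a tour of length at most budget?".

    dist[i][j] is the distance from city i to city j (float("inf") can mark
    a missing road). The tour lists every city once; the salesman returns
    to tour[0] at the end.
    """
    n = len(dist)
    # 1. The proposed tour must visit every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    # 2. Add up every leg, including the closing leg back to the start.
    total = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return total <= budget
```

Sorting dominates, so the check runs in $O(n \log n)$ time. No comparably cheap check is known for certifying that a given tour is the *shortest*, which is the optimization question in the original post.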

TheHans255
D.R
  • So are NP-complete problems not possible to check? Where's the breakdown in my understanding? My understanding of NP-complete problems is that you can check the solutions easily (P), but you can't find the solution easily (NP). – Tom Mercer Feb 11 '20 at 07:07
  • NP-complete is a subset of NP; you are correct in your understanding of the definition of NP. Thus the decision version of the problem is in fact in NP. The one where you ask for the minimum possible path is NOT in NP. – D.R Feb 11 '20 at 07:35
  • Also note that "checking" does not always just mean "here's an answer, go check it." When one is checking an NP-complete answer, one is typically given extra information derived from solving the problem. My favorite example is proving that a number is prime. One can provide a list of "witness" integers which, together, prove that a number is prime. Finding the correct list of numbers is very difficult, but once you have the list, the verification is trivial. – Cort Ammon Feb 11 '20 at 15:00
  • Thanks. The gap in my understanding is made clear on that Wikipedia page. The TSP is neither P nor NP. It's not a CS/complexity problem. The misunderstanding comes from the muddling of CS/complexity people talking about the "decision version" of the TSP as if it's the TSP, which it is not. It's a related setup, but a totally different problem. Yes, they have applications/setup/conversions in common, but they are totally different problems. The "decision version" of TSP is not TSP at all. Now it makes sense. – Tom Mercer Feb 11 '20 at 19:01
  • @CortAmmon As an example this works, but don't forget to state that PRIMES is in P. – David Tonhofer Feb 12 '20 at 11:43
  • @TomMercer It is a "CS/complexity problem", but not a decision problem. There is a whole separate zoo of complexity classes for optimization problems (search for NPO, APX, PTAS). These classes are in a different space from the decision problems, so it wouldn't make sense to ask if NP is a subset of NPO, for example. TSP as you know it is NPO-complete, while the "decision version" is in NP (and this is a general feature of NPO problems). – Mario Carneiro Feb 13 '20 at 00:13
  • If the decision problem is solvable in polytime, then TSP the optimization problem must be solvable in polytime as well, right? You can just iterate through the "questions" downward until you get a NO, and then you know the optimal answer was the previous one. EDIT: You can do better than iterating down; you can binary search it, but it's still polytime * polytime, which is polytime in any case. – Cruncher Feb 13 '20 at 14:55
  • I had to convince myself that what I just said is true and that the paths don't get too long such that iterating down could actually take longer than polynomial relative to N. But it is true, because there's some longest edge, so the longest possible path is bounded by N*longest_edge, and the shortest possible path is bounded by longest_edge – Cruncher Feb 13 '20 at 15:03
  • This answer just mixes up decision problems with function problems (or more specifically, with optimization problems). A function problem cannot be "NP hard", at least not with the traditional definition of polynomial time reducibility for decision problems, which refers to set membership. This answer does, however, fit well in a web tradition of treating function problems without much rigor. – Jirka Hanika Feb 13 '20 at 16:08
  • @Cruncher This works, but you have to use binary search. The weights are encoded as binary numbers, so you have to be polynomial in log(longest_edge) for polytime as well. – mlk Feb 13 '20 at 20:01
  • @mlk I'm not convinced the weights being encoded as binary numbers is particularly relevant. As the binary length of N is log(n), it can't ever have an effect on the P-ness of a problem. – Cruncher Feb 13 '20 at 20:07
  • @Cruncher Turn it around. The value of a number of length n is in O(2^n). – Taemyr Feb 13 '20 at 22:26
  • The first sentence of this answer is incorrect, and the second is misleading. The OP's version of the TSP is not NP-hard, because NP-hard is a class of decision problems and the OP's version of the TSP is not a decision problem. Also, the first two sentences together imply that NP-hard is the set of problems for which there are no efficient checks on the solutions. That's not true; the latter set is in fact the complement of NP. NP-complete problems are NP-hard but still have efficient checks on their solutions. – tparker Feb 14 '20 at 03:11
  • If you define a problem to be "hard" if there are no efficient checks on its solutions (as your second sentence would imply), then if $P \neq NP$ then NP-hard and the set of hard problems are incomparable sets, and if $P = NP$ then NP-hard is a strict subset of the set of all hard problems. – tparker Feb 14 '20 at 03:15
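Cort Ammon's primality example can be made concrete. One well-known kind of primality witness is a Pratt certificate: for a prime $p$, a number $a$ whose multiplicative order modulo $p$ is exactly $p - 1$, together with the prime factors of $p - 1$, each certified the same way. The Python sketch below only *checks* such a certificate; the dictionary format and function name are assumptions for illustration. As David Tonhofer points out, PRIMES is in P anyway, so this illustrates certificate checking rather than a hard problem.

```python
from typing import Dict, List, Tuple

# Hypothetical certificate format: cert[p] = (a, qs), where a is the witness
# for p and qs lists the distinct prime factors of p - 1.
Certificate = Dict[int, Tuple[int, List[int]]]


def verify_pratt(p: int, cert: Certificate) -> bool:
    """Return True if cert proves that p is prime (Lucas test, applied recursively)."""
    if p == 2:
        return True
    if p < 2 or p not in cert:
        return False
    a, qs = cert[p]
    # Every listed q must divide p - 1, and together they must account for all of it.
    m = p - 1
    for q in qs:
        if q < 2 or (p - 1) % q != 0:
            return False
        while m % q == 0:
            m //= q
    if m != 1:
        return False
    # a must have multiplicative order exactly p - 1 modulo p.
    if pow(a, p - 1, p) != 1:
        return False
    if any(pow(a, (p - 1) // q, p) == 1 for q in qs):
        return False
    # Each factor q divides p - 1, so q < p and the recursion terminates.
    return all(verify_pratt(q, cert) for q in qs)
```

For example, `{7: (3, [2, 3]), 3: (2, [2])}` certifies 7: $6 = 2 \cdot 3$, $3^6 \equiv 1$, $3^3 \equiv 6$ and $3^2 \equiv 2 \pmod 7$. Every step is a modular exponentiation, so checking is fast; the hard part is producing the factorizations and witnesses.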

There are a lot of decent answers here, but none clear up a couple of fairly important misunderstandings you seem to have.

Both P and NP are classes of what are called "decision problems." These are problems whose answer is YES or NO. (More formally, each is a question of the form "given a string and a language, is the string in the language?", but that isn't an important distinction here.) In this sense, you are slightly incorrect in your understanding when you say "given the optimal solution of a difficult problem, it's easy to check the answer, but very difficult to find the solution", because decision problems don't have "optimal solutions." Problems where solutions can be "evaluated" and you are looking for the "best" solution are optimization problems, of which the Travelling Salesman Problem is an example. You can always turn an optimization problem into a decision problem by considering the problem "Given an instance of this optimization problem and an integer k, does the problem have a solution whose objective value is better than k?".

Another thing is you might be slightly confused as to what NP means. P is the class of decision problems that can be solved in Polynomial Time (that you seem to understand). NP stands for "Non-deterministic Polynomial Time", and it is the class of decision problems for which, given some extra information, you can easily check that an instance should give a YES answer. So looking at our TSP problem, if I have an instance of TSP and a solution whose total cost is less than k, then I can easily check that the solution is really a solution and that its cost is less than k. So the decision problem associated with TSP is in NP. But not all problems in NP are "hard". Actually P is a subset of NP, because if you can easily solve the decision problem, you can easily check whether an instance gives you a YES answer by just solving it.

But there are some problems in NP we think are hard to solve. Oversimplifying a little, we call these NP-complete problems. (Note these still must be decision problems.) We can say a problem A is at least as hard as a problem B if, assuming we have a black-box oracle that solves problem A, we can use it to efficiently solve problem B. Let's again consider the TSP example. Clearly, if you could solve the optimization problem (that is, get the optimal solution) then you could solve the decision problem. So the optimization problem is at least as hard as its corresponding decision problem. If we showed that the decision problem version of TSP was NP-complete (which it is), then we would know that the optimization problem TSP is at least as hard as NP-complete problems, but it is not itself NP-complete because it isn't a decision problem. We call such problems NP-hard.

NaturalLogZ
  • Can you illustrate P vs NP with an actual TSP example? I'm not getting it... – Tom Mercer Feb 11 '20 at 18:09
  • I am not sure what you mean by "illustrate P vs NP with an actual TSP example". P and NP are classes of problems and the decision problem version of TSP is one of the problems in NP. Looking at a specific instance of TSP wouldn't really clarify anything about what it means for a problem to be in P or NP.

    But to be more clear, a classic decision problem is satisfiability. An instance is a propositional logic statement, and it is a YES instance if there is some assignment of the variables that makes the statement evaluate to TRUE.

    – NaturalLogZ Feb 11 '20 at 18:26
  • Like, here are 4 cities, with some numbers for distances between them, here's the exact YES/NO thing you have to find. Here's what's NP about it. Here's what's P about it. But with actual TSP example. So, for example, "Is there a route between Chicago, Dallas, NYC, and SF that's less than or equal to 2,737 miles?" And which part of that is P and which part is NP. So I can grasp precisely where my understanding is lacking. – Tom Mercer Feb 11 '20 at 18:29
  • Okay so let's look at this example you created. An example is what is called an "instance" of TSP. Now if there is a route that visits those 4 cities, ending where you started, that is at most 2,737 miles, we would call it a YES instance of TSP. Furthermore, you could tell me the route and I could easily verify that the route gets me to each of the 4 cities, ending where I started (the solution is correct) and I could check that the total distance is at most 2,737 miles. Thus I can easily check that this is a YES instance. Since you could do this with any YES instance we say TSP is in NP – NaturalLogZ Feb 11 '20 at 18:35
  • I'm beginning to piece together the gap in my understanding. I know what the TSP is. It is NOT a yes/no question. The NP version of a problem related to TSP is this is there a visit-all route with length < k?. TSP means find the shortest visit-all route. It is neither P nor NP in the computer sciencey way of defining these ideas. But this TSP-related NP problem is about a specific route and a specific length k. – Tom Mercer Feb 11 '20 at 18:56
  • "The NP version of a problem related to TSP is this is there a visit-all route with length < k?. TSP means find the shortest visit-all route. It is neither P nor NP in the computer sciencey way of defining these ideas." Correct.

    "But this TSP-related NP problem is about a specific route and a specific length k." No. The NP problem doesn't have to do with a specific route. Being in NP means having the ability to certify a given example is a YES instance (and the way you would certify a TSP instance is a YES instance is to give a specific route with small enough distance).

    – NaturalLogZ Feb 11 '20 at 18:58
  • The NP problem has nothing? – Tom Mercer Feb 11 '20 at 18:58

$P$ and $NP$ are classes of decision problems. The result of an algorithm for a decision problem is either "YES" or "NO". Even for a problem in $P$, such an answer cannot lead to a quick verification.

An instance of the decision problem version of TSP is "Given a collection of cities and intercity distances, is there a tour with total length less than $k$?", where $k$ is a constant specified in the instance. The result is "YES" or "NO". In neither case does the answer lead to a quick verification of the correctness of the answer.

The promise that you ask about is this: Given a particular proposed tour, one can in polynomial time:

  • Determine that the proposed tour actually is a tour -- visits all the cities and only traverses intercity routes that exist (sometimes "that have finite distances" when one encodes missing routes as having length $\infty$).
  • If so, determine that the length of the route is shorter than the constant $k$ in the problem instance.

Neither a "YES" nor a "NO" answer provides a proposed tour.

The value of the model of $NP$ that you are using is that it encodes a way to make a solver: for each possible tour (typically an exponentially large set to iterate over) check to see if it is a tour and if its length is $< k$. If so, report "YES". If we exhaust the collection of possible tours without reporting "YES", report "NO".
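The exhaustive solver described above might be sketched in Python as follows, assuming the instance is given as a distance matrix (the function name is hypothetical). Fixing city 0 as the starting point avoids re-checking rotations of the same tour, so there are $(n-1)!$ candidates.

```python
from itertools import permutations
from typing import Sequence


def tsp_decision_brute_force(dist: Sequence[Sequence[float]], k: float) -> bool:
    """Answer "is there a tour of total length < k?" by trying every tour.

    Assumes at least one city. City 0 is fixed as the start, so (n - 1)!
    candidate tours are generated; each one is checked in O(n) time.
    """
    n = len(dist)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < k:
            return True   # this tour is a certificate: report YES
    return False          # every candidate exhausted: report NO
```

Checking a single candidate is cheap; the cost is the number of candidates. For 20 cities that is already $19! \approx 1.2 \times 10^{17}$ tours.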

Note that this model suggests that the difficulty in fast solution is not that checking the conditions takes a lot of time. The difficulty in fast solution is that there are too many potential tours to search through. So, if we could find some really, really smart way to restrict our search to only a tiny subset of the collection of potential tours, we would have a fast solution for an $NP$ problem.

Binary search in a sorted list is an example where one has a smart way to search through the list evaluating only logarithmically many (in the length of the list) comparisons rather than linearly many comparisons. From this point of view, the TSP problem is hard because we don't know a substantially faster way to search through the proposed tours of every possible TSP problem instance.

Eric Towers
  • OK, so the crucial gap in my understanding is that TSP is NOT NP at all. TSP is to find the shortest visit-all route. The NP problem (somewhat related to TSP, but not actually the TSP) is is a visit-all route < k possible. – Tom Mercer Feb 11 '20 at 18:53
  • @TomMercer : No. The decision problem asks "Is there a tour of length less than $k$?". An algorithm is allowed to use whatever method works, which may have nothing to do with finding the shortest all-visit route. Just because you understand an algorithm for solving a problem does not mean another algorithm uses a similar method. The optimal algorithm may do something that seems completely alien to you. – Eric Towers Feb 11 '20 at 18:56
  • The decision problem is not the traveling salesman problem (TSP). The TSP is, by definition, what's the shortest visit-all route? It's an old optimization problem. CS has unfortunately mislabeled an adjacent problem TSP. – Tom Mercer Feb 11 '20 at 18:58
  • @TomMercer : If you announce that you are discussing $P$ and $NP$ then you announce that you are discussing the decision problem. If you had announced you were discussing $NP$-hard, then you would be discussing the optimization problem. – Eric Towers Feb 11 '20 at 18:59
  • Ah, ok. The optimization problem is called the TSP. The decision problem should not be called TSP, as this leads to a lot of misunderstanding like I just experienced. If we called it "the TSP decision problem" and made sure to always separate the decision problem from the TSP, that would make learning better for many people. – Tom Mercer Feb 11 '20 at 19:06
  • @TomMercer it isn't that CS "mislabeled an adjacent problem TSP." The problems are basically the same and the distinction is just about formality and technicality. It makes sense for both to be referred to as "TSP", and which one we mean exactly is clear from context. Though of course when you are first familiarizing yourself with the formality and technicalities it is useful to be more explicit in the distinction, which is why I have been referring to one as "the decision problem version of TSP." – NaturalLogZ Feb 11 '20 at 19:06
  • P NP NP-hard NP-complete, it's really not clear from context -- hence my confusion. If a person on the street says TSP, they're referring to an optimization problem. It's the original TSP. It got named TSP. The decision form is a very different problem. – Tom Mercer Feb 11 '20 at 19:16
  • @TomMercer : If a person on the street says "TSP" they are far more likely to be discussing a Telecommunications Service Provider or a Thrift Savings Plan than computational complexity theory. Pretending that we need microscopic disambiguation flies in the face of the overwhelming evidence that we do not. Disambiguation is provided by context. What context did you establish? – Eric Towers Feb 11 '20 at 19:19
  • You can reduce an optimization problem to a logarithmic number of decision problems via binary search, assuming you can find an upper bound on the optimal solution. Let that upper bound be K. Is there a solution <= K? Yes. Is there a solution <= K/2? No. Is there a solution <= 3K/4? ... In polynomial time, you'll find the smallest value for which the decision problem yields Yes. – chepner Feb 12 '20 at 20:26
  • Since that implies the optimization problem can be solved in polynomial time if the decision problem can be, you can get away with being imprecise. – chepner Feb 12 '20 at 20:31
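chepner's binary search can be sketched in Python. The sketch assumes a complete graph with non-negative integer distances, so no tour is longer than $n \cdot \max d_{ij}$; `decide` stands in for any procedure answering "is there a tour of length at most K?". The brute-force `brute_force_decide` (a hypothetical helper) is there only so the example runs. With a polynomial-time `decide`, the whole procedure would be polynomial, because the number of oracle calls grows with the bit length of the numbers (as mlk notes), not with their magnitude.

```python
from itertools import permutations
from typing import Callable, Sequence

Matrix = Sequence[Sequence[int]]


def shortest_tour_length(dist: Matrix, decide: Callable[[Matrix, int], bool]) -> int:
    """Find the optimal tour length using only a decision procedure.

    decide(dist, K) must answer "is there a tour of length <= K?".
    Assumes a complete graph with non-negative integer distances.
    """
    n = len(dist)
    lo, hi = 0, n * max(max(row) for row in dist)  # no tour is longer than hi
    while lo < hi:
        mid = (lo + hi) // 2
        if decide(dist, mid):
            hi = mid          # some tour fits within mid
        else:
            lo = mid + 1      # every tour is longer than mid
    return lo


def brute_force_decide(dist: Matrix, bound: int) -> bool:
    """Exponential-time stand-in for the decision procedure, for testing only."""
    n = len(dist)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        if sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n)) <= bound:
            return True
    return False
```

The loop keeps the invariant that a tour of length `hi` exists and none of length `lo - 1` does, so it stops at the optimum after about $\log_2(n \cdot \max d_{ij})$ calls to `decide`.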

NP is all about decision problems - problems where the answer is "yes" or "no".

A problem is in NP if for every instance where the answer is "yes", there is a hint that lets you easily prove that the answer is "yes". It doesn't say anything about instances where the answer is "no". They can be hard to solve.

The classical Travelling Salesman problem is: Given a set of cities and their distances, is it possible to find a tour shorter than k? And quite obviously, if the answer is yes then such a tour exists, and we can use it as a hint to easily show the answer is yes. If the answer is no, then nobody has yet come up with any hint that would let you prove that.

You stated a problem that you also called "Travelling Salesman" problem, but it is actually different. You ask: Given a set of cities and their distances and a tour, is that tour the shortest tour? In this case, if the answer is "no" then there is a shorter tour, and we can use it as a hint to easily show the answer is "no". That's exactly the opposite of NP: Your alternative version of the Travelling Salesman problem is one where for every instance where the answer is "no", there is a hint that lets you easily prove the answer is "no". Because it is the exact opposite of NP, this class is called "co-NP".

There are many problems like that. For every problem in NP, you could ask the question: "Is the answer for this instance of the problem 'no'", and of course the answer is exactly the opposite of the original problem. You just made the mistake of thinking that every problem with the words "travelling" and "salesman" in it is the same problem.

gnasher729
  • You are wrong about the "classic" (not classical) TSP. The original TSP is not a decision problem. It is an optimization problem: what's the shortest round-trip, visit-all route? Stop personalizing it as "my" problem. I'm not "calling" it the TSP. It is the original TSP. – Tom Mercer Feb 13 '20 at 02:45
  • The optimization and decision versions of the TSP are often simply referred to as TSP. When asking questions about P=NP, most people are going to assume you are talking about the decision version, as this is what makes the most sense in this context (not saying that it doesn't make sense to also talk about the optimization version, or the second decision version given by gnasher729 as they are all deeply linked). gnasher729 is not "wrong" here, he is correcting a probable misconception of yours about what problem people refer to when saying things like "TSP is NP-complete". – Tassle Feb 13 '20 at 17:17

I find it most easy to understand by using the 3-SAT NP-complete problem:

There are $n$ Boolean variables, and you can decide for each of them whether it is set to true or false. You are also given $k$ clauses. Each clause contains 3 variables together with a required value for each, like $(x_a \lor \lnot x_b \lor x_c)$, which is satisfied if $x_a$ is set to true OR $x_b$ is set to false OR $x_c$ is set to true. Each of the $k$ clauses can involve any combination of three of the $n$ variables, and you have to decide what value every variable should be set to so that all clauses are satisfied.


If you find a combination of values for all variables such that every clause is satisfied, your combination can be verified very easily by going through every clause once and testing it, but it can be very hard to find a combination that satisfies every clause.
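The asymmetry described above can be seen in a short Python sketch. Clauses are written as triples of signed integers (a DIMACS-style convention, assumed here): `3` means "variable 3 is true" and `-3` means "variable 3 is false". The function names are hypothetical. Checking an assignment is a single pass over the clauses, while the naive search may try all $2^n$ assignments.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# DIMACS-style literals (an assumed encoding): +i means "x_i is true",
# -i means "x_i is false". Variables are numbered 1..n.
Clause = Tuple[int, int, int]
Assignment = Dict[int, bool]


def satisfies(clauses: List[Clause], assignment: Assignment) -> bool:
    """Verification: one pass over the clauses."""
    def literal_true(lit: int) -> bool:
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal_true(lit) for lit in clause) for clause in clauses)


def brute_force_solve(clauses: List[Clause], n: int) -> Optional[Assignment]:
    """Search: may try all 2**n assignments before answering."""
    for bits in product([False, True], repeat=n):
        assignment = {i + 1: bits[i] for i in range(n)}
        if satisfies(clauses, assignment):
            return assignment
    return None  # no assignment works: the formula is unsatisfiable
```

`satisfies` is the easy check from the paragraph above; `brute_force_solve` is the obvious way to find a combination, and its running time doubles with every added variable.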

Eugen