CS 15-451 Analysis of Algorithms 4 November 2004 Manuel & Avrim Blum Today: Definition and Theory of NP-completeness... continued. The theory of NP-completeness is concerned with decision problems. A decision problem is a computational problem defined by a pair (INSTANCE, QUESTION). Here, INSTANCE describes what constitutes an acceptable instance of the problem. QUESTION is a question about the instance. An INSTANCE is a YES-instance if the correct answer to the QUESTION is YES. Otherwise (the correct answer is NO), it is a NO-instance. Here are the formal definitions of P, NP, and CoNP: * P is the class of decision problems solvable by efficient (polynomial time computable) algorithms. * NP is a class of decision problems: A decision problem PI is in NP iff there exists a polynomial "poly" and a 0-1 valued poly-time computable function ff(x,y) such that: 1. for every YES-instance x of length n, there exists a string y of length at most poly(n) for which ff(x,y) = 1, while 2. for every NO-instance x, for any and all strings y (of any length), ff(x,y) = 0. (Note the asymmetry between 1. and 2.) * Co-NP is a class of decision problems: A decision problem PI is in Co-NP iffthere exists a polynomial "poly" and a 0-1 valued poly-time computable function ff(x,y) such that: 1. for every NO-instance x of length n, there exists a string y of length at most poly(n) for which ff(x,y) = 1, while 2. for every YES-instance x, for any and all strings y (of any length), ff(x,y) = 0. The function ff defines a proof system. In the NP proof system, y is the so-called proof that x is a YES-instance. If ff(x,y) = 1, then y is a valid (correct) proof that x is a YES-instance. If ff(x,y) = 0, then y is an invalid proof that x is a YES-instance. In this case, x may or may not be a YES-instance; if it is, y is not a valid proof. A Logician might define NP in an equivalent way as follows: * NP is a class of decision problems. A decision problem PI is in NP iff there exists a polynomial "poly" and a proof system -- a system of logic that includes a poly time algorithmic checker for checking proofs -- such that: 1. for every YES-instance of length n, there exists a poly(n) long string proving that the answer is YES; while 2. for every NO-instance, no string (of any length) proves that the answer is YES. Informally, a decision problem is in NP iff it has a polynomial length Proof when the answer is YES. It is in CoNP iff it has a polynomial length Proof when the answer is NO. NP stands for "Nondeterministic Polynomial." (It could just as well stand for "Nifty Proof" where "Nifty" means "Polynomial Length.") The following theorem asserts that P is contained in both NP and CoNP: Theorem: Let PI be any decision problem in P. Then PI is in both NP and CoNP. Proof: A poly-time algorithm may be viewed as a system for generating polynomial length proofs of YES and NO. QED There exist problems like INTEGER FACTORIZATION which are in both NP and CoNP but which are not known to be in P. Here are some examples of computational problems. See if you can tell which are in NP, which are in CoNP, and which in P: *VERTEX COVER (VC) SEARCH/OPTIMIZATION PROBLEM: INPUT: an undirected weighted graph G. OUTPUT: a collection C of vertices that covers all the edges. This means that every edge has at least one endpoint in the cover. (Not a decision problem. Therefore the question whether or not this problem is in NP makes no sense.) *VC DECISION PROBLEM: INSTANCE: an undirected graph G, and a positive integer k with k < n. QUESTION: Does G have a VC having at most k vertices? (NP) *A NONSTANDARD VC PROBLEM: INSTANCE: an undirected graph G, a positive integer k < n, and a collection C of vertices of G. QUESTION: Is C a vertex cover having at most k vertices? (Not only is this problem in NP, it is in fact in P.) *ANOTHER NONSTANDARD VC DECISION PROBLEM: INSTANCE: an undirected graph G, a vertex cover C of G QUESTION: Is C a VC of minimum size? (Co-NP: there is a polynomial-length proof when the answer is NO.) *ANOTHER NONSTANDARD VC DECISION PROBLEM: INSTANCE: an undirected graph G, a vertex cover C of G QUESTION: Does G have a vertex cover that is smaller than C? (In NP) We now switch to the SAT problem. This problem is historically important on account of its role in the theory of NP-completeness. It has become practicallyimportant as well: PROBLEM: 3SAT INSTANCE: a set of m clauses on n variables. Each clause contains 3 literals. Each literal is either a variable or the complement of a variable. QUESTION: is there an assignment of truth values to the variables such that each clause contains a true literal? EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'} (This is satisfiable under the assignment that maps every variable to T.) EXAMPLE of a 2SAT problem: {X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'} (This is NOT satisfiable. Why not?) DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is NP-COMPLETE iff 1. PI is in NP. 2. PI is complete for all problems in NP. This means that for every problem PI' in NP there is a poly-time function f from instances of PI' to instances of PI that maps YES-instances of PI' to YES-instances of PI and NO-instances of PI' to NO-instances of PI. For example, suppose PI' = 3SAT. The existence of (such) a poly-time function f: 3SAT -> PI that maps YES-instances to YES-instances, and NO-instances to NO-instances implies that any poly-time algorithm for PI can be turned into a poly- time algorithm for 3SAT: To solve an instance x of 3SAT, just map x into the corresponding instance f(x) of PI. Then use the poly-time algorithm for PI to get the answer for f(x). That answer to f(x) is the desired answer to x. (Why?) Such a function f is said to REDUCE 3SAT to PI. This is because f reduces the problem of solving 3SAT to the problem of solving PI. Cook's Theorem: 3SAT is NP-COMPLETE. This is a difficult (but not impossibly difficult) fundamental theorem. You will see the proof of it in both graduate algorithms 15-750 and graduate complexity 15-855 courses. For now, let's just notice something strange about this theorem. It says that EVERY problem in NP is poly-time reducible to 3SAT. Every problem in NP? How can one show this for every single problem in NP? One can see how to show that a particular problem (like graph 3-colorability) reduces to 3SAT (Show this!), but how in the world does one show that EVERY problem in NP reduces to 3SAT? How does one even get a hold of "every problem in NP"? The key idea is that every problem in NP is completely defined by its corresponding poly-time computable 0-1 valued function ff(x,y). In what follows, we assume the truth of Cook's Theorem, that 3SAT is NP-complete. That makes it easier to prove that a problem PI in NP is NP-complete. To do so, we only have to show that 1. PI is in NP, and 2. 3SAT reduces to PI. Here is an example (also due to Cook): CLIQUE PROBLEM: INSTANCE: a graph G = (V,E), and a positive integer k. QUESTION: does G have a clique of size at least k? A CLIQUE is a collection of vertices such that every pair of vertices is joined by an edge. THEOREM: The CLIQUE problem is NP-complete. PROOF: 1. CLIQUE is in NP. (Why?) 2. 3SAT is poly-time reducible to CLIQUE: An instance of 3SAT is defined by giving n variables and m clauses, each clause containing 3 literals, each literal being either a variable or its complement. The function f must map each such instance of 3SAT into an instance of CLIQUE: a pair (G, k), where G = (V,E) is an undirected graph and k is a positive integer. This function f must be answer-preserving: it must map YES-instances to YES-instances and NO-instances to NO-instances. In addition, f must run in polynomial time. This latter condition on running time is what prevents the function f from solving the instance of 3SAT. No poly-time algorithm is known for 3SAT. (It is likely that none exists.) So f must map an instance of 3SAT for which it does not know the answer to an instance of CLIQUE for which it does not know the answer -- and still be answer-preserving. An important idea for the construction of our particular f is to arrange that the graph G in the image of f have all possible n(n-1)/2 edges *except* for a few that we specifically leave out. A missing edge between two vertices u,v ensures that at most one of these vertices can be in any clique. We define f as follows: f maps the n 3SAT variables X1, ..., Xn to n pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each clause, {X, Y, Z'} for example, to a triple of *nonadjacent* vertices (x, y, z'). All edges that have not been specifically forbidden are inserted into the graph. This construction ensures that a maximum size clique can have at most n+m vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one from each of the m triples, such as (x, y, z'). In fact, if the clauses can be satisfied, then such a clique of size m+n can be constructed, and not otherwise, so set k = m+n. To complete the proof, we must show 3 things: 1. f maps YES-instances to YES-instances. 2. f maps NO-instances to NO-instances. 3. f is poly-time. You should be able to do this yourself! QED Here is another example, this one due to Karp: VERTEX COVER PROBLEM: INSTANCE: an undirected graph G, and a positive integer k. QUESTION: Does G have a VC having at most k vertices? THEOREM: The VERTEX COVER problem is NP-complete. We now have two ways to go about proving this theorem: we can either reduce 3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a g that reduces CLIQUE to VC, just as the above f reduced 3SAT to CLIQUE. Then the composition of g and f, gf, will be a reduction of 3SAT to VC: gf(3SAT instance) = g(CLIQUE instance) = VC instance NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a proof that PI is as expressive as 3SAT -- as capable as 3SAT in describing computational problems. But how expressive is 3SAT? There is another problem that is at least as expressive as 3SAT, and that is: BOOLEAN FORMULA SATISFIABILITY INSTANCE: a boolean formula (such as ( ((X -> Y) or (Y -> Z)) and X'Z' ) QUESTION: is the boolean formula satisfiable? In other words, is there a truth assignment to the variables of the boolean formula that make the formula true? In the above example, the assignment of T to each of X,Y,Z is not satisfying. However, there are many satisfying assignments including any assignment that maps X -> F. BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete. (Why? Because it is in NP (Why?), and because an instance of 3SAT maps directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the following example: {X, Y, Z'} {X', Y, Z} -> (X + Y + Z')(X' + Y + Z))