CS 15-451 Analysis of Algorithms 6 November 2003 Manuel & Avrim Blum Today: definition and theory of NP-completeness... continued. The theory of NP-completeness is concerned with decision problems. A decision problem is a computational problem defined by a pair (INSTANCE, QUESTION). Here, INSTANCE describes what constitutes an acceptable instance of the problem. QUESTION is a question about the instance. An INSTANCE is a YES-Instance if the correct answer to the QUESTION is YES. Otherwise it is a NO-Instance. Recall the formal definitions of P and NP: * P is the class of decision problems solvable by polynomial time algorithm. * NP is a class of decision problems. A decision problem PI is in NP iff there exists a polynomial "poly" and a proof system -- a system of logic that includes a poly time algorithmic checker for checking proofs -- such that: 1. for every YES-Instance of length n, there exists a poly(n) long string proving that the answer is YES; while 2. for every NO-Instance, no string (of any length) proves that the answer is YES. Informally, a decision problem is in NP iff it has a Nifty (polynomial length) Proof when the answer is YES. It is in CoNP iff it has a Nifty Proof when the answer is NO. P is contained in both NP and CoNP. There exist problems like INTEGER FACTORIZATION which are in both NP and CoNP but which are not known to be in P. Theorem: Let PI be any decision problem in P. Then PI is in both NP and CoNP. Proof: A poly-time algorithm may be viewed as a system for generating polynomial length proofs of YES and NO. QED Here are some examples of computational problems. See if you can tell which are in NP, which are in CoNP, and which in P: *VERTEX COVER (VC) SEARCH/OPTIMIZATION PROBLEM: INPUT: an undirected weighted graph G. OUTPUT: a collection C of vertices that covers all the edges. This means that every edge has at least one endpoint in the cover. (Not a decision problem. Therefore the question whether or not this problem is in NP makes no sense.) *VC DECISION PROBLEM: INSTANCE: an undirected graph G, and a positive integer k, k < n. QUESTION: Does G have a VC having at most k vertices? (NP) *A NONSTANDARD VC PROBLEM: INSTANCE: an undirected graph G, a positive integer k < n, and a collection C of vertices of G. QUESTION: Is C a vertex cover having at most k vertices? (Not only in NP. In fact in P) *ANOTHER NONSTANDARD VC DECISION PROBLEM: INSTANCE: an undirected graph G, a vertex cover C of G QUESTION: Is C a VC of minimum size? (Co-NP: there is a nifty proof when the answer is NO.) *ANOTHER NONSTANDARD VC DECISION PROBLEM: INSTANCE: an undirected graph G, a vertex cover C of G QUESTION: Does G have a vertex cover that is smaller than C? (In NP) Then give more examples. * a proof system for 3-colorability is an assignment of R,W,B, to each vertex with the property that 1. for every 3-colorable graph, there is an assignment of one of the three colors to each vertex such that no two adjacent vertices (vertices joined by an edge) get the same color; while 2. if the graph is NOT 3-colorable, then there is no such coloring. ie for every assignment of R, W, B to the vertices, some pair of adjacent vertices will be colored the same. * a proof system for Hamilton Cycle Define the 3SAT problem: INSTANCE: a set of m clauses on n variables. Each clause contains 3 literals. Each literal is either a variable or the complement of a variable. QUESTION: is there an assignment of truth values to the variables such that each clause contains a true literal? EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'} (This is satisfiable under the assignment that maps every variable to T.) EXAMPLE of a 2SAT problem: {X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'} (This is NOT satisfiable. Why not?) DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is NP-COMPLETE iff 1. PI is in NP. 2. PI is complete for all problems in NP. This means that there is a poly-time function f from instances of 3SAT to instances of PI that maps YES-instances of 3SAT to YES-instances of PI and NO-instances of 3SAT to NO-instances of PI. The existence of such a poly-time function f: 3SAT -> PI implies that any poly-time algorithm for PI can be turned into a poly- time algorithm for 3SAT: To solve an instance I of 3SAT, just map I into the corresponding instance f(I) of PI. Then use the poly-time algorithm for PI to get the answer for f(I). That answer to f(I) is the desired answer to I. Such a function f is said to REDUCE 3SAT to PI. This is because the problem of solving 3SAT has been reduced to the problem of solving PI. THEOREM: The CLIQUE problem is NP-complete. CLIQUE PROBLEM: INSTANCE: a graph G = (V,E), and a positive integer k. QUESTION: does G have a clique of size at least k? A CLIQUE is a collection of vertices such that every pair of vertices is joined by an edge. PROOF: An Instance of 3SAT is defined by giving n variables and m clauses, each clause containing 3 literals, each literal being either a variable or its complement. The function f must map each such Instance of 3SAT into an instance of CLIQUE, that is, a pair (G, k), where G = (V,E) is an undirected graph and k is a positive integer. This function f must be answer-preserving: it must map YES-Instances to YES-Instances and NO-Instances to NO-Instances. In addition, f must run in polynomial time. This latter condition on running time is what prevents the function f from solving the Instance of 3SAT. No poly-time algorithm is known for 3SAT. (It is likely that none exists.) So f must map an instance of 3SAT for which it does not know the answer to an instance of CLIQUE for which it does not know the answer -- and still be answer-preserving. An important idea for the construction of our particular f is to arrange that the graph G in the image of f have all possible n(n-1)/2 edges *except* for a few that we specifically leave out. A missing edge between two vertices u,v ensures that at most one of these vertices can be in any clique. We define f as follows: f maps the n 3SAT variables X1, ..., Xn to n pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each clause, {X, Y, Z'} for example, to a triple of *nonadjacent* vertices (x, y, z'). All edges that have not been specifically forbidden are inserted into the graph. This construction ensures that a maximum size clique can have at most n+m vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one from each of the m triples, such as (x, y, z'). In fact, if the clauses can be satisfied, then such a clique of size m+n can be constructed, and not otherwise, so set k = m+n. To complete the proof, we must show 3 things: 1. f maps YES-Instances to YES-Instances. 2. f maps NO-Instances to NO-Instances. 3. f is poly-time. You should be able to do this yourself! QED THEOREM: The VERTEX COVER problem is NP-complete. VERTEX COVER PROBLEM: INSTANCE: an undirected graph G, and a positive integer k. QUESTION: Does G have a VC having at most k vertices? We now have two ways to go about proving this theorem: we can either reduce 3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a g that reduces CLIQUE to VC, just as the above f reduced 3SAT to CLIQUE. Then the function gf will be a reduction of 3SAT to VC: gf(3SAT Instance) = g(CLIQUE Instance) = VC Instance NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a proof that PI is as expressive as 3SAT -- as capable as 3SAT in describing computational problems. But how expressive is 3SAT? There is another problem that is at least as expressive as 3SAT, and that is: BOOLEAN FORMULA SATISFIABILITY INSTANCE: a boolean formula (such as ( ((X -> Y) or (Y -> Z)) and X'Z' ) QUESTION: is the boolean formula satisfiable? In other words, is there a truth assignment to the variables of the boolean formula that make the formula true? In the above example, the assignment of T to each of X,Y,Z is not satisfying. However, there are many satisfying assignments including any assignment that maps X -> F. BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete. (Why? Because it is in NP (Why?), and because an instance of 3SAT maps directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the following example: {X, Y, Z'} {X', Y, Z} -> (X + Y + Z')(X' + Y + Z))