CS 15-451 Analysis of Algorithms 9 November 2004 Manuel & Avrim Blum Today: Definition and Theory of NP-completeness... third lecture. DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is NP-COMPLETE iff 1. PI is in NP. 2. PI is complete for all problems in NP. This means that for every problem PI' in NP there is a poly-time function f from instances of PI' to instances of PI that maps YES-instances of PI' to YES-instances of PI and NO-instances of PI' to NO-instances of PI. For example, suppose there exists a poly-time function f: 3SAT -> PI that maps YES-instances to YES-instances, and NO-instances to NO-instances. Existence of f implies that any poly-time algorithm for PI can be turned into a poly-time algorithm for 3SAT: To solve an instance x of 3SAT, just map x into the corresponding instance f(x) of PI. Then use the poly-time algorithm for PI to get the answer for f(x). That answer to f(x) is the desired answer to x. (Why?) Such a function f is said to REDUCE 3SAT to PI. This is because f reduces the problem of solving 3SAT to the problem of solving PI. Cook's Theorem: 3SAT is NP-COMPLETE. In what follows, we assume the truth of Cook's Theorem, that 3SAT is NP-complete. That makes it easier to prove that a problem PI in NP is NP-complete. To do so, we only have to show that 1. PI is in NP, and 2. 3SAT reduces to PI. Here is an example of how to do this (also due to Cook): CLIQUE PROBLEM: INSTANCE: a graph G = (V,E), and a positive integer k. QUESTION: does G have a clique of size at least k? A CLIQUE is a collection of vertices such that every pair of vertices is joined by an edge. THEOREM: The CLIQUE problem is NP-complete. PROOF: 1. CLIQUE is in NP. (Why?) 2. 3SAT is poly-time reducible to CLIQUE: An instance of 3SAT is defined by giving n variables and m clauses, each clause containing 3 literals, each literal being either a variable or its complement. The function f must map each such instance of 3SAT into an instance of CLIQUE: a pair (G, k), where G = (V,E) is an undirected graph and k is a positive integer. This function f must be answer-preserving: it must map YES-instances to YES-instances and NO-instances to NO-instances. In addition, f must run in polynomial time. This latter condition on running time is what prevents the function f from solving the instance of 3SAT. No poly-time algorithm is known for 3SAT. (It is possible that none exists.) So f must map an instance of 3SAT for which it does not know the answer to an instance of CLIQUE for which it does not know the answer -- yet still be answer-preserving. An important idea for the construction of our particular f is to arrange that the graph G in the image of f have all possible n(n-1)/2 edges *except* for a few that we specifically leave out. A missing edge between two vertices u,v ensures that at most one of these vertices can be in any clique. We define f as follows: f maps the n 3SAT variables X1, ..., Xn to n pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each clause such as {X, Y, Z'} to a triple of *nonadjacent* vertices (x, y, z'). A vertex labelled x, say, and another labelled x', its complement, must be *nonadjacent*. All edges that have not been specifically forbidden are inserted into the graph. This construction ensures that a clique can have no more than n+m vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one from each of the m triples, such as (x, y, z'). In fact, if the clauses can be satisfied, then such a clique of size m+n can be constructed, and not otherwise, so set k = m+n. To complete the proof, we must show 3 things: 1. f maps YES-instances to YES-instances. 2. f maps NO-instances to NO-instances. 3. f is poly-time. You should be able to do this yourself! QED Here is another example, this one due to Karp: VERTEX COVER PROBLEM: INSTANCE: an undirected graph G, and a positive integer k. QUESTION: Does G have a VC (a set of vertices "covering" all the edges) that has at most k vertices? THEOREM: The VERTEX COVER problem is NP-complete. We now have two ways to go about proving this theorem: we can either reduce 3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a poly-time function g that reduces CLIQUE to VC, just as the above poly-time f reduced 3SAT to CLIQUE. Then the composition of g and f, gf, will be a poly-time reduction of 3SAT to VC: gf(3SAT instance) = g(CLIQUE instance) = VC instance NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a proof that PI is as expressive as 3SAT -- as capable as 3SAT at describing computational problems. But how expressive is 3SAT? There is another problem that is at least as expressive as 3SAT, and that is: BOOLEAN FORMULA SATISFIABILITY INSTANCE: a boolean formula (such as ( ((X -> Y) or (Y -> Z)) and X'Z' ) QUESTION: is the boolean formula satisfiable? In other words, is there a truth assignment to the variables of the boolean formula that make the formula true? In the above example, the assignment of T to each of X,Y,Z is not satisfying. However, there are many satisfying assignments including any assignment that maps X -> F. BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete. (Why? Because it is in NP (Why?), and because an instance of 3SAT maps directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the following example: {X, Y, Z'} {X', Y, Z} -> (X + Y + Z')(X' + Y + Z))