15-451 Algorithms 03/03/07 NP-completeness cont * Recap, Formal definitions. * reducing CIRCUIT-SAT to 3-SAT [proving 3-SAT is NP-complete] * reducing 3-SAT to CLIQUE [proving CLIQUE is NP-complete] * Independent Set, Vertex Cover ======================================================================= In the discussion below, a "problem" means something like 3-coloring or network flow, and an "instance" means a specific instance of that problem --- the graph to color or the network to find a flow on. A DECISION PROBLEM is just a problem where each instance is either a YES-instance or a NO-instance and the goal is to decide which type your instance is. (e.g., for 3-coloring, G is a YES-instance if it has a 3-coloring and NO if not. For the perfect-matching problem, G is a YES-instance if it has one and a NO-instance if it does not.) P: class of decision problems Q that have polynomial-time algorithms: A(I)=YES iff I is a YES-instance of Q. NP: decision problems where at least the YES-instances have short proofs (that can be checked in polynomial-time) that the answer is YES. Q is in NP if there is a verifier V(I,X) such that: If "I" is a YES-instance, then there exists X such that V(I,X) = YES. If "I" is a NO-instance, then for all X, V(I,X) = NO. and furthermore the length of X and the running time of V are poly in |I|. co-NP: vice-versa: there are short proofs for NO-instances. (E.g., given two circuits, C_1, C_2, do they compute the same function?). Problem Q is NP-complete if: (1) Q is in NP, and (2) Any other Q' in NP is "polynomial-time reducible" to Q. In particular, for any Q' in NP, there is some function f from instances of Q' to instances of Q, that maps YES-instances of Q' to YES-instances of Q, and maps NO-instances of Q' to NO-instances of Q. If Q just satisfies (2) then it's called NP-hard. Last time we showed that the following problem is NP-complete: CIRCUIT-SAT: Given a circuit of NAND gates with a single output and no loops (some of the inputs may be hardwired). Question: is there a setting of the inputs that causes the circuit to output 1? Proof sketch: It's clearly in NP, since V(I,X) just runs circuit I on proposed solution X and checks that the circuit outputs 1. Now, say Q' is some other problem in NP. Q' must have some polynomial-time verifier V'. Now, given some instance I of Q', what function f does is construct a circuit C_I such that C_I(X) = V'(I,X). So, f is like a compiler that compiles V' into a circuit of NAND gates. Claim is that we can do this by unrolling a polynomial number of loops of the kind of NAND-gate computer you built in 15-251. Finally, by design, C_I is a YES-instance of CIRCUIT-SAT iff I was a YES-instance of Q'. So, if we can solve CIRCUIT-SAT then we can solve Q'. Aside: we could define the SEARCH-version of a problem in NP as: "..and furthermore, if I is a YES-instance, then *produce* X such that V(I,X)=YES". If we can solve any NP-complete decision problem in polynomial time then we can actually solve search-version of any problem in NP in polynomial-time too. Will talk about in recitation. Now CIRCUIT-SAT is a little unweildy. What's REALLY INTERESTING about NP-completeness is not just that such problems exist, but that a lot of very innocuous-looking problems are NP-complete. To show results like that, we will first reduce CIRCUIT-SAT to a much simpler-looking problem called 3-SAT. 3-SAT: Given: a CNF formula (AND of ORs) over n variables x1,...,xn, where each clause has at most 3 variables in it. (x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ... Goal: find an assignment to the variables that satisfies the formula if one exists. THEOREM: 3-SAT is NP-Complete Proof: We need to define a function f that converts instances C of CIRCUIT-SAT to instances of 3-SAT such that the formula produced is satisfiable iff the circuit C had an input x such that C(x)=1. First of all, let's assume our input is given as a list of gates, where for each gate we are told what its inputs are connected to. E.g., g1 = x1 NAND x3; g2 = g1 NAND x4; g3 = x1 NAND 1; g4 = g1 NAND g2; ... plus we are told which gate gm is the output of the circuit. We will now compile this into an instance of 3-SAT as follows. We'll make one variable for each input xi of the circuit, and one for every gate gi. We now write each NAND as a conjunction of 4 clauses. In particular, we just replace the statement "y3 = y1 NAND y2" with: (y1 OR y2 OR y3) // if y1=0 and y2=0 then we must have y3=1 AND (y1 OR not(y2) OR y3) // if y1=0 and y2=1 then we must have y3=1 AND (not(y1) OR y2 OR y3) // if y1=1 and y2=0 then we must have y3=1 AND (not(y1) OR not(y2) OR not(y3)) //if y1=1,y2=1 we must have y3=0 Finally, we add the clause (gm). This forces the circuit to output 1. In other words, we are asking: is there an input and a setting of all the gates such that the output of the circuit is equal to 1, and each gate is doing what it's supposed to? So, the 3-CNF formula produced is satisfiable if and only if the circuit has a setting of inputs that causes it to output 1. The size of the formula is linear in the size of the circuit. The construction can be done in polynomial (actually, linear) time. So, if we had a polynomial-time algorithm to solve 3-SAT, then we could solve circuit-SAT in poly time too. ==================================================================== IMPORTANT FACT: now that we have 3-SAT, in order to prove some other NP problem Q is NP-complete, we just need to show 3-SAT <= Q: if we could solve Q then we could solve 3-SAT. MAKE SURE YOU UNDERSTAND THIS - A LOT OF PEOPLE MAKE THE MISTAKE OF DOING IT THE OTHER WAY AROUND. MAX-CLIQUE: given a graph G, find the largest clique (set of nodes s.t. all pairs in the set are neighbors). Decision problem: "Given G and integer k, is there a clique of size >= k?". (MAX-CLIQUE is clearly in NP.) Theorem: Max-Clique is NP-Complete. Proof: reduce 3-SAT to MAX-CLIQUE. Given a 3-CNF formula F of m clauses over n variables, we construct a graph as follows. For each clause c of F we create one node for every assignment to variables in c that satisfies c. E.g., say F = (x1 OR x2 OR not(x4)) AND (not(x3) OR x4) AND (not(x2) OR not(x3)) AND ... Then in this case we would create nodes like this: (x1=0,x2=0,x4=0) (x3=0,x4=0) (x2=0,x3=0) ... (x1=0,x2=1,x4=0) (x3=0,x4=1) (x2=0,x3=1) (x1=0,x2=1,x4=1) (x3=1,x4=1) (x2=1,x3=0) ... Then we put an edge between two nodes if the partial assignments are consistent. Note: max possible clique size is m. And, if the 3-SAT problem does have a satisfying assignment, then in fact there IS an m-clique. Claim is this is true in the other direction too. If the graph has an m-clique, then there is a satisfying assignment: namely, just read off the assignment given in the nodes of the clique. So, this graph has a clique of size m iff F was satisfiable. Also, our reduction is poly time since the total size of graph is at most quadratic in size of formula (O(m) nodes, O(m^2) edges). Therefore Max-Clique is NP-complete. Independent Set =============== An independent set in a graph is a set of nodes no two of which have an edge. E.g., in a 7-cycle, the largest independent set has size 3. (E.g, in the graph coloring problem, the set of nodes colored red is an independent set). Theorem: Independent set (is there an Indep Set in G of size > k?) is NP-complete. Proof: Reduce from clique. Given graph G for clique problem, just take complement of the graph . Ie. create graph H such that H has edge (u,v) iff G does NOT have edge (u,v). Then H has an indep set of size k iff G has a k-clique. Vertex-Cover ============ A vertex cover in a graph is a set of nodes such that every edge is incident to at least one. (e.g., look at cut-diamond graph). For instance, can think of as what rooms to put security guards in a museum. What we want is the smallest vertex cover. Decision problem: does there exist a vertex cover of size < k? Claim: if C is a vertex cover in a graph, then V-C is an independent set. Also if S is an independent set, then V-S is a vertex cover. So, to solve "is there an independent set of size > k?" just solve "is there a vertex cover of size < n-k?". So, Vertex cover is NP-Complete too.