15-451 Algorithms 10/21/99 Complexity Theory * NP and NP-completeness * Reductions HW 4 grading today and tomorrow ============================================================================ Preliminaries ============= Poly-time reduction and equivalence ----------------------------------- Problem A is poly-time REDUCIBLE to problem B if given a poly-time alg for B, we can use to produce a poly-time alg for A (A <=_p B) Problem A is poly-time EQUIVALENT to problem B (A =_p B) if A <=_p B and B <=_p A. Complexity Classes P and NP --------------------------- P: decision problems solvable in polynomial time. NP: decision problems solvable in Non-deterministic polynomial time. Two ways of thinking about NP : Version 1: A non-deterministic algorithm A is like a probabilistic algorithm, but we only require: if x is a YES instance, then Pr(A(x) = YES) > 0. if x is a NO instance, then Pr(A(x) = YES) = 0. Version 2: NP is the set of problems such that YES instances have short proofs that the answer is yes. "short proof" means that proof is of polynomial size and can be checked in polynomial time. These proofs are often called "witnesses". NP-Completeness =============== NP-complete problems have the property that if they are in P, then all of NP is in P. Problem Q is NP-complete means: (1) it's in NP, and (2) any other problem Q' in NP is poly-time reducible to Q. So these are in a sense the hardest problems in NP, since if you could solve Q in polynomial time, you could solve *any* problem in NP in polynomial time. If problem satisfies (2) then it's called NP-hard. An example of an NP-complete problem is Algorithm-SAT. Algorithm-SAT: Given an algorithm A, an input x, and a bound B written in unary (B bits). Question: does there exist an input y of length <= B such that A(x,y) halts in at most B steps and outputs 1? Theorem: Algorithm-SAT is NP-complete. Proof: Lets start with (1): Algorithm-SAT is in NP because a YES instance has a poly-size proof that can be checked in poly time. The proof is the string y, and to check it we run A for B steps. This is linear in the sizeof the input (since B is in unary). Or, if we want to think in terms of non-determinism, the nondeterministic poly-time algorithm is "guess a witness y and then check by running A for B steps" if we have a YES instance then there is *some* y that will work, but if we have a NO instance then there is *no* y that will work. Now for part (2): suppose we have some other problem Q' in NP. So, there is by definition a proof-checker A' that takes an input x of Q' and a proof y and in time an^b outputs 1 if y is a legal proof that x is YES (a and b are given constants). Furthermore, for any YES instance x, there *is* a legal proof y of size at most an^b. So, to tell whether x is a YES instance, we just feed A, x, and the bound a(size of x)^b written in unary to the supposed polynomial-time algorithm for Algorithm-SAT. Since the input to Algorithm-SAT has size basically a(size of x)^b, and our poly-time algorithm runs in time c(size of its input)^d for some constants c and d, the combined thing solves Q' in time O((size of x)^{cd}) which is polynomial. So, this is our first NP-complete problem. It's as least as hard as any problem in NP since in a sense any other problem is a special case of it. But, now we can turn this into some less unweildy problems. Circuit-SAT and 3-SAT ===================== Circuit-SAT: Given a circuit of OR and NOT gates with multiple inputs and a single output (some of the inputs may be hardwired). Question: is there a setting of the inputs that causes the circuit to output 1? (Circuit doesn't have loops) 3-SAT: Given a formula that's an AND of ORs like (x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ... where each clause has at most 3 literals in it. Question: is there an assignment to the variables that satisfies the formula? These are both NP-complete. Let me give a hand-wavy proof of the first one, and then a more careful proof of the second using the first one. Circuit-SAT is NP-Complete. First of all, it's in NP since we can just guess the inputs and run the circuit in time linear in its size. It's NP-complete because you can reduce Algorithm-SAT to it. You can convert the algorithm A and bound B in Algorithm-SAT into a circuit of poly(B) gates that on input x and y does the same thing A does in B steps. This is probably easiest to see if you've built a computer before sinc e a computer is just a bunch of clocked circuitry: each level ofthe circuit represents the contents of A's memory at th next clock cycle, and in between two clock cycles, there are only a finite number of things that were effected. Or, if you are familiar with the Turing machine model, each level of the ciruit can represent the contents of the tape and the head position, and each step can be modeled by a layer with a polynomial number of gates. We then hardwire the x inputs and ask if there is a setting to the y inputs that makes the thing output a 1. Now, let's do 3-SAT, since it's more interesting because 3-SAT looks so simple. Theorem: 3-SAT is NP-complete. Proof: reduce circuit-SAT to 3-SAT. Given circuit, say all gates are OR or NOT. Convert to 3-CNF by giving new vars to internal wires of circuit. Use clauses to say that all internal gates are producing the correct outputs given their inputs. E.g., y= OR(x1,x2) is converted to (x1 OR x2 OR not(y)) AND (not(x1) OR y) AND (not(x2) OR y) y = not(x1) is converted to: (x1 or y) AND (not(x1) or not(y)) Also, if y is the output of the circuit, put in (y). The size of the formula is polynomial (actually, linear) in the size of the circuit. The construction can be done in polynomial time. So, the 3-CNF formula produced is satisfiable if and only if the original circuit wassatisfiable. So, 3-SAT is NP-complete since if we could solve 3-SAT in poly time, then we could solve circuit-SAT in poly time too. More generally, to show Q is NP-complete, what we're doing is reducing some other NP-complete problem Q' to it. mapping Q' -----> Q instance x' of Q' -----> instance x of Q (poly time) x' is YES instance of Q' iff x is YES instance of Q. Now, can use 3-SAT to prove a huge number of different problems are also NP-complete. Do: Clique, Independent Set. Theorem: Max-Clique (Does G have a clique of size > k?) is NP-hard. Proof: reduce from 3-SAT. Given a 3-CNF formula F of m clauses over n variables, we construct a graph as follows. For each clause c of F we create one node for every assignment to variables in c that satisfies c. E.g., say F = (x1 OR x2 OR not(x4)) AND (not(x3) OR x4) AND (not(x2) OR not(x3)) AND ... Then we create up to 7m nodes: for each clause of F of size 3 there are 7 assignments to variables in it that satisfy the clause. (If the clause has size 2 then there are 3 such assignments.) E.g., in this case the nodes are: (x1=0,x2=0,x4=0) (x3=0,x4=0) (x2=0,x3=0) ... (x1=0,x2=1,x4=0) (x3=0,x4=1) (x2=0,x3=1) (x1=0,x2=1,x4=1) (x3=1,x4=1) (x2=1,x3=0) ... Then we put an edge between two nodes if the partial assignments are consistent. Note: max possible clique size is m. And, if the 3-SaT problem does have a satisfying assignment, then in fact there IS an m-clique. Claim is this is true in the other direction too. If the graph has an m-clique, then there is a satisfying assignment: namely, just read off the assignment given in the nodes of the clique. So, Max-Clique is NP-complete too. Independent Set =============== An independent set in a graph is a set of nodes no two of which have an edge. E.g., in a 7-cycle, the largest independent set has size 3. (E.g, in the graph coloring problem, the set of nodes colored red is an independent set). Theorem: Independent set (is there an Indep Set in G of size > k?) is NP-complete. Proof: Reduce from clique. Given graph G for clique problem, just take complement of the graph . Ie. create graph H such that H has edge (u,v) iff G does NOT have edge (u,v). Then H has an indep set of size k iff G has a k-clique. ------------------------------------------------------------------- Another example which can be shown to be NP-complete, which we may not cover in class is : Vertex-Cover ============ A vertex cover in a graph is a set of nodes such that every edge is incident to at least one. (e.g., look at cut-diamond graph). For instance, can think of as what rooms to put security guards in a museum. What we want is the smallest vertex cover. Decision problem: does there exist a vertex cover of size < k? Claim: if C is a vertex cover in a graph, then V-C is an independent set. Also if S is an independent set, then V-S is a vertex cover. So, to solve "is there an independent set of size > k?" just solve "is there a vertex cover of size < n-k?". So, Vertex cover is NP-Complete too.