15-451 Algorithms 4/15/99 * Complexity classes: P, RP, BPP, NP * NP-completeness ============================================================================ In the course we've seen a collection of computational problems, some harder than others. Complexity theory gives us a way of talking about this in a formal sense. Start by describing some important complexity classes. These are defined with respect to decision (yes/no) problems. We'll get back in a second to the question of how these yes/no questions relate to the "search problem" -- the problem of finding the thing we want (the path or coloring or whatever). P: decision problems solvable in polynomial time. RP: Randomized one-sided error poly time. Problem is in RP if there exists randomized poly-time algorithm A such that: if x is a YES instance then Pr(A(x) = YES) >= 1/2 if x is a NO instance, then Pr(A(x) = YES) == 0 E.g., we showed the problem "Is x composite?" is in RP. co-RP: same as RP but with yes/no flipped. We showed the problem "Is x prime?" is in co-RP. BPP: 2-sided error version of RP. Problem is in BPP if there exists randomized poly-time algorithm A such that: if x is a YES instance then Pr(A(x) = YES) >= 3/4 if x is a NO instance, then Pr(A(x) = YES) <= 1/4 NP: NP is the class of problems where we might have absolutely no idea how to solve quickly, but if someone hands us a solution, we can verify it quickly. In particular, problem is in NP if there exists a polynomial time verification algorithm A such that: if x is a YES instance, then there exists y s.t. A(x,y) = YES. if x is a NO instance, then there does not exist y s.t. A(x,y)=YES where y must have length poly(|x|). All problems we've seen are in NP. --> e.g., does graph have a legal 3-coloring? x is the graph, y is the coloring. Verification alg A(x,y) is: for all edges (u,v) in the graph x, if (color(u) == color(v)) output NO output YES --> e.g., is n composite? x is number n, y is factorization into a,b Alg A checks that a,b > 1 and a*b==n. One way to think of it: problem is in NP if YES instances have a short proof of being YES. (But, NO instances might not have a short proof of being no. E.g., not clear how you could give a short proof that a graph is NOT 3-colorable). non-3-colorability is in co-NP. A problem is in co-NP if its compliment is in NP. NP is like: "I don't know how to find it, but I'll know it when I see it" --> Traveling Salesman problem (TSP): Given graph G, find shortest tour that visits all the vertices exactly once. (The shortest Hamiltonian tour) Here's the associated decision problem: Given graph G, integer k, is there a Hamiltonian tour of length < k? First: is this in NP? Yes. x=(graph, k), y = tour. Alg A checks that tour really visits all the vertices exactly once and that sum of distances traveled is in fact < k. Clearly if you can find the shortest tour, then you can solve this decision problem, but what about the other way around? part 1: say you have a black box to solve the decision problem, can you use to find the *length* of the shortest tour? Sure -- just do binary search. Say that length is L. part 2: how to use black box that tells if exists tour of length <=L to solve search problem? Say we remove an edge from graph and black box now says NO? (then, keep edge in graph). Say we remove the edge and the box still says YES? (then keep it out of the graph). What's left is the tour. What we've done here is called a reduction: reduced search problem to decision problem. Idea of solving instance of problem A by converting to instance of problem B and using known alg for B is called "reducing A to B". E.g., reduced multicommodity flow to LP, reduced maximum matching to max flow. We'll see this again in a crucial way when we talk about NP-completeness in a second. -->Any problem in P is also in NP. (Alg A just ignores y and solves x) --> Biggest open question in complexity theory is P=NP? Are they really different? If they are the same, that would mean that any problem with the property that you could *verify* a solution quickly also has a fast way of finding the solution. That would be weird. most people believe P != NP. But, it's very hard to prove that a fast algorithm for something does NOT exist. So, it's still an open problem. --> example of problem not believed to be in NP: For the n by n version of chess, does the first player have a winning strategy? (The answer is not known for the usual 8x8 chess). Might be true but even if you knew it was true, it's not clear how you could give a short proof. NP-Completeness =============== One of the amazing things about NP is that there are some problems, called NP-complete problems, with property that if they are in P, then all of NP is in P. Problem Q is NP-complete means: (1) it's in NP, and (2) any other problem Q' in NP is poly-time reducible to Q. (2) means that if you could solve Q in polynomial time then you could solve any problem in NP in polynomial time. So these are in a sense the hardest problems in NP. If problem satisfies (2) then it's called NP-hard. Common bugs in proofs: One of most insidious is saying "we're trying to figure out the worst-case cost. Obviously case X is the worst case. So, it suffices to analyze our cost on case X." Problem is that the "obviously" is rarely true. A lot of false proofs in research papers go this way. Amazing thing about NP-completeness is it's one of few settings where can do something along these lines. Say that Satisfiability (or graph coloring or whatever) is really the hardest problem in NP. --> circuit-SAT Given: a circuit of m 2-input gates defined over n 1-bit inputs x1,...,xn, that has no loops and a single output. (e.g, AND, OR, NOT gates) Question: is there a setting of the inputs that causes the circuit to output 1 (satisfies the circuit). Theorem: Circuit-SAT is NP-complete. First of all it's in NP since a proof is just a satisfying assignment, and we can check by just running time circuit in time O(m) which is poly in size of input. Now, to show the other part. Suppose we have some other problem Q' in NP. Since Q' is in NP, there is by definition a polynomial-time verification algorithm A that takes an instance x of Q' and a proof y, and outputs 1 if y is a legal proof that x is a YES instance. Now, the point is that we can write out a polynomial-size circuit that simulates the behavior of A. Its inputs are the bits in x and the bits in y, and the output is the value A(x,y). After all, that's what a computer is doing: if you code up A and run it, you are running it on clocked circuitry, and if you unroll the loops you can write it out as a circuit with a constant number of layers per clock cycle. Each layer represents the contents of the memory in the machine at that time. In polynomial time you can touch at most a polynomial number of memory entries, so the total width and depth of the ciruit is polynomial, so the size is polynomial in the number of bits of the input. E.g., size of circuit is at most O(|x|^t) for some constant t. Now what. So, suppose we had an polynomial-time algorithm for Circuit-SAT. Runs in time (size of circuit)^k for some constant k. Then we get a polynomial-time algorithm for Q' that runs in time O(|x|^{k*t}), which is polynomial. So, this is our first NP-complete problem. Now, we can use this to prove that a whole range of other very different-looking problems are also NP-complete. Today will just do one: 3-SAT. 3-SAT is a much simpler-looking special case of circuit-SAT, but we'll show that it is just as hard, in the sense that it too is NP-complete. --> 3-SAT: n variables. Formula is just an AND of ORs, where each OR has at most 3 literals in it (variables or negations). Question: is there an assignment to variables x1,...,xn that satisfies the formula? Theorem: 3-SAT is NP-complete. Proof: reduce circuit-SAT to 3-SAT. Given circuit, say all gates are OR or NOT. Convert to 3-CNF by giving new vars to internal wires of circuit. Use clauses to say that all internal gates are producing the correct outputs given their inputs. E.g., y= OR(x1,x2) is converted to (x1 OR x2 OR not(y)) AND (not(x1) OR y) AND (not(x2) OR y) y = not(x1) is converted to: (x1 or y) AND (not(x1) or not(y)) Also, if y is the output of the circuit, put in (y). The size of the formula is polynomial (actually, linear) in the size of the circuit. The construction can be done in polynomial time. So, the 3-CNF formula produced is satisfiable if and only if the original circuit wassatisfiable. So, 3-SAT is NP-complete since if we could solve 3-SAT in poly time, then we could solve circuit-SAT in poly time too. Next time: Prove Max-Clique, Knapsack are NP-complete. Also talk about co-NP.