CS 15-451 Analysis of Algorithms 2 November 2004 Manuel & Avrim Blum Today: the definition and theory of NP-completeness A COMPUTATIONAL SEARCH or OPTIMIZATION PROBLEM is given by describing what constitutes an allowable INPUT, and for each INPUT what constitutes an allowable (correct) OUTPUT. EXAMPLES: FACTORIZATION INPUT = a positive integer N OUTPUT = a factorization of N into primes. More formally, [(p1,e1),...,(pk,ek)] e1 ek such that N = p1 * ... * pk GRAPH VERTEX COLORING INPUT = an undirected graph represented by an adjacency matrix. OUTPUT = an optimal coloring of the vertices of the graph. A COLORING is an assignment of colors to the vertices of the graph so that no two adjacent vertices (vertices joined by an edge) have the same color. A coloring is OPTIMAL if it uses the minimum possible number of colors. Most practical algorithm problems are computational search or optimization problems, like the ones above. The theory, however, is developed in terms of decision problems, each given by an INSTANCE and a QUESTION: FACTORING DECISION PROBLEM INSTANCE = 2 positive integers N, k, with k < N. QUESTION = does N have a prime factor < k? GRAPH VERTEX COLORING INSTANCE = an undirected graph represented by an adjacency matrix; and a positive integer k. QUESTION = is there a proper coloring of the vertices that uses at most k distinct colors? (Proper means that no two adjacent vertices are assigned the same color.) An algorithm is poly-time iff there is a specific polynomial, like poly(n) = n^3, such that for every positive integer n, and for every input of length n, a correct output is produced in poly(n) amount of time. Clearly, a poly-time algorithm for either of the above computational problems could be used to give a poly-time algorithm for the corresponding decision problem. Less obvious, but still true, is that it goes the other way around: a poly-time algorithm for either of the above decision problems would ensure the existence of a poly-time algorithm for the corresponding computational problem. SHOW THIS! Decision problems are defined, whenever possible, to ensure (as above) that there exists a poly-time algorithm for the decision problem iff there exists a poly-time algorithm for the corresponding computational search problem. Informally, P = class of all decision problems that can be solved in a fixed polynomial (of the input length) time. More formally, Def: P = class of all decision problems for which there exists a polynomial "poly" such that for every allowable input, the problem can be solved in poly(input length) time. Def: NP = Nondeterministic Polynomial-time decision problems. Some people think NP means "Not Polynomial." While insightful (in a way), this is not correct. In fact, NP contains P. I prefer NP = Nifty Proof. (Nifty = Poly Length). NP is the class of all decision problems that have a (nifty) polynomial (of the input length) proof when the answer is YES. What does it mean for a decision problem to be in NP, ie. to have POLYNOMIAL LENGTH PROOFS? 1st give examples. *A proof that a graph on n vertices is 3-colorable. (NP (In fact, NP-complete)) *A proof that a graph on n vertices is NOT 3-colorable. (Most likely NOT in NP) *All our problems will be decision (YES-NO) problems *A proof that a graph is NOT 2-colorable. (Yes, this is in NP. In fact, it is in P: there is a poly-time algorithm to decide this! What is it?) *TRAVELLING SALESMAN PROBLEM (TSP): INPUT: a complete undirected weighted graph G. Weights on edges are positive integers. OUTPUT: a tour of (all) the vertices that is of minimum length, i.e., no longer than any other tour. (Not a decision problem. Therefore the question whether or not this problem is in NP makes no sense.) *TSP DECISION PROBLEM: INSTANCE: a complete undirected weighted graph G (all weights positive integers), and a positive integer k < n. QUESTION: Does G have a tour of length < k? (NP) *A NONSTANDARD TSP DECISION PROBLEM: INSTANCE: a complete undirected weighted graph G, a positive integer k < n, and a tour t of G QUESTION: Is the given tour t of length < k? (NP. In fact P) *ANOTHER NONSTANDARD TSP DECISION PROBLEM: INSTANCE: a complete undirected weighted graph G, a tour t of G QUESTION: Is t a tour of minimum length? (Co-NP: there is a nifty proof when the answer is NO.) *ANOTHER NONSTANDARD TSP DECISION PROBLEM: INSTANCE: a complete undirected weighted graph G, a tour t of G QUESTION: Does G have a tour t' of shorter length than t, ie. length(t') < length(t)? (In NP) Then give a formal definition of NP. *Definition: a PROOF SYSTEM for a computational decision problem is a system of logic such that: 1. if the answer to the given instance is YES, then there exists a string that proves that the answer is YES; while 2. if the answer is NO, then no string proves that the answer is YES. *Definition: NP is the class of decision problems for (each of) which there exists a proof system -- a system of logic -- together with a polynomial "poly" such that: 1. if the answer to the given instance of length n is YES, then there exists a poly(n) long string that proves that the answer is YES; while 2. if the answer is NO, then no string (of any length) proves that the answer is YES. Then give more examples. * a proof system for 3-colorability is an assignment of R,W,B, to each vertex with the property that 1. for every 3-colorable graph, there is an assignment of one of the three colors to each vertex such that no two adjacent vertices (vertices joined by an edge) get the same color; while 2. if the graph is NOT 3-colorable, then there is no such coloring. ie for every assignment of R, W, B to the vertices, some pair of adjacent vertices will be colored the same. * a proof system for Hamilton Cycle (HC) is a permutation of the n vertices vi1, vi2, ..., vin such that 1.if the graph G has a HC, then at least one such permutation is a cycle. 2.if G has no HC, then no such permutation is a cycle. Define the 3SAT problem: INSTANCE: a set of m clauses on n variables. Each clause contains 3 literals. Each literal is either a variable or the complement of a variable. QUESTION: is there an assignment of truth values to the variables such that each clause contains a true literal? EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'} (This is satisfiable under the assignment that maps every variable to T. Why?) EXAMPLE of a 2SAT problem: {X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'} (This is NOT satisfiable. Why not?) Def: A decision problem PI (given by INSTANCE and QUESTION) is NP-complete iff 1. PI is in NP. 2. PI is complete for all problems in NP. This means that there is a poly-time function f from instances of 3SAT to instances of PI that maps YES-instances of 3SAT to YES-instances of PI and NO-instances of 3SAT to NO-instances of PI. A YES-instance of a problem is an instance for which the correct answer is YES. Similarly, a NO-instance is an instance for which the correct answer is NO. The existence of a poly-time function f: 3SAT -> PI implies that a poly-time algorithm for PI can be turned into a poly- time algorithm for 3SAT -- and therefore (still to be shown) into a poly-time algorithm for any problem in NP. EXAMPLE: The CLIQUE problem is NP-complete. CLIQUE PROBLEM: INSTANCE: a graph G = (V,E), and a positive integer k. QUESTION: does G have a clique of size at least k? A CLIQUE is a collection of vertices such that every pair of vertices is joined by an edge. NP-completeness is about EXPRESSIVENESS.