CS 15-451      Analysis of Algorithms      6 November 2003
                                             Manuel & Avrim Blum

Today: definition and theory of NP-completeness...  continued.

The theory of NP-completeness is concerned with decision problems. A 
decision problem is a computational problem defined by a pair (INSTANCE, 
QUESTION). Here, INSTANCE describes what constitutes an acceptable instance 
of the problem. QUESTION is a question about the instance. An INSTANCE is a 
YES-Instance if the correct answer to the QUESTION is YES. Otherwise it is 
a NO-Instance.

Recall the formal definitions of P and NP:
* P is the class of decision problems solvable by polynomial time algorithm.
* NP is a class of decision problems.  A decision problem PI is in NP iff 
there exists a polynomial "poly" and a proof system -- a system of logic 
that includes a poly time algorithmic checker for checking proofs --
such that:
    1. for every YES-Instance of length n, there exists a poly(n) long 
string proving that the answer is YES; while
    2. for every NO-Instance, no string (of any length) proves that the 
answer is YES.

Informally, a decision problem is in NP iff it has a Nifty (polynomial 
length) Proof when the answer is YES.  It is in CoNP iff it has a Nifty 
Proof when the answer is NO.
P is contained in both NP and CoNP.
There exist problems like INTEGER FACTORIZATION which are in both NP and 
CoNP but which are not known to be in P.


Theorem: Let PI be any decision problem in P. Then PI is in both NP and CoNP.

Proof: A poly-time algorithm may be viewed as a system for generating 
polynomial length proofs of YES and NO.  QED


Here are some examples of computational problems. See if you can tell which 
are in NP, which are in CoNP, and which in P:

  *VERTEX COVER (VC) SEARCH/OPTIMIZATION PROBLEM:
   INPUT: an undirected weighted graph G.
   OUTPUT: a collection C of vertices that covers all the edges.
           This means that every edge has at least one endpoint
           in the cover.
   (Not a decision problem. Therefore the question whether or
    not this problem is in NP makes no sense.)

  *VC DECISION PROBLEM:
   INSTANCE: an undirected graph G, and
             a positive integer k, k < n.
   QUESTION: Does G have a VC having at most k vertices?
   (NP)

  *A NONSTANDARD VC PROBLEM:
   INSTANCE: an undirected graph G,
             a positive integer k < n, and
             a collection C of vertices of G.
   QUESTION: Is C a vertex cover having at most k vertices?
   (Not only in NP. In fact in P)

  *ANOTHER NONSTANDARD VC DECISION PROBLEM:
   INSTANCE: an undirected graph G,
             a vertex cover C of G
   QUESTION: Is C a VC of minimum size?
   (Co-NP: there is a nifty proof when the answer is NO.)

  *ANOTHER NONSTANDARD VC DECISION PROBLEM:
   INSTANCE: an undirected graph G,
             a vertex cover C of G
   QUESTION: Does G have a vertex cover that is smaller than C?   (In NP)


Then give more examples.
  * a proof system for 3-colorability is an assignment of R,W,B,
    to each vertex with the property that
      1. for every 3-colorable graph, there is an assignment of one of the 
three colors to each vertex such that no two adjacent vertices (vertices 
joined by an edge) get the same color; while
      2. if the graph is NOT 3-colorable, then there is no such coloring. 
ie for every assignment of R, W, B to the vertices, some pair of adjacent 
vertices will be colored the same.
  * a proof system for Hamilton Cycle

Define the 3SAT problem:
   INSTANCE: a set of m clauses on n variables. Each clause contains 3 
literals. Each literal is either a variable or the complement of a variable.
   QUESTION: is there an assignment of truth values to the variables such 
that each clause contains a true literal?

EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'}
  (This is satisfiable under the assignment that maps every
   variable to T.)
EXAMPLE of a 2SAT problem:
          {X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'}
  (This is NOT satisfiable. Why not?)


DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is 
NP-COMPLETE iff
   1. PI is in NP.
   2. PI is complete for all problems in NP. This means that there is a 
poly-time function f from instances of 3SAT to instances of PI that maps 
YES-instances of 3SAT to YES-instances of PI and NO-instances of 3SAT to 
NO-instances of PI.

The existence of such a poly-time function f: 3SAT -> PI implies that any 
poly-time algorithm for PI can be turned into a poly- time algorithm for 
3SAT: To solve an instance I of 3SAT, just map I into the corresponding 
instance f(I) of PI. Then use the poly-time algorithm for PI to get the 
answer for f(I). That answer to f(I) is the desired answer to I.

Such a function f is said to REDUCE 3SAT to PI. This is because the problem 
of solving 3SAT has been reduced to the problem of solving PI.

THEOREM: The CLIQUE problem is NP-complete.

   CLIQUE PROBLEM:
   INSTANCE: a graph G = (V,E), and a positive integer k.
   QUESTION: does G have a clique of size at least k?
             A CLIQUE is a collection of vertices such that
             every pair of vertices is joined by an edge.

PROOF: An Instance of 3SAT is defined by giving n variables and m clauses, 
each clause containing 3 literals, each literal being either a variable or 
its complement.
  The function f must map each such Instance of 3SAT into an instance of 
CLIQUE, that is, a pair (G, k), where G = (V,E) is an undirected graph and 
k is a positive integer.
  This function f must be answer-preserving: it must map YES-Instances to 
YES-Instances and NO-Instances to NO-Instances.  In addition, f must run in 
polynomial time. This latter condition on running time is what prevents the 
function f from solving the Instance of 3SAT. No poly-time algorithm is 
known for 3SAT. (It is likely that none exists.) So f must map an instance 
of 3SAT for which it does not know the answer to an instance of CLIQUE for 
which it does not know the answer -- and still be answer-preserving.
  An important idea for the construction of our particular f is to arrange 
that the graph G in the image of f have all possible n(n-1)/2 edges 
*except* for a few that we specifically leave out. A missing edge between 
two vertices u,v ensures that at most one of these vertices can be in any 
clique.
  We define f as follows: f maps the n 3SAT variables X1, ..., Xn  to n 
pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each 
clause, {X, Y, Z'} for example, to a triple of *nonadjacent* vertices (x, 
y, z'). All edges that have not been specifically forbidden are inserted 
into the graph.
  This construction ensures that a maximum size clique can have at most n+m 
vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one 
from each of the m triples, such as  (x, y, z'). In fact, if the clauses 
can be satisfied, then such a clique of size m+n can be constructed, and 
not otherwise, so set k = m+n.
  To complete the proof, we must show 3 things:
    1. f maps YES-Instances to YES-Instances.
    2. f maps NO-Instances to NO-Instances.
    3. f is poly-time.
You should be able to do this yourself!
QED


THEOREM: The VERTEX COVER problem is NP-complete.

   VERTEX COVER PROBLEM:
   INSTANCE: an undirected graph G, and a positive integer k.
   QUESTION: Does G have a VC having at most k vertices?

We now have two ways to go about proving this theorem: we can either reduce 
3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a 
g that reduces CLIQUE to VC, just as the above f reduced 3SAT to CLIQUE. 
Then the function gf will be a reduction of 3SAT to VC:
    gf(3SAT Instance) = g(CLIQUE Instance) = VC Instance


NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a 
proof that PI is as expressive as 3SAT -- as capable as 3SAT in describing 
computational problems. But how expressive is 3SAT? There is another 
problem that is at least as expressive as 3SAT, and that is:

BOOLEAN FORMULA SATISFIABILITY
INSTANCE: a boolean formula (such as
   (   ((X -> Y) or (Y -> Z)) and X'Z'   )
QUESTION: is the boolean formula satisfiable? In other words, is there a 
truth assignment to the variables of the boolean formula that make the 
formula true?
In the above example, the assignment of T to each of X,Y,Z is not 
satisfying. However, there are many satisfying assignments including any 
assignment that maps X -> F.

BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete.
(Why? Because it is in NP (Why?), and because an instance of 3SAT maps 
directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the 
following example:
     {X, Y, Z'} {X', Y, Z}   ->   (X + Y + Z')(X' + Y + Z))