CS 15-451        Analysis of Algorithms        9 November 2004
                        Manuel & Avrim Blum

Today: Definition and Theory of NP-completeness...  third lecture.


DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is
NP-COMPLETE iff
     1. PI is in NP.
     2. PI is complete for all problems in NP. This means that for every 
problem PI' in NP there is a poly-time function f from instances of
PI' to instances of PI that maps YES-instances of PI' to YES-instances
of PI and NO-instances of PI' to NO-instances of PI.


For example, suppose there exists a poly-time function f: 3SAT -> PI that maps
YES-instances to YES-instances, and NO-instances to NO-instances.  Existence
of f implies that any poly-time algorithm for PI can be turned into a poly-time
algorithm for 3SAT:  To solve an instance x of 3SAT, just map x into the
corresponding instance f(x) of PI. Then use the poly-time algorithm for PI
to get the answer for f(x).  That answer to f(x) is the desired answer to x.
(Why?)

Such a function f is said to REDUCE 3SAT to PI. This is because f reduces
the problem of solving 3SAT to the problem of solving PI.

Cook's Theorem:  3SAT is NP-COMPLETE.

In what follows, we assume the truth of Cook's Theorem, that
3SAT is NP-complete.  That makes it easier to prove that a problem
PI in NP is NP-complete.  To do so, we only have to show that
          1.  PI is in NP, and
          2.  3SAT reduces to PI.

Here is an example of how to do this (also due to Cook):

     CLIQUE PROBLEM:
     INSTANCE: a graph G = (V,E), and a positive integer k.
     QUESTION: does G have a clique of size at least k?
               A CLIQUE is a collection of vertices such that
               every pair of vertices is joined by an edge.

THEOREM: The CLIQUE problem is NP-complete.

PROOF:
1. CLIQUE is in NP.  (Why?)
2. 3SAT is poly-time reducible to CLIQUE:
An instance of 3SAT is defined by giving n variables and m clauses,
each clause containing 3 literals, each literal being either a variable or
its complement.
    The function f must map each such instance of 3SAT into an instance of
CLIQUE: a pair (G, k), where G = (V,E) is an undirected graph and
k is a positive integer.
    This function f must be answer-preserving: it must map YES-instances to
YES-instances and NO-instances to NO-instances.  In addition, f must run in
polynomial time. This latter condition on running time is what prevents the
function f from solving the instance of 3SAT. No poly-time algorithm is
known for 3SAT. (It is possible that none exists.) So f must map an instance
of 3SAT for which it does not know the answer to an instance of CLIQUE for
which it does not know the answer -- yet still be answer-preserving.
    An important idea for the construction of our particular f is to arrange
that the graph G in the image of f have all possible n(n-1)/2 edges
*except* for a few that we specifically leave out. A missing edge between
two vertices u,v ensures that at most one of these vertices can be in any
clique.
    We define f as follows: f maps the n 3SAT variables X1, ..., Xn  to n
pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each
clause such as {X, Y, Z'} to a triple of *nonadjacent* vertices (x, y, z').
A vertex labelled x, say, and another labelled x', its complement, must be
*nonadjacent*.  All edges that have not been specifically forbidden are
inserted into the graph.
    This construction ensures that a clique can have no more than n+m
vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one
from each of the m triples, such as  (x, y, z'). In fact, if the clauses
can be satisfied, then such a clique of size m+n can be constructed, and
not otherwise, so set k = m+n.
    To complete the proof, we must show 3 things:
      1. f maps YES-instances to YES-instances.
      2. f maps NO-instances to NO-instances.
      3. f is poly-time.
You should be able to do this yourself!
QED


Here is another example, this one due to Karp:

     VERTEX COVER PROBLEM:
     INSTANCE: an undirected graph G, and a positive integer k.
     QUESTION: Does G have a VC (a set of vertices "covering" all the edges)
               that has at most k vertices?

THEOREM: The VERTEX COVER problem is NP-complete.

We now have two ways to go about proving this theorem: we can either reduce
3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a
poly-time function g that reduces CLIQUE to VC, just as the above poly-time
f reduced 3SAT to CLIQUE.  Then the composition of g and f, gf, will be a
poly-time reduction of 3SAT to VC:
      gf(3SAT instance) = g(CLIQUE instance) = VC instance


NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a
proof that PI is as expressive as 3SAT -- as capable as 3SAT at describing
computational problems. But how expressive is 3SAT? There is another
problem that is at least as expressive as 3SAT, and that is:

BOOLEAN FORMULA SATISFIABILITY
INSTANCE: a boolean formula (such as
     (   ((X -> Y) or (Y -> Z)) and X'Z'   )
QUESTION: is the boolean formula satisfiable? In other words, is there a
truth assignment to the variables of the boolean formula that make the
formula true?
In the above example, the assignment of T to each of X,Y,Z is not
satisfying. However, there are many satisfying assignments including any
assignment that maps X -> F.

BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete.
(Why? Because it is in NP (Why?), and because an instance of 3SAT maps
directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the
following example:
       {X, Y, Z'} {X', Y, Z}   ->   (X + Y + Z')(X' + Y + Z))