CS 15-451 Analysis of Algorithms 4 November 2004
Manuel & Avrim Blum
Today: Definition and Theory of NP-completeness... continued.
The theory of NP-completeness is concerned with decision problems.
A decision problem is a computational problem defined by a pair (INSTANCE,
QUESTION). Here, INSTANCE describes what constitutes an acceptable instance
of the problem. QUESTION is a question about the instance. An INSTANCE is a
YES-instance if the correct answer to the QUESTION is YES. Otherwise
(the correct answer is NO), it is a NO-instance.
Here are the formal definitions of P, NP, and CoNP:
* P is the class of decision problems solvable by efficient (polynomial time
computable) algorithms.
* NP is a class of decision problems: A decision problem PI is in NP iff
there exists a polynomial "poly" and a 0-1 valued poly-time computable
function ff(x,y) such that:
1. for every YES-instance x of length n, there exists a string y of length
at most poly(n) for which ff(x,y) = 1, while
2. for every NO-instance x, for any and all strings y (of any length),
ff(x,y) = 0.
(Note the asymmetry between 1. and 2.)
* Co-NP is a class of decision problems: A decision problem PI is in Co-NP
iffthere exists a polynomial "poly" and a 0-1 valued poly-time computable
function ff(x,y) such that:
1. for every NO-instance x of length n, there exists a string y of length
at most poly(n) for which ff(x,y) = 1, while
2. for every YES-instance x, for any and all strings y (of any length),
ff(x,y) = 0.
The function ff defines a proof system. In the NP proof system, y is the
so-called proof that x is a YES-instance. If ff(x,y) = 1, then y is a valid
(correct) proof that x is a YES-instance. If ff(x,y) = 0, then y is an invalid
proof that x is a YES-instance. In this case, x may or may not be a
YES-instance; if it is, y is not a valid proof.
A Logician might define NP in an equivalent way as follows:
* NP is a class of decision problems. A decision problem PI is in NP iff
there exists a polynomial "poly" and a proof system -- a system of logic
that includes a poly time algorithmic checker for checking proofs --
such that:
1. for every YES-instance of length n, there exists a poly(n) long
string proving that the answer is YES; while
2. for every NO-instance, no string (of any length) proves that the
answer is YES.
Informally, a decision problem is in NP iff it has a polynomial length
Proof when the answer is YES. It is in CoNP iff it has a polynomial
length Proof when the answer is NO. NP stands for "Nondeterministic
Polynomial." (It could just as well stand for "Nifty Proof" where "Nifty"
means "Polynomial Length.")
The following theorem asserts that P is contained in both NP and CoNP:
Theorem: Let PI be any decision problem in P. Then PI is in both NP and CoNP.
Proof: A poly-time algorithm may be viewed as a system for generating
polynomial length proofs of YES and NO. QED
There exist problems like INTEGER FACTORIZATION which are in both
NP and CoNP but which are not known to be in P.
Here are some examples of computational problems. See if you can tell which
are in NP, which are in CoNP, and which in P:
*VERTEX COVER (VC) SEARCH/OPTIMIZATION PROBLEM:
INPUT: an undirected weighted graph G.
OUTPUT: a collection C of vertices that covers all the edges.
This means that every edge has at least one endpoint
in the cover.
(Not a decision problem. Therefore the question whether or
not this problem is in NP makes no sense.)
*VC DECISION PROBLEM:
INSTANCE: an undirected graph G, and
a positive integer k with k < n.
QUESTION: Does G have a VC having at most k vertices?
(NP)
*A NONSTANDARD VC PROBLEM:
INSTANCE: an undirected graph G,
a positive integer k < n, and
a collection C of vertices of G.
QUESTION: Is C a vertex cover having at most k vertices?
(Not only is this problem in NP, it is in fact in P.)
*ANOTHER NONSTANDARD VC DECISION PROBLEM:
INSTANCE: an undirected graph G,
a vertex cover C of G
QUESTION: Is C a VC of minimum size?
(Co-NP: there is a polynomial-length proof when the answer is NO.)
*ANOTHER NONSTANDARD VC DECISION PROBLEM:
INSTANCE: an undirected graph G,
a vertex cover C of G
QUESTION: Does G have a vertex cover that is smaller than C? (In NP)
We now switch to the SAT problem. This problem is historically important
on account of its role in the theory of NP-completeness. It has become
practicallyimportant as well:
PROBLEM: 3SAT
INSTANCE: a set of m clauses on n variables. Each clause contains 3
literals. Each literal is either a variable or the complement of a variable.
QUESTION: is there an assignment of truth values to the variables such
that each clause contains a true literal?
EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'}
(This is satisfiable under the assignment that maps every
variable to T.)
EXAMPLE of a 2SAT problem:
{X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'}
(This is NOT satisfiable. Why not?)
DEFINITION: A decision problem PI (given by INSTANCE and QUESTION) is
NP-COMPLETE iff
1. PI is in NP.
2. PI is complete for all problems in NP. This means that for every problem
PI' in NP there is a poly-time function f from instances of PI' to
instances of PI
that maps YES-instances of PI' to YES-instances of PI and NO-instances of PI'
to NO-instances of PI.
For example, suppose PI' = 3SAT.
The existence of (such) a poly-time function f: 3SAT -> PI that maps
YES-instances to YES-instances, and NO-instances to NO-instances
implies that any poly-time algorithm for PI can be turned into a poly- time
algorithm for 3SAT: To solve an instance x of 3SAT, just map x into the
corresponding instance f(x) of PI. Then use the poly-time algorithm for PI
to get the answer for f(x). That answer to f(x) is the desired answer to x.
(Why?)
Such a function f is said to REDUCE 3SAT to PI. This is because f reduces
the problem of solving 3SAT to the problem of solving PI.
Cook's Theorem: 3SAT is NP-COMPLETE.
This is a difficult (but not impossibly difficult) fundamental theorem.
You will see the proof of it in both graduate algorithms 15-750 and
graduate complexity 15-855 courses.
For now, let's just notice something strange about this theorem.
It says that EVERY problem in NP is poly-time reducible to 3SAT.
Every problem in NP? How can one show this for every single
problem in NP? One can see how to show that a particular problem
(like graph 3-colorability) reduces to 3SAT (Show this!), but how in the world
does one show that EVERY problem in NP reduces to 3SAT?
How does one even get a hold of "every problem in NP"?
The key idea is that every problem in NP is completely defined by
its corresponding poly-time computable 0-1 valued function ff(x,y).
In what follows, we assume the truth of Cook's Theorem, that
3SAT is NP-complete. That makes it easier to prove that a problem
PI in NP is NP-complete. To do so, we only have to show that
1. PI is in NP, and
2. 3SAT reduces to PI.
Here is an example (also due to Cook):
CLIQUE PROBLEM:
INSTANCE: a graph G = (V,E), and a positive integer k.
QUESTION: does G have a clique of size at least k?
A CLIQUE is a collection of vertices such that
every pair of vertices is joined by an edge.
THEOREM: The CLIQUE problem is NP-complete.
PROOF:
1. CLIQUE is in NP. (Why?)
2. 3SAT is poly-time reducible to CLIQUE:
An instance of 3SAT is defined by giving n variables and m clauses,
each clause containing 3 literals, each literal being either a variable or
its complement.
The function f must map each such instance of 3SAT into an instance of
CLIQUE: a pair (G, k), where G = (V,E) is an undirected graph and
k is a positive integer.
This function f must be answer-preserving: it must map YES-instances to
YES-instances and NO-instances to NO-instances. In addition, f must run in
polynomial time. This latter condition on running time is what prevents the
function f from solving the instance of 3SAT. No poly-time algorithm is
known for 3SAT. (It is likely that none exists.) So f must map an instance
of 3SAT for which it does not know the answer to an instance of CLIQUE for
which it does not know the answer -- and still be answer-preserving.
An important idea for the construction of our particular f is to arrange
that the graph G in the image of f have all possible n(n-1)/2 edges
*except* for a few that we specifically leave out. A missing edge between
two vertices u,v ensures that at most one of these vertices can be in any
clique.
We define f as follows: f maps the n 3SAT variables X1, ..., Xn to n
pairs of *nonadjacent* vertices (x1, x1'), ..., (xn, xn'). f maps each
clause, {X, Y, Z'} for example, to a triple of *nonadjacent* vertices (x,
y, z'). All edges that have not been specifically forbidden are inserted
into the graph.
This construction ensures that a maximum size clique can have at most n+m
vertices, one from each of the n pairs (x1, x1'), ..., (xn, xn'). and one
from each of the m triples, such as (x, y, z'). In fact, if the clauses
can be satisfied, then such a clique of size m+n can be constructed, and
not otherwise, so set k = m+n.
To complete the proof, we must show 3 things:
1. f maps YES-instances to YES-instances.
2. f maps NO-instances to NO-instances.
3. f is poly-time.
You should be able to do this yourself!
QED
Here is another example, this one due to Karp:
VERTEX COVER PROBLEM:
INSTANCE: an undirected graph G, and a positive integer k.
QUESTION: Does G have a VC having at most k vertices?
THEOREM: The VERTEX COVER problem is NP-complete.
We now have two ways to go about proving this theorem: we can either reduce
3SAT directly to VC, or we can reduce CLIQUE to VC: Suppose we construct a
g that reduces CLIQUE to VC, just as the above f reduced 3SAT to CLIQUE.
Then the composition of g and f, gf, will be a reduction of 3SAT to VC:
gf(3SAT instance) = g(CLIQUE instance) = VC instance
NP-completeness is about EXPRESSIVENESS. A reduction of 3SAT to PI is a
proof that PI is as expressive as 3SAT -- as capable as 3SAT in describing
computational problems. But how expressive is 3SAT? There is another
problem that is at least as expressive as 3SAT, and that is:
BOOLEAN FORMULA SATISFIABILITY
INSTANCE: a boolean formula (such as
( ((X -> Y) or (Y -> Z)) and X'Z' )
QUESTION: is the boolean formula satisfiable? In other words, is there a
truth assignment to the variables of the boolean formula that make the
formula true?
In the above example, the assignment of T to each of X,Y,Z is not
satisfying. However, there are many satisfying assignments including any
assignment that maps X -> F.
BOOLEAN FORMULA SATISFIABILITY is clearly NP-Complete.
(Why? Because it is in NP (Why?), and because an instance of 3SAT maps
directly to an instance of BOOLEAN FORMULA SATISFIABILITY as in the
following example:
{X, Y, Z'} {X', Y, Z} -> (X + Y + Z')(X' + Y + Z))