CS 15-451 Analysis of Algorithms 2 November 2004
Manuel & Avrim Blum
Today: the definition and theory of NP-completeness
A COMPUTATIONAL SEARCH or OPTIMIZATION PROBLEM is given by describing what
constitutes an allowable INPUT, and for each INPUT what constitutes an
allowable (correct) OUTPUT.
EXAMPLES:
FACTORIZATION
INPUT = a positive integer N
OUTPUT = a factorization of N into primes.
More formally, [(p1,e1),...,(pk,ek)]
e1 ek
such that N = p1 * ... * pk
GRAPH VERTEX COLORING
INPUT = an undirected graph represented by an adjacency
matrix.
OUTPUT = an optimal coloring of the vertices of the graph.
A COLORING is an assignment of colors to the vertices
of the graph so that no two adjacent vertices
(vertices joined by an edge) have the same color.
A coloring is OPTIMAL if it uses the minimum
possible number of colors.
Most practical algorithm problems are computational search or optimization
problems, like the ones above. The theory, however, is developed in terms
of decision problems, each given by an INSTANCE and a QUESTION:
FACTORING DECISION PROBLEM
INSTANCE = 2 positive integers N, k, with k < N.
QUESTION = does N have a prime factor < k?
GRAPH VERTEX COLORING
INSTANCE = an undirected graph represented by an adjacency
matrix; and a positive integer k.
QUESTION = is there a proper coloring of the vertices that
uses at most k distinct colors?
(Proper means that no two adjacent vertices are
assigned the same color.)
An algorithm is poly-time iff there is a specific polynomial, like poly(n)
= n^3, such that for every positive integer n, and for every input of
length n, a correct output is produced in poly(n) amount of time.
Clearly, a poly-time algorithm for either of the above computational
problems could be used to give a poly-time algorithm for the corresponding
decision problem. Less obvious, but still true, is that it goes the other
way around: a poly-time algorithm for either of the above decision problems
would ensure the existence of a poly-time algorithm for the corresponding
computational problem. SHOW THIS!
Decision problems are defined, whenever possible, to ensure (as above) that
there exists a poly-time algorithm for the decision problem iff there exists
a poly-time algorithm for the corresponding computational search problem.
Informally, P = class of all decision problems that can be solved in
a fixed polynomial (of the input length) time.
More formally,
Def: P = class of all decision problems for which there exists a
polynomial "poly" such that for every allowable input, the problem
can be solved in poly(input length) time.
Def: NP = Nondeterministic Polynomial-time decision problems.
Some people think NP means "Not Polynomial." While insightful
(in a way), this is not correct. In fact, NP contains P.
I prefer NP = Nifty Proof. (Nifty = Poly Length).
NP is the class of all decision problems that have a (nifty)
polynomial (of the input length) proof when the answer is YES.
What does it mean for a decision problem to be in NP, ie. to have
POLYNOMIAL LENGTH PROOFS?
1st give examples.
*A proof that a graph on n vertices is 3-colorable.
(NP (In fact, NP-complete))
*A proof that a graph on n vertices is NOT 3-colorable.
(Most likely NOT in NP)
*All our problems will be decision (YES-NO) problems
*A proof that a graph is NOT 2-colorable.
(Yes, this is in NP. In fact, it is in P: there is a poly-time
algorithm to decide this! What is it?)
*TRAVELLING SALESMAN PROBLEM (TSP):
INPUT: a complete undirected weighted graph G.
Weights on edges are positive integers.
OUTPUT: a tour of (all) the vertices that is of minimum length,
i.e., no longer than any other tour.
(Not a decision problem. Therefore the question whether or
not this problem is in NP makes no sense.)
*TSP DECISION PROBLEM:
INSTANCE: a complete undirected weighted graph G
(all weights positive integers), and
a positive integer k < n.
QUESTION: Does G have a tour of length < k?
(NP)
*A NONSTANDARD TSP DECISION PROBLEM:
INSTANCE: a complete undirected weighted graph G,
a positive integer k < n, and
a tour t of G
QUESTION: Is the given tour t of length < k?
(NP. In fact P)
*ANOTHER NONSTANDARD TSP DECISION PROBLEM:
INSTANCE: a complete undirected weighted graph G,
a tour t of G
QUESTION: Is t a tour of minimum length?
(Co-NP: there is a nifty proof when the answer is NO.)
*ANOTHER NONSTANDARD TSP DECISION PROBLEM:
INSTANCE: a complete undirected weighted graph G,
a tour t of G
QUESTION: Does G have a tour t' of shorter length than t,
ie. length(t') < length(t)?
(In NP)
Then give a formal definition of NP.
*Definition: a PROOF SYSTEM for a computational decision problem is a
system of logic such that:
1. if the answer to the given instance is YES, then there exists a
string that proves that the answer is YES; while
2. if the answer is NO, then no string proves that the answer is YES.
*Definition: NP is the class of decision problems for (each of) which
there exists a proof system -- a system of logic -- together with a
polynomial "poly" such that:
1. if the answer to the given instance of length n is YES, then there
exists a poly(n) long string that proves that the answer is YES; while
2. if the answer is NO, then no string (of any length) proves that the
answer is YES.
Then give more examples.
* a proof system for 3-colorability is an assignment of R,W,B,
to each vertex with the property that
1. for every 3-colorable graph, there is an assignment of one of the
three colors to each vertex such that no two adjacent vertices (vertices
joined by an edge) get the same color; while
2. if the graph is NOT 3-colorable, then there is no such coloring.
ie for every assignment of R, W, B to the vertices, some pair of adjacent
vertices will be colored the same.
* a proof system for Hamilton Cycle (HC) is a permutation of the n vertices
vi1, vi2, ..., vin such that 1.if the graph G has a HC, then at least one
such permutation is a cycle. 2.if G has no HC, then no such permutation
is a cycle.
Define the 3SAT problem:
INSTANCE: a set of m clauses on n variables. Each clause contains 3
literals. Each literal is either a variable or the complement of a variable.
QUESTION: is there an assignment of truth values to the variables such
that each clause contains a true literal?
EXAMPLE: {X, Y, Z'} {X, Z, W'} {X', Y, W} {Y', Z, W'}
(This is satisfiable under the assignment that maps every
variable to T. Why?)
EXAMPLE of a 2SAT problem:
{X', Y} {Y', Z} {Z', X} {X, Y} {X', Z'}
(This is NOT satisfiable. Why not?)
Def: A decision problem PI (given by INSTANCE and QUESTION) is NP-complete iff
1. PI is in NP.
2. PI is complete for all problems in NP. This means that there is a
poly-time function f from instances of 3SAT to instances of PI that maps
YES-instances of 3SAT to YES-instances of PI and NO-instances of 3SAT to
NO-instances of PI. A YES-instance of a problem is an instance for which
the correct answer is YES. Similarly, a NO-instance is an instance for
which the correct answer is NO.
The existence of a poly-time function f: 3SAT -> PI implies that a
poly-time algorithm for PI can be turned into a poly- time algorithm for
3SAT -- and therefore (still to be shown) into a poly-time algorithm for
any problem in NP.
EXAMPLE: The CLIQUE problem is NP-complete.
CLIQUE PROBLEM:
INSTANCE: a graph G = (V,E), and a positive integer k.
QUESTION: does G have a clique of size at least k?
A CLIQUE is a collection of vertices such that
every pair of vertices is joined by an edge.
NP-completeness is about EXPRESSIVENESS.