15-451 Algorithms 11/07/06
* NP-completeness and expressiveness
* informal definitions
* formal definitions
* Circuit-SAT and 3-SAT
============================================================================
In the last few classes, we've looked at increasingly expressive
problems: network flow, min cost max flow, linear programming.
These problems have the property that you can code up a lot of different
problems in their "language". So, by solving these well, we end up
having some important hammers we can use to solve other problems.
In fact, to talk about this a little more precisely, let's make the
following definitions:
* We'll say that an algorithm runs in Polynomial Time if for some
constant c, its running time is O(n^c), where n is the size of the input.
"Size of input" means "number of bits it takes to write the input down"
* Problem A is poly-time REDUCIBLE to problem B (A <=_p B) if given a
poly-time alg for B, we can use it to produce a poly-time alg for A.
Problem A is poly-time EQUIVALENT to problem B (A =_p B) if
A <=_p B and B <=_p A.
For instance, we showed BIPARTITE MATCHING <=_p MAX FLOW
MIN-COST MAX-FLOW <=_p LINEAR PROGRAMMING
and more.
===============================================================
Let's start today by thinking about what would be a *really*
expressive problem, such that if we could solve it we could do all
sorts of things. Here is a natural candidate:
The SHORT SOLUTION EXISTENCE PROBLEM: Given an algorithm V(I,X)
[written in some standard programming language, and think of it as
outputting either YES or NO], and given I and a bound B written in
unary (B bits). Question: does there exist an input X such that
V(I,X) halts in at most B steps and outputs YES?
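To make the definition concrete, here is a minimal brute-force decider
for the SSEP, written as a sketch: the verifier is modeled as an
ordinary Python function, and we do not actually enforce the B-step
cutoff (a real solver would simulate V and halt it after B steps).

```python
from itertools import product

def short_solution_exists(V, I, B):
    """Brute-force decider for the short-solution-existence problem.

    V is a Python function V(I, X) -> bool standing in for the verifier;
    the B-step time bound is NOT enforced here (a faithful simulator
    would cut V off after B steps). Since V can read at most B bits of
    X in B steps, it suffices to try every bitstring X of length <= B.
    Exponential time, of course -- that's the whole point.
    """
    for length in range(B + 1):
        for bits in product("01", repeat=length):
            if V(I, "".join(bits)):
                return True
    return False
```

Note the exponential blow-up: nothing here contradicts the belief that
no *polynomial-time* algorithm exists for this problem.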
Why am I calling this the "short solution existence problem"?
Consider some problem we might want to solve like the
TRAVELING-SALESMAN-PROBLEM: given a graph G and an integer k, is there
a tour that visits all the nodes of G and has total length <= k? We
don't know any fast ways of solving that problem, but we can easily
write a program V that given I = (G,k) and given a proposed solution X
verifies whether X indeed visits all the nodes of G and has total
length <= k. Furthermore, this solution-verifier is linear time. So,
if we could solve the "short solution existence problem", we could
tell if there is an X that makes V output YES and thereby solve the TSP.
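Such a verifier is easy to write down. Here is one possible V for TSP;
the encoding of G and of the proposed tour X is our choice (the notes
leave it abstract):

```python
def verify_tsp(instance, X):
    """A linear-time verifier V for TSP. instance = (G, k), where G is
    a weighted graph given as an adjacency dict {u: {v: weight}} (both
    directions stored), and X is a proposed tour given as a list of
    vertices with X[0] == X[-1] closing the cycle."""
    G, k = instance
    if not X or X[0] != X[-1]:
        return False
    if set(X) != set(G):           # must visit every node
        return False
    total = 0
    for u, v in zip(X, X[1:]):
        if v not in G.get(u, {}):  # every step must be a real edge
            return False
        total += G[u][v]
    return total <= k
```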
What if we actually wanted to find the tour? One way we could solve
that would be to start deleting edges from G and then re-running the
above procedure. If the answer is "NO" then we put the edge back in.
If we do this for all edges, what's left in G will be just the tour
itself.
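This search-to-decision trick can be sketched in code. Here has_tour is
an assumed black-box decider (e.g. built from an SSEP solver); the
brute-force stand-in below is only there so the sketch can be run:

```python
import copy
from itertools import permutations

def find_tour_edges(G, k, has_tour):
    """The edge-deletion trick from the text. has_tour(G, k) is an
    assumed black-box decider for "does G have a tour of length <= k?".
    Delete each edge in turn; if a YES answer survives, leave the edge
    out, otherwise restore it. Assuming the original answer was YES,
    the surviving edges are exactly a tour. G is {u: {v: weight}},
    both directions stored."""
    G = copy.deepcopy(G)
    for u in list(G):
        for v in list(G[u]):
            if u < v:                       # try each edge once
                w = G[u].pop(v); G[v].pop(u)
                if not has_tour(G, k):      # edge was needed: put it back
                    G[u][v] = w; G[v][u] = w
    return sorted((u, v) for u in G for v in G[u] if u < v)

def brute_force_has_tour(G, k):
    """Stand-in decider for testing (exponential, not the real oracle)."""
    nodes = sorted(G)
    for perm in permutations(nodes[1:]):
        tour = [nodes[0]] + list(perm) + [nodes[0]]
        if all(v in G[u] for u, v in zip(tour, tour[1:])) and \
           sum(G[u][v] for u, v in zip(tour, tour[1:])) <= k:
            return True
    return False
```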
Let's try another problem. Say we wanted to factor a big integer N.
We don't know any polynomial-time algorithms for solving that
problem. But, we can easily write a verifier that given N and a
proposed factor X, tells us if X is a solution to the problem.
In fact, let's modify this slightly (you'll see why in a second) so
the verifier takes in an additional integer k (so I = (N,k)) and
outputs YES if X divides N *and* 1 < X < k.
So, if we can solve the short-solution-existence-problem, we can tell
if N has a factor between 2 and k-1 by feeding V and I=(N,k) into our
solver. Then if we want to actually *find* a factor, we can do
binary search on k. (That's why we needed the k).
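The binary search can be sketched as follows, with the decision oracle
left as an assumed black box (the trial-division lambda in the usage
below is only a stand-in so the sketch can be run):

```python
def find_factor(N, has_factor_below):
    """Finds a nontrivial factor of N using O(log N) calls to the
    assumed decision oracle has_factor_below(N, k): "does N have a
    factor X with 1 < X < k?". Returns None if N has no such factor
    (i.e., N is prime or too small)."""
    if N < 4 or not has_factor_below(N, N):
        return None
    lo, hi = 3, N                # smallest YES value of k lies in [3, N]
    while lo < hi:
        mid = (lo + hi) // 2
        if has_factor_below(N, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo - 1    # a factor exists below lo but not below lo-1
```

For example, with a trial-division stand-in for the oracle,
find_factor(15, ...) returns 3.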
In fact, we could use an algorithm for the "short solution existence
problem" to solve *any* problem for which we have a polynomial-time
algorithm for simply *checking* if a proposed solution is correct: we
just write down V and then make the length bound B big enough to
handle the running time of V.
Interestingly, the short-solution-existence-problem also belongs to
this same category. Namely if someone hands us a proposed solution X,
we can check it by just running V.
WHAT WE NOW HAVE
================
This class of problems - problems for which we can efficiently verify
a proposed solution - is called NP. A problem Q is said to be
NP-complete if (1) Q is in NP and (2) you could use a polynomial-time
algorithm for Q to solve *any* problem in NP in polynomial time. That
is, for any Q' in NP, Q' <=_p Q.
And we have just proven our first NP-complete problem, namely the SSEP.
Now this problem seems pretty stylized. But we can now show that
other simpler-looking problems have the property that if you could
solve them in polynomial-time, then you could solve this problem in
polynomial time, so they too are NP-complete.
So, the way to think of it is that an NP-complete problem is
super-expressive. It is so expressive that we believe there is no
polynomial-time algorithm for solving it. In particular, if we
*could* solve an NP-complete problem in polynomial-time, then it would
mean that for any problem where we could *check* a proposed solution
efficiently, we could also *find* such a solution efficiently.
==========================================================
Now, onto formal definitions.
We will formally be considering decision problems: problems whose
answer is YES or NO.
COMPLEXITY CLASSES: P and NP.
P: problems solvable in polynomial time. E.g., Given a network G and
a flow value f, does there exist a flow >= f?
NP: problems that have polynomial-time verifiers. Specifically,
problem Q is in NP if there is a polynomial-time algorithm V(I,X) such that:
If "I" is a YES-instance, then there exists X such that V(I,X) = 1.
If "I" is a NO-instance, then for all X, V(I,X) = 0.
Furthermore, X should have length polynomial in size of I (since we
are really only giving V time polynomial in the size of the instance).
"X" is often called a "witness". E.g., consider 3-coloring. The
witness that an answer is YES is the coloring. The verifier just
checks that all edges are correct and that at most 3 colors are used.
So, 3-coloring is in NP.
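The 3-coloring verifier just described is short enough to write out;
here is one possible version, with the graph encoding our own choice:

```python
def verify_3coloring(G, coloring):
    """Verifier V for 3-COLORING: the witness X is the coloring itself.
    G is an adjacency dict {u: set of neighbors, both directions};
    coloring maps each vertex to a color. Checks that every vertex is
    colored, at most 3 colors are used, and no edge is monochromatic.
    Runs in time linear in the size of G."""
    if set(coloring) != set(G) or len(set(coloring.values())) > 3:
        return False
    return all(coloring[u] != coloring[v] for u in G for v in G[u])
```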
NP is like: "I might not know how to find it, but I'll know it when I see it"
(BTW, Co-NP is the class of problems that work the other way around).
It's pretty clear that P is contained in NP. The huge open question in
complexity theory is whether P = NP. It would be really weird if they
were equal, so most people believe P != NP. But, it's very hard to
prove that a fast algorithm for something does NOT exist. So, it's
still an open problem.
NP-Completeness
===============
Problem Q is NP-complete if:
(1) Q is in NP, and
(2) Any other problem Q' in NP is polynomial-time reducible to Q.
So if you could solve Q in polynomial time, you could solve *any*
problem in NP in polynomial time. If Q just satisfies (2) then
it's called NP-hard.
Here's another NP-complete problem:
CIRCUIT-SAT: Given a circuit of NAND gates with a single output and no
loops (some of the inputs may be hardwired). Question: is there a
setting of the inputs that causes the circuit to output 1?
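To pin down the problem statement, here is a brute-force CIRCUIT-SAT
checker over one possible circuit encoding (the encoding is our own
choice; any loop-free wiring of NAND gates works the same way):

```python
from itertools import product

def nand_circuit_sat(gates, inputs, hardwired, output):
    """Brute-force CIRCUIT-SAT, exponential in the number of free inputs.
    gates: list of (name, a, b) in topological order, each computing
           NAND(a, b), where a and b name earlier gates or input wires.
    inputs: names of the free input wires.
    hardwired: {wire name: 0 or 1} for the hardwired inputs.
    Asks: does some setting of the free inputs make `output` equal 1?"""
    for bits in product([0, 1], repeat=len(inputs)):
        val = dict(hardwired)
        val.update(zip(inputs, bits))
        for name, a, b in gates:
            val[name] = 1 - (val[a] & val[b])   # NAND gate
        if val[output] == 1:
            return True
    return False
```

For example, wiring g1 = NAND(x, y), g2 = NAND(g1, g1) computes
AND(x, y); with y hardwired to 0 the circuit is unsatisfiable.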
Proof Sketch:
First of all, it is clearly in NP, since you can just guess the input
and try it. To show it is NP-complete, we just need to show that if
we could solve this, then we could solve the
short-solution-existence-problem. Here's how. Say we are given V, I,
B, and want to tell if there exists X such that V(I,X) outputs YES in
at most B steps. What we can do (and this part is a little
handwavy but in principle you did this in 251 when you built a
computer out of NAND gates) is we can effectively compile V into a
circuit of depth polynomial in B and the number of bits in I that
mimics what V does in its first B time steps. We then hardwire the
inputs corresponding to I and feed this into our circuit-SAT solver.
OK, fine so we now have one more NP-complete problem. But CIRCUIT-SAT
looks really complicated. We weren't expecting to be able to solve
it. But *now* we can show that a much simpler looking problem has the
property that if you could solve it efficiently, then you could solve
CIRCUIT-SAT efficiently. ("efficiently" = "polynomial time").
This problem is 3-SAT.
3-SAT:
Given: a CNF formula (AND of ORs) over n variables x1,...,xn, where
each clause has at most 3 literals in it (a literal is a variable or
its negation).
(x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ...
Goal: find an assignment to the variables that satisfies the formula
if one exists.
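Here is a brute-force 3-SAT solver matching that Goal, using the
common signed-integer encoding of literals (our choice of encoding):

```python
from itertools import product

def solve_3sat(clauses, n):
    """Brute-force 3-SAT over variables x1..xn: returns a satisfying
    assignment {i: bool} if one exists, else None. Clauses use the
    signed encoding: literal i means x_i, -i means not(x_i).
    Exponential time -- 2^n assignments are tried."""
    for bits in product([False, True], repeat=n):
        assign = {i + 1: bits[i] for i in range(n)}
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assign
    return None
```

The example formula above would be written as
[[1, 2, -3], [-2, 3], [1, 3]], which is satisfiable (e.g. x1 = x3 =
True).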
Claim: we can reduce solving CIRCUIT-SAT to solving 3-SAT. I.e., if
we can solve 3-SAT then we can solve CIRCUIT-SAT and so we can solve
all of NP.
We'll do the reduction next time, but before we end, here is formally
how we are going to do reductions:
Say we have some problem A that we know is NP-complete. We want to
show problem B is NP-complete too. Well, first we show B is actually
in NP but that is usually the easy part. The main thing we need to do
is show that any polynomial-time algorithm for B would give a
polynomial-time algorithm for A. We do this by "reducing A to B", and
in particular what we want is this:
Reducing problem A to problem B
===============================
To reduce problem A to problem B we want a function f that takes
instances of A to instances of B such that:
(1) if x is a yes-instance of A then f(x) is a yes-instance of B
(2) if x is a no-instance of A then f(x) is a no-instance of B
(3) f can be computed in polynomial time.
So, if we had an algorithm for B, we could use it to solve A by
running it on f(x).
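In code, the recipe is just function composition; the toy problems
below are hypothetical, chosen only to show the shape:

```python
def solve_A(x, f, solve_B):
    """The reduction recipe as code: f maps instances of A to instances
    of B preserving YES/NO answers (properties (1) and (2)) and runs in
    polynomial time (property (3)); solve_B is an assumed decider for B.
    Composing them decides A."""
    return solve_B(f(x))

# Toy illustration: A = "is n odd?", B = "is n even?", f(n) = n + 1.
is_even = lambda n: n % 2 == 0
is_odd = lambda n: solve_A(n, lambda m: m + 1, is_even)
```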