15-451 Algorithms 11/07/00
* 3-SAT & CIRCUIT-SAT
* NP-completeness: informal
* NP-completeness: formal
* More reductions
Quiz Thurs: MST, Dijkstra, Network flow, Linear Programming
============================================================================
In the last few classes, we've looked at increasingly more expressive
problems: network flow, min cost max flow, linear programming.
Informally, "expressive" means you can code up a lot of different
problems in their "language". So, by solving these well, we end up
having some important hammers we can use to solve other problems.
Today, start by introducing an even more expressive problem, 3-SAT.
The main difference is that (A) we don't know any fast way of solving
3-SAT, and (B) while 3-SAT looks simple, it is *so* expressive that we
view this as evidence that in fact it is intrinsically hard. It can
express anything in the class NP. Then we'll see a huge number of
other problems have exactly this same property.
[How many have seen NP-completeness before? Even if you have, this is
one of those topics where it's really important to develop your
intuitions, so you should still pay attention]
===============================================================
First a few definitions:
* We'll say that an algorithm runs in Polynomial Time if for some
constant c, its running time is O(n^c), where n is the size of the input.
"Size of input" means "number of bits it takes to write the input down"
* Problem A is poly-time REDUCIBLE to problem B (A <=_p B) if given a
poly-time alg for B, we can use it to produce a poly-time alg for A.
Problem A is poly-time EQUIVALENT to problem B (A =_p B) if
A <=_p B and B <=_p A.
Now, on to the problems:
SAT:
Given: a CNF formula (AND of ORs) over n variables x1,...,xn:
(x1 OR x2 OR not(x3)) AND (not(x2) OR x3) AND (x1 OR x3) AND ...
Goal: find an assignment to the variables that satisfies the formula
if one exists.
3-SAT:
Special case of SAT where each "clause" (OR) has at most 3 literals in
it. ("literal" = a variable or its negation). This is called "3-CNF".
DECISION vs SEARCH: when we get more formal, it will be easier to talk
about the decision (yes/no) version of these problems: "Does there
exist a satisfying assignment?" I claim these are poly-time
equivalent. Decision <= search is easy. What about the other way?
Set variables 1 at a time and use decision box to tell if we ever made
a mistake.
Fact 1: given alg for 3-SAT, can use to solve more impressive CIRCUIT-SAT.
(i.e., CIRCUIT-SAT <=_p 3-SAT)
CIRCUIT-SAT: Given a circuit of OR(x,y) and NOT(x) gates with a single
output and no loops (some of the inputs may be hardwired). Question:
is there a setting of the inputs that causes the circuit to output 1?
E.g., [draw some circuit]
Reducing CIRCUIT-SAT to 3-SAT: We will assign a variable to each of the
non-hardwired inputs, and one to each gate. We will then use clauses
to say that all internal gates have to produce the correct outputs
given their inputs, and that the output of the circuit is 1.
Rules: y = OR(x1,x2) is converted to
(x1 OR x2 OR not(y)) AND (not(x1) OR y) AND (not(x2) OR y)
y = not(x1) is converted to:
(x1 or y) AND (not(x1) or not(y))
Also, if y is the output of the circuit, put in (y).
We can see that the 3-CNF formula produced is satisfiable if and only
if the circuit has a setting of inputs that causes it to output
1. The size of the formula is polynomial (actually, linear) in the
size of the circuit. The construction can be done in polynomial
(actually, linear) time. So, if we had a polynomial-time algorithm to
solve 3-SAT, then we could solve circuit-SAT in poly time too.
What can we do with CIRCUIT-SAT? Well, let's take some hard problem
like factoring: Given a big number N written in binary, goal is to
find a factor x, 1 < x < N, if one exists. We don't know
any fast algorithm for this, but we can easily check a solution if
someone handed it to us.
Alg verify_factor(x,N)
if x <= 1 or x >= N return NO.
if ((N/x)*x == N) return YES. // integer division
else return NO
This runs in time polynomial in # of bits in x and N. Now, the key
fact is that given an algorithm that runs in time t, we can create a
circuit with polynomial in t many gates that implements it.
Basically, we just unroll what the hardware is doing: each level in
the circuit represents the contents of memory (can only affect t cells
in t steps) and the gates between one level and the next implement one
time step of the algorithm. We can do all this in time polynomial in
t, which is polynomial in log(N). Now what we do is we hardwire the
inputs corresponding to the bits of N. Then the problem "find a
setting of the remaining inputs that causes this circuit to output 1"
is the same as "find an input x such that verify_factor(x,N) outputs
YES?" Size of circuit is polynomial in log(N). So, if we had a
polynomial-time algorithm for circuit-SAT, we would have a
polynomial-time algorithm for factoring.
Notice, there was nothing particular to factoring here. Any problem
that has an algorithm to *verify* a solution in time t(n) can be
converted into a circuit of size polynomial(t). The problem of
finding a solution that makes the verifier happy is the same as the
problem of finding an input that makes the circuit output 1.
So, if we have a polynomial time algorithm to solve circuit-SAT, we
have a polynomial-time algorithm to solve any problem that has a
polynomial-time verifier. This is what it means to say that
circuit-SAT and 3-SAT are NP-complete.
====================================================================
Now, let's do it formally:
First of all, we'll be technically looking at decision problems.
E.g., consider the "Traveling Salesman Problem": given a graph G, find
the shortest tour that visits all nodes at least once. We would
convert this into a decision question "Given a graph G and a distance
d, does there exist a tour that visits all nodes and has length <= d?"
Claim: these are polynomial-time equivalent. The easy case is that
decision <= search. How to go in the other direction? First use
binary search to find optimal value. Then - delete edges so long as
doesn't change the answer.
COMPLEXITY CLASSES: P and NP.
P: problems solvable in polynomial time. E.g., network flow, LP, etc.
NP: problems that have polynomial-time verifiers. Specificially,
problem Q is in NP if there is a polynomial-time algorithm V(w,I) such that:
If "I" is a YES-instance, then there exists w such that V(w,I) = 1.
If "I" is a NO-instance, then for all w, V(w,I) = 0.
Furthermore, w should have length polynomial in size of I (since we
are really only giving V time polynomial in the size of the instance).
"w" is often called a "witness". E.g., consider 3-coloring. The
witness that an answer is YES is the coloring. The verifier just
checks that all edges are correct and that at most 3 colors are used.
So, 3-coloring is in NP. What about for TSP?
NP is like: "I might not know how to find it, but I'll know it when I see it"
(BTW, Co-NP is the class of problems that work the other way around).
It's pretty clear that P is contained in NP. Huge open question in
complexity theory is P=NP? It would be really weird if they are equal
so most people believe P != NP. But, it's very hard to prove that a
fast algorithm for something does NOT exist. So, it's still an open
problem.
--> What would a problem that's not even in NP look like? Here's a
problem that's believed to not be in NP: For the n by n version of
chess, does the first player have a winning strategy? (The answer is
not known for the usual 8x8 chess). Might be true --- but even if you
knew it was true, it's not clear how you could give a short proof.
NP-Completeness
===============
Problem Q is NP-complete if:
(1) Q is in NP, and
(2) Any other problem Q' in NP is polynomial-time reducible to Q.
So if you could solve Q in polynomial time, you could solve *any*
problem in NP in polynomial time. If Q just satisfies (2) then
it's called NP-hard.
Theorem: 3-SAT and Circuit-SAT are NP-complete.
IMPORTANT FACT: now that we have 3-SAT, in order to prove some other
NP problem Q is NP-complete, we just need to reduce 3-SAT to it.
I.e., show 3-SAT <= Q: if we could solve Q then we could solve 3-SAT.
MAKE SURE YOU UNDERSTAND THIS - A LOT OF PEOPLE MAKE THE MISTAKE OF
DOING IT THE OTHER WAY AROUND.
Theorem: Max-Clique (Does G have a clique of size > k?) is NP-hard.
Proof: reduce 3-SAT to MAX-CLIQUE.
Given a 3-CNF formula F of m clauses over n variables, we construct a
graph as follows. For each clause c of F we create one node for every
assignment to variables in c that satisfies c. E.g., say
F = (x1 OR x2 OR not(x4)) AND (not(x3) OR x4) AND (not(x2) OR not(x3)) AND ...
Then we create up to 7m nodes: for each clause of F of size 3 there
are 7 assignments to variables in it that satisfy the clause. (If the
clause has size 2 then there are 3 such assignments.) E.g., in this
case the nodes are:
(x1=0,x2=0,x4=0) (x3=0,x4=0) (x2=0,x3=0) ...
(x1=0,x2=1,x4=0) (x3=0,x4=1) (x2=0,x3=1)
(x1=0,x2=1,x4=1) (x3=1,x4=1) (x2=1,x3=0)
...
Then we put an edge between two nodes if the partial assignments are
consistent. Note: max possible clique size is m. And, if the 3-SaT
problem does have a satisfying assignment, then in fact there IS an
m-clique. Claim is this is true in the other direction too. If the
graph has an m-clique, then there is a satisfying assignment: namely,
just read off the assignment given in the nodes of the clique. So,
Max-Clique is NP-complete too.
Independent Set
===============
An independent set in a graph is a set of nodes no two of which have
an edge. E.g., in a 7-cycle, the largest independent set has size 3.
(E.g, in the graph coloring problem, the set of nodes colored red is
an independent set).
Theorem: Independent set (is there an Indep Set in G of size > k?) is
NP-complete.
Proof: Reduce from clique. Given graph G for clique problem, just
take complement of the graph . Ie. create graph H such that H has
edge (u,v) iff G does NOT have edge (u,v). Then H has an indep set of
size k iff G has a k-clique.
-------------------------------------------------------------------
Another example:
Vertex-Cover
============
A vertex cover in a graph is a set of nodes such that every edge is
incident to at least one. (e.g., look at cut-diamond graph). For
instance, can think of as what rooms to put security guards in a
museum. What we want is the smallest vertex cover.
Decision problem: does there exist a vertex cover of size < k?
Claim: if C is a vertex cover in a graph, then V-C is an independent
set. Also if S is an independent set, then V-S is a vertex cover.
So, to solve "is there an independent set of size > k?" just solve "is
there a vertex cover of size < n-k?". So, Vertex cover is NP-Complete
too.