15-750 Graduate Algorithms 04/05/01
Approx algs III
* Semidefinite Programming
* Max-cut
* Max-2SAT (handwave)
=============================================================================
IMPORTANT HIGH LEVEL POINT: When we do a relaxation (whether it's an
LP relaxation or an SDP relaxation), our solution will be "better
than optimal". That's because we are giving ourselves more freedom,
and in the process we are in a sense "smoothing out" the problem,
which allows us to solve it optimally. But, our solution isn't
legal. Then we will somehow "round" the solution we have to a legal
one, incurring a cost. The final statement we will make is that our
rounded solution is within some factor c of the optimal solution to the
relaxation, which is at least as good as the optimal solution to the
original problem. Therefore we are within a factor of c for the original
problem. All of our arguments have looked like this, but I just wanted
to make this explicit.....
Today: new technique called semidefinite programming. Much like
how LP relaxed {0,1} values to [0,1] values, this will relax numbers to
vectors (or points) in an n-dimensional space. Our optimization
problems will end up looking like various kinds of clustering
problems. So it helps if you can think in n-dimensional space....
We'll talk about this in the context of the MAX-CUT problem.
MAX-CUT: Given a graph G, partition vertices into two sets S, T, to
maximize the number of edges between S and T.
E.g., if the graph was 2-colorable, then that would give you a
perfect cut from this point of view. If not, then we're asking
for the 2-coloring that gets the most edges correct. Can think of
this as a version of MAX-2SAT, but where each clause is an XOR of
variables, rather than an OR of literals. In fact, the techniques we
will discuss apply to MAX-2SAT too. But MAX-CUT is a little cleaner.
Here's a natural greedy algorithm:
- Start with an arbitrary cut.
- If some node has more neighbors on its side than on the other side,
then move it to the other side. Repeat.
Can you prove this won't just run forever? Each step increases the
number of edges crossing the cut.
At the end, at least half of the edges are crossing the cut (Why?).
So, this is trivially a 1/2 approximation.
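The greedy local search above can be sketched in a few lines of Python (the
graph representation and function name here are my own, just for
illustration):

```python
import random

def greedy_max_cut(n, edges):
    """Local search for max-cut: start from an arbitrary cut and move any
    vertex that has more neighbors on its own side than across the cut.
    Each move strictly increases the number of crossing edges, so this
    terminates after at most m moves."""
    side = [random.choice([0, 1]) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(1 for (a, b) in edges
                       if (a == v or b == v) and side[a] == side[b])
            diff = sum(1 for (a, b) in edges
                       if (a == v or b == v) and side[a] != side[b])
            if same > diff:          # moving v strictly increases the cut
                side[v] = 1 - side[v]
                improved = True
    return side

# On the triangle, any local optimum cuts 2 of the 3 edges.
edges = [(0, 1), (1, 2), (2, 0)]
side = greedy_max_cut(3, edges)
cut = sum(1 for (a, b) in edges if side[a] != side[b])
```

At a local optimum each vertex has at least half its incident edges crossing,
which is where the 1/2 guarantee comes from.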
How about a really simple randomized algorithm? Just put each node on
a random side. Every edge has 1/2 prob of crossing cut, so expected
number crossing the cut is m/2.
Basically, this was the best known for a long time.....
Then, [Goemans & Williamson] showed how one could use semidefinite
programming to do a lot better.
=============================================================================
What is Semidefinite programming? Start with an operational
definition (what you can do) and then look at what's under the hood.
Operational definition: Semidefinite programming is like linear
programming, but your variables are vectors, and what you're allowed
to write down as constraints are linear inequalities on DOT-PRODUCTS
of these vectors. (and you can also maximize or minimize an objective
function in this form too)
E.g., vectors a,b,c.
Constraints: a.a = 1, b.b = 1, c.c = 1.
a.b <= 0, b.c <= 0, a.c <= 0.
What if we wanted to maximize a.b + b.c + c.a? What if we wanted to
minimize it?
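To get a feel for this example, here's a quick numeric check in Python (my
own illustration, not part of the original notes). Three unit vectors in the
plane at 120 degrees to each other satisfy all the constraints and achieve
a.b + b.c + c.a = -3/2, which is the minimum (since |a+b+c|^2 = 3 + 2(a.b +
b.c + c.a) >= 0); three mutually orthogonal vectors achieve 0, the maximum
allowed by the constraints:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Three unit vectors in the plane, 120 degrees apart:
angles = [0, 2 * math.pi / 3, 4 * math.pi / 3]
a, b, c = [(math.cos(t), math.sin(t)) for t in angles]

# They satisfy the constraints:
for u in (a, b, c):
    assert abs(dot(u, u) - 1) < 1e-9
for u, v in [(a, b), (b, c), (a, c)]:
    assert dot(u, v) <= 1e-9

objective = dot(a, b) + dot(b, c) + dot(c, a)   # = -3/2, the minimum
```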
Notice: we're not allowed to specify that these vectors must live in a
2-dimensional space. So, in general, their span could have as high a
dimensionality as the number of vectors.
Let's try to use for MAX-CUT. We'll have one variable (vector) for
each node in the graph. Let's require them to be unit vectors by
saying vi.vi = 1 for all i.
Now, we want to put them into two clusters to maximize the number of
edges between the clusters. Here's one way we can try to do that:
maximize SUM 0.5*(1 - u.v)
(u,v) in E
E.g., if u=v then it contributes 0 to the sum. If u = -v then it
contributes 0.5*2 = 1 to the sum, and if u is perpendicular to v then
it contributes 1/2 to the sum.
In particular, notice that if we could magically add the constraint
"all vectors must lie in a 1-dimensional space" (since they have
length 1, this is equivalent to saying that they can either be at +1
or at -1), then our objective function is EXACTLY EQUAL to the number
of edges crossing the cut. Unfortunately, we can't. So, much like an
LP relaxes {0,1} to fractional values, we are relaxing by allowing an
n-dimensional space. So, the SDP might return a "better than optimal"
solution according to its objective, by using its freedom. E.g., if
the graph is a triangle, then the max cut has value 2. What would the
SDP return? (equilateral triangle -> 9/4).
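Checking the triangle example numerically (a Python sketch; the "equilateral"
arrangement is three unit vectors 120 degrees apart in the plane):

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Equilateral arrangement: three unit vectors 120 degrees apart.
angles = [0, 2 * math.pi / 3, 4 * math.pi / 3]
vecs = [(math.cos(t), math.sin(t)) for t in angles]

edges = [(0, 1), (1, 2), (2, 0)]           # the triangle graph
sdp_value = sum(0.5 * (1 - dot(vecs[i], vecs[j])) for i, j in edges)
# Each edge contributes 0.5*(1 - cos(120 deg)) = 0.75, so sdp_value = 9/4,
# strictly more than the true max cut of 2.
```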
The difficulty with SDPs is that we have to somehow "round" these
vectors back to boolean values. For the MAX-CUT problem, here's what
we'll do:
- pick a random hyperplane through the origin.
- Let S = set of points on one side, and let T = set of points on
the other.
Claim: this gives a 0.878-approximation.
==========================================================================
MAX-CUT Algorithm:
- set up and solve the SDP.
- Split into S and T with a random hyperplane through the origin.
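The rounding step can be sketched as follows (Python; here I plug in the
triangle's 120-degree vectors by hand in place of an actual SDP solver,
purely as an illustration):

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def random_hyperplane_round(vectors):
    """Pick a random hyperplane through the origin (via a random Gaussian
    normal vector r) and split the points by the sign of r . v_i."""
    dim = len(vectors[0])
    r = [random.gauss(0, 1) for _ in range(dim)]
    return [0 if dot(r, v) >= 0 else 1 for v in vectors]

# Stand-in for the SDP solution on a triangle: unit vectors 120 deg apart.
vecs = [(math.cos(t), math.sin(t)) for t in (0, 2*math.pi/3, 4*math.pi/3)]
edges = [(0, 1), (1, 2), (2, 0)]

side = random_hyperplane_round(vecs)
cut = sum(1 for i, j in edges if side[i] != side[j])
# With probability 1 the random line splits the three vectors 2-1,
# so the rounded cut has value 2 -- exactly the max cut of the triangle.
```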
Two things to do now: (1) how does this SDP box really work? (2) prove
the claim that this gives you a 0.878 approximation. Let's do (2).
If time is left, get back to (1) at the end.
Proof of claim:
First of all, given two vectors (u and v) that are separated by an
angle alpha, what is the probability they get split by a random
hyperplane? Answer: alpha/pi. Why? Important point: the intersection of a
random hyperplane with the 2-d plane defined by u and v looks like a
random line through the origin (with probability 1), and that line
separates u from v exactly when it falls within the angle alpha between
them, which happens with probability alpha/pi.
So, can calculate expected value of our solution as a function of all
the pairwise angles.
E[size of cut] = SUM angle(u,v)/pi
(u,v) in E
compare this to:
SUM 0.5*(1 - u.v) = OPT^* >= OPT
(u,v) in E
So all we need to do is compare term by term, edge by edge. If the angle
is alpha, then u.v = cos(alpha). Draw the graph of alpha/pi for alpha in
[0,pi] and compare it to 0.5*(1 - cos(alpha)). What you get is:
For any angle alpha in [0, pi], alpha/pi >= 0.878*(1 - cos(alpha))/2
So, our solution is at least 0.878 * OPT in expectation.
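The key inequality can be checked numerically (a Python sketch; 0.878 is the
usual rounding down of the true constant min over alpha of
(2*alpha)/(pi*(1-cos(alpha))), which is about 0.87856):

```python
import math

# Check alpha/pi >= 0.878 * (1 - cos(alpha))/2 on a fine grid of (0, pi].
worst_ratio = float("inf")
N = 100000
for k in range(1, N + 1):
    alpha = math.pi * k / N
    lhs = alpha / math.pi
    rhs = (1 - math.cos(alpha)) / 2
    assert lhs >= 0.878 * rhs
    worst_ratio = min(worst_ratio, lhs / rhs)
# worst_ratio comes out around 0.8786, attained near alpha ~ 2.33 radians.
```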
===========================================================================
--> Talk about MAX-2SAT
Rough idea. First of all, we want to distinguish between TRUE and
FALSE, so lets have one more unit vector y, which will stand for TRUE.
Then the main thing to notice is that say we have some clause like
(x1 v x2). How does the quantity x1.y + x2.y - x1.x2 behave, if we
just had the case that all the x's were restricted to the
1-dimensional line along y?
If x1 and x2 are both true, we get 1+1-1 = 1
If x1 is true but x2 is false, we get 1 - 1 + 1 = 1
If x1 is false but x2 is true, we get -1 + 1 + 1 = 1
If both are false we get -1-1-1 = -3.
So we can use this (after scaling) to create an appropriate SDP
objective function.
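One natural scaling (my own spelling-out of "after scaling"; it matches the
truth table above) is (3 + x1.y + x2.y - x1.x2)/4, which is 1 exactly when
the clause is satisfied and 0 when both literals are false:

```python
from itertools import product

def clause_value(x1, x2):
    """Scaled score for (x1 v x2) in the 1-dimensional case, where every
    vector lies on the line along y, so each dot product is just a
    product of +/-1 values (taking y = +1)."""
    return (3 + x1 + x2 - x1 * x2) / 4

# Verify against the truth table: +1 means TRUE, -1 means FALSE.
for x1, x2 in product([1, -1], repeat=2):
    satisfied = (x1 == 1) or (x2 == 1)
    assert clause_value(x1, x2) == (1 if satisfied else 0)
```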
--> What's going on "under the hood".
Under the hood: have matrix M of variables M_{ij}, where we think of
M_{ij} as v_i \cdot v_j. Constraints are linear in the variables.
Then we require M to be positive semidefinite. This means we can
decompose M into V V^t using Cholesky factorization, which gives us
the vectors. The requirement that M be positive semidefinite turns
out to be convex and have a separation oracle. In particular, given a
proposed solution, we can try doing the Cholesky factorization and
either we succeed, or else we can produce a linear constraint that is
violated by M. For some LP algorithms like the Ellipsoid algorithm,
this is enough.
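A toy version of that check (pure-stdlib Python; this is a sketch of the
idea, not how real SDP solvers are implemented): attempt a Cholesky
factorization, and if it fails because of a negative pivot, M is not
positive semidefinite, which in the real algorithm is turned into a violated
linear constraint for the Ellipsoid algorithm.

```python
def try_cholesky(M, eps=1e-12):
    """Attempt M = L L^T with L lower triangular. Returns L on success
    (row i of L is the recovered vector v_i), or None if M is not
    positive semidefinite (a negative pivot appears)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s < -eps:
                    return None          # negative pivot: M is not PSD
                L[i][i] = max(s, 0.0) ** 0.5
            else:
                L[i][j] = s / L[j][j] if abs(L[j][j]) > eps else 0.0
    return L

# A PSD dot-product matrix: three unit vectors 120 degrees apart.
M_good = [[1.0, -0.5, -0.5],
          [-0.5, 1.0, -0.5],
          [-0.5, -0.5, 1.0]]
# Not PSD: it would need two unit vectors with dot product 2.
M_bad = [[1.0, 2.0],
         [2.0, 1.0]]

vectors = try_cholesky(M_good)           # succeeds; rows are the v_i
assert try_cholesky(M_bad) is None       # fails: M_bad is not PSD
```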