15-854 Approximation and Online Algorithms                         02/09/00

* Tidying up some loose ends from last time.
* The MAX-CUT problem and semidefinite programming.

[Presentations: I would like everyone to pick something by 1 wk from today.]

[Clarification on problem 5: all I'm looking for is a linear-time algorithm,
 given that we've already mapped the real numbers to rankings. Getting the
 rankings from the real numbers can be done in expected linear time with a
 2-level bucket sort.]

=============================================================================

Last time we went through a 3-approximation to the shortest superstring
problem. I just want to give an example to clear up one of the issues that
got confusing in the discussion.

Remember the algorithm: we construct the prefix graph and then find the
optimal cycle cover (whose weight is <= opt TSP tour <= opt superstring).
For a 4-approximation, we then open up each cycle arbitrarily and
concatenate. For a 3-approximation, replace "concatenate" with "merge using
the greedy algorithm".

Notice that in the optimal cycle cover, we would never have a cycle like
this:

           a       ba      b
    1------>2------>3----->1

(think of what the strings would have to look like). On the other hand, we
could have a cycle like this:

          ba       ba      b
    1------>2------>3----->1

in which case string 3 might look like bbababbababbabab.

Now, suppose this string overlapped in 10 characters (bababbabab) with some
string in a different cycle. Then that other cycle couldn't have weight 5,
or else it would be the same cycle as this one. It also couldn't have
weight < 5, since that would force our cycle to have a subperiod (only
possible anyway for non-prime lengths, but the point is that then it
wouldn't be minimum). In fact, in this case the shortest possible weight
for the other cycle is 7: bababba.

This is basically the reasoning behind the key lemma we needed: in the
optimal cycle cover of the prefix graph, if string s1 is in cycle c1 and
string s2 is in cycle c2, then overlap(s1,s2) < weight(c1) + weight(c2).

==============================================================================

Now we turn to a new technique called semidefinite programming. Much as LP
relaxed {0,1} values to [0,1] values, this will relax numbers to vectors
(or points) in an n-dimensional space. Our optimization problems will end
up looking like various kinds of clustering problems, so it helps if you
can think in n-dimensional space....

Today we'll talk about it in the context of the MAX-CUT problem.

==============================================================================

MAX-CUT: Given a graph G, partition the vertices into two sets S and T so
as to maximize the number of edges between S and T.

(E.g., if the graph is 2-colorable, then a proper 2-coloring gives a
perfect cut from this point of view. If not, then this is like asking for
the 2-coloring that gets the most edges correct, a lot like MAX 2-SAT. In
fact, the techniques will carry over to that too....)

Here's a natural greedy algorithm:

  - Start with an arbitrary cut.
  - If some node has more neighbors on its own side than on the other
    side, move it to the other side. Repeat.

Can you prove this won't just run forever? It halts in O(m) steps: each
move increases the number of edges crossing the cut by at least 1, and
that number can never exceed m.

Claim: at the end, at least half of the edges are crossing the cut. Why?
At termination, every node has at least half of its incident edges crossing
the cut (otherwise we would have moved it); summing over all nodes counts
each crossing edge twice, so at least m/2 edges cross. So this is trivially
a 1/2-approximation.

How about a really simple randomized algorithm? Just put each node on a
random side. Every edge has probability 1/2 of crossing the cut, so the
expected number crossing the cut is m/2.

Basically, this was the best known for a long time.....
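[Aside: here's a minimal Python sketch of both 1/2-approximation baselines,
 assuming the graph is given as a list of undirected edges on vertices
 0..n-1 (that representation is just for illustration):

    import random

    def cut_size(edges, side):
        # Number of edges with endpoints on opposite sides of the cut.
        return sum(1 for u, v in edges if side[u] != side[v])

    def greedy_local_search(n, edges):
        # Start from an arbitrary cut; while some vertex has more
        # neighbors on its own side than across, move it. Each move
        # raises the cut value by >= 1, so there are at most m moves.
        side = [0] * n
        nbrs = [[] for _ in range(n)]
        for u, v in edges:
            nbrs[u].append(v)
            nbrs[v].append(u)
        improved = True
        while improved:
            improved = False
            for u in range(n):
                same = sum(1 for w in nbrs[u] if side[w] == side[u])
                if 2 * same > len(nbrs[u]):  # more neighbors on u's side
                    side[u] ^= 1             # move u to the other side
                    improved = True
        return side

    def random_cut(n):
        # Each edge crosses with probability 1/2, so E[cut] = m/2.
        return [random.randint(0, 1) for _ in range(n)]
]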
Then [Goemans & Williamson] showed how one could use semidefinite
programming to do a lot better.

=============================================================================

What is semidefinite programming? We'll start with an operational
definition (what you can do) and then look at what's under the hood.

Operational definition: semidefinite programming is like linear
programming, but your variables are vectors, and what you're allowed to
write down as constraints are linear inequalities on DOT PRODUCTS of these
vectors. (You can also maximize or minimize an objective function of this
form.)

E.g., vectors a, b, c. Constraints: a.a = 1, b.b = 1, c.c = 1,
a.b <= 0, b.c <= 0, a.c <= 0. What if we wanted to maximize
a.b + b.c + c.a? What if we wanted to minimize it?

Notice: we're not allowed to specify that these vectors must live in a
2-dimensional space. So, in general, their span could have dimension as
high as the number of vectors.

Let's try to use this for MAX-CUT. We'll have one variable (vector) for
each node in the graph. Let's require them to be unit vectors by saying
vi.vi = 1 for all i. Now, we want to put them into two clusters to
maximize the number of edges between the clusters. Here's one way we can
try to do that:

    maximize    SUM      0.5*(1 - u.v)
             (u,v) in E

E.g., if u = v then the edge contributes 0 to the sum; if u = -v then it
contributes 0.5*2 = 1; and if u is perpendicular to v then it contributes
1/2.

In particular, notice that if we could magically add the constraint "all
vectors must lie in a 1-dimensional space" (since they have length 1, this
is equivalent to saying each is at +1 or -1), then our objective function
would be EXACTLY EQUAL to the number of edges crossing the cut.
Unfortunately, we can't. So, much as an LP relaxes {0,1} to fractional
values, we are relaxing by allowing an n-dimensional space. The SDP might
therefore return a "better than optimal" solution according to its
objective, by using this freedom. E.g., if the graph is a triangle, then
the max cut has value 2. What would the SDP return? (An equilateral
triangle of unit vectors at 120-degree angles: each edge contributes
0.5*(1 - cos 120) = 3/4, for a total of 9/4.)

The difficulty with SDPs is that we then have to somehow "round" these
vectors back to boolean values. For the MAX-CUT problem, here's what we'll
do:

  - Pick a random hyperplane through the origin.
  - Let S = the set of points on one side, and T = the set on the other.

Claim: this gives a 0.878-approximation.

==========================================================================

MAX-CUT Algorithm:

  - Set up and solve the SDP.
  - Split into S and T with a random hyperplane through the origin.
    (A code sketch of this rounding step appears at the end of these
    notes.)

Two things to do now: (1) see how this SDP box really works, and (2) prove
the claim that this gives a 0.878-approximation. Let's do (2); if there's
time left, we'll get back to (1) at the end.

Proof of claim: First of all, given two vectors u and v separated by an
angle alpha, what is the probability that they get split by a random
hyperplane? Answer: alpha/pi. Why? The important point is that the
intersection of a random hyperplane with the 2-dimensional plane defined
by u and v looks like a random line through the origin (with probability
1), and such a line separates u from v exactly when it falls within the
angle alpha between them (on either side), which happens with probability
2*alpha/(2*pi) = alpha/pi.

So we can calculate the expected value of our solution as a function of
all the pairwise angles:

    E[size of cut] =    SUM      angle(u,v)/pi
                     (u,v) in E

Compare this to

        SUM      0.5*(1 - u.v)  =  OPT^*  >=  OPT,
     (u,v) in E

where OPT^* is the optimal value of the SDP relaxation (which can only be
better than the true max cut). So all we need to do is compare the two
sums term by term. If the angle between u and v is alpha, then
u.v = cos(alpha). Draw the graph of alpha/pi for alpha in [0, pi] and
compare it to 0.5*(1 - cos(alpha)).
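[Aside: instead of drawing the picture, here's a quick numeric version of
 the same comparison: tabulate the ratio of the two curves over (0, pi]
 and take the minimum.

    import math

    def ratio(alpha):
        # (probability the edge is cut) / (its SDP contribution)
        return (alpha / math.pi) / (0.5 * (1 - math.cos(alpha)))

    alphas = [k * math.pi / 100000 for k in range(1, 100001)]
    worst = min(alphas, key=ratio)
    print(worst, ratio(worst))
    # minimum ratio ~ 0.87856, attained near alpha ~ 2.33 radians
]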
What you get is that for any angle alpha in [0, pi],

    alpha/pi  >=  0.878 * (1 - cos(alpha))/2.

So, comparing edge by edge, the expected size of our cut is at least
0.878 * OPT^* >= 0.878 * OPT.
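[Aside: a minimal sketch of the rounding step itself. Lacking an SDP
 solver here, it hard-codes the triangle example's embedding: three unit
 vectors at 120-degree angles, with SDP value 9/4.

    import math
    import random

    def round_by_hyperplane(vectors):
        # A random hyperplane through the origin is given by a random
        # normal vector r (Gaussian coordinates make its direction
        # uniform); v's side of the hyperplane is the sign of r.v.
        dim = len(vectors[0])
        r = [random.gauss(0, 1) for _ in range(dim)]
        dot = lambda x, y: sum(a * b for a, b in zip(x, y))
        return [0 if dot(r, v) >= 0 else 1 for v in vectors]

    # Triangle: vertices 0,1,2 with an edge between every pair.
    edges = [(0, 1), (1, 2), (2, 0)]
    vecs = [(math.cos(2 * math.pi * i / 3), math.sin(2 * math.pi * i / 3))
            for i in range(3)]

    trials = 100000
    total = 0
    for _ in range(trials):
        side = round_by_hyperplane(vecs)
        total += sum(1 for u, v in edges if side[u] != side[v])
    # Each edge spans angle 2*pi/3, so it is cut with probability 2/3
    # and the expected cut is 3 * (2/3) = 2 -- the true max cut, and
    # well above 0.878 * 9/4.
    print(total / trials)
]

===========================================================================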