15-854 Approximation and Online Algorithms 02/14/00
* SDP continued: graph coloring approx.
============================================================================
Last time we discussed use of Semidefinite Programming for approx to
max cut. SDP: each variable is a vector. Allowed to specify linear
constraints (and objective function) on the DOT PRODUCTS of vectors.
[Under the hood: have matrix M of variables M_{ij}, where we think of
M_{ij} as v_i \dot v_j. Constraints are linear in the variables.
Then we require M to be positive semidefinite. This means we can
decompose M into V V^t using Cholesky factorization, which gives us
the vectors. The requirement that M be positive semidefinite turns
out to be convex and have a separation oracle. In particular, given a
proposed solution, we can try doing the Cholesky factorization and
either we succeed, or else we can produce a linear constraint that is
violated by M. For some LP algorithms like the Ellipsoid algorithm,
this is enough. In the end, what we really get is a (1+epsilon)
approx to the optimal solution...]
The MAX-CUT problem was to split the graph into two pieces so that as
many edges as possible cross the cut. The 3-coloring problem is
similar: we want to partition into 3 pieces so that ALL edges go
between the pieces. So, it's not surprising that SDPs might be useful
here too. The notion of approximation is a little different though:
we require the pieces to be independent sets and attempt to minimize
the number of them (i.e., minimize the number of colors).
Approximate 3-coloring
======================
History:
For a while, best known was the sqrt(n) result on hwk1. Then this
was improved to n^{0.4} and then to n^{3/8}. Then, when SDPs came
out, it was realized they could be used to get n^{1/4}. Combining the
SDP with the idea used in the n^{3/8} result gives n^{0.21} which is
the best known to date. (These bounds all ignore polylog factors.)
Recall how the sqrt(n) result works: if a node has high degree (>
sqrt(n)), its neighborhood is 2-colorable (in the correct 3-coloring
it misses that node's color), so you 2-color the neighborhood with two
fresh colors, throw out the neighborhood and the colors, and continue.
Otherwise, if all nodes have low degree, you greedily color everything
left with d(G)+1 <= sqrt(n)+1 colors, where d(G) is the max degree.
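The recap above can be sketched in code (a hedged sketch, not from
lecture; `adj` maps each node to a set of neighbors, and the
2-coloring step trusts that each neighborhood really is bipartite, as
it must be if the input is 3-colorable):

```python
from collections import deque
from math import isqrt

def two_color(adj, nodes, c0):
    """BFS 2-coloring of the subgraph induced on `nodes`, using colors
    c0 and c0+1.  Valid whenever that subgraph is bipartite (true for
    a neighborhood N(v) in a 3-colorable graph)."""
    nodes, color = set(nodes), {}
    for s in nodes:
        if s in color:
            continue
        color[s] = c0
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w in nodes and w not in color:
                    color[w] = c0 + (color[u] - c0 + 1) % 2
                    q.append(w)
    return color

def sqrt_color(adj):
    """O(sqrt(n))-coloring of a 3-colorable graph, per the recap."""
    adj = {u: set(vs) for u, vs in adj.items()}   # local mutable copy
    thresh = isqrt(len(adj))
    coloring, fresh = {}, 0
    # Phase 1: while some node has degree > sqrt(n), 2-color its
    # neighborhood with two fresh colors, then delete the neighborhood.
    while True:
        hi = next((u for u in adj if len(adj[u]) > thresh), None)
        if hi is None:
            break
        nbrs = set(adj[hi])
        coloring.update(two_color(adj, nbrs, fresh))
        fresh += 2
        for u in nbrs:
            for w in adj[u] - nbrs:
                adj[w].discard(u)
            del adj[u]
    # Phase 2: max degree <= sqrt(n), so greedy coloring with fresh
    # colors uses at most thresh + 1 more of them.
    for u in adj:
        used = {coloring[w] for w in adj[u] if w in coloring}
        c = fresh
        while c in used:
            c += 1
        coloring[u] = c
    return coloring

# tiny usage example: the complete tripartite graph K_{3,3,3}
parts = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
K333 = {u: {v for p in parts if u not in p for v in p} for u in range(9)}
col = sqrt_color(K333)
```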
The way the n^{0.4} and n^{3/8} results work is basically by doing
something better in the high-degree case. So, if there are a lot of
nodes with degree > n^{0.4}, then we'll do this new thing; otherwise
we'll behave as in the above algorithm. The SDP works the other way,
doing something better when the nodes have low degree.
=============================================================================
Here's the basic idea of the n^{0.4} result. Say v is a node that's
Red in the correct 3-coloring. Then, its neighborhood is blue
and green. Say it's about half blue and half green. Now, let's look at
the neighborhood of that set. The blues have red and green nbrs and
the greens have red and blue nbrs. So, if things were relatively
random, you'd expect N(N(v)) to be about 1/2 red, 1/4 blue, and 1/4
green. This is great because remember how we said it was possible to
approx the vertex-cover problem to a factor of (2 - 1/log(n)). This
set has a vertex cover of only half the nodes (the non-red half, since
the reds form an independent set), so when we run the approx, the
uncovered nodes form an independent set containing an Omega(1/log n)
fraction of that set.
The graph might not be "random" in this way, but what you can prove is
that if you look at the edges leaving N(v) and group them into
polylog(n) buckets appropriately, at least one of the buckets goes to
a set that is nearly half red.
To get the final result, you start by getting rid of nodes of degree
< n^{0.4}, and there's another preprocessing step that makes progress
whenever it finds 2 nodes that share > n^{0.2} nbrs. That then makes
sure that the set you end up with is big enough -- soft-O(n^{0.6}) --
to make progress towards the desired approx.
======================================================================
Now, let's do the SDP method.
Each variable is a vector. We want ideally these vectors to be in 3
distinct clusters based on their color in the correct 3-coloring.
Here's what we'll try:
Each vertex is a unit vector: v.v = 1.
If two vertices are neighbors, then vectors must be at least
120 degrees apart: v.w <= -0.5
Then, solve the SDP.
Claim: if the graph is 3-colorable, then there is a feasible
solution. (in fact, there is a feasible solution in 2d: just have 3
vectors at 120 degrees apart corresponding to red, blue, and green
nodes).
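The claim's 2-d feasible point is easy to check numerically (a small
sanity check, not from lecture):

```python
from math import cos, sin, pi

# the three color vectors: unit vectors in the plane at mutual 120 degrees
vecs = {c: (cos(2*pi*k/3), sin(2*pi*k/3)) for k, c in enumerate("RGB")}

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1]

# give every red node the R vector, etc.; then v.v = 1 for every node,
# and v.w = -1/2 for the endpoints of any edge (they differ in color)
norms = {c: dot(v, v) for c, v in vecs.items()}
cross = {(a, b): dot(vecs[a], vecs[b]) for a in "RGB" for b in "RGB" if a != b}
```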
Now, given a feasible solution, need to round it back to a legal
coloring.
Idea 1: put k random hyperplanes through the origin. What is the
probability that a given edge is *not* cut? The endpoints of an edge
are at least 120 degrees apart, and a random hyperplane separates two
vectors with probability (angle between them)/180, so each plane cuts
the edge with probability at least 2/3. The chance that the edge
survives all k planes is then at most (1/3)^k. Let's use
k = 1 + log_3(d), where d is the max degree of any node in the graph.
So, the prob that a given edge survives is <= 1/(3d), and by a union
bound over its <= d neighbors, for any given node v,
Pr(v is not separated from all neighbors) <= 1/3. So, the expected
number of these "bad" nodes is at most n/3, and with reasonable
probability there will be < n/2. (If not, just keep trying.) The k
hyperplanes carve space into at most 2^k regions, and good nodes lying
in the same region form an independent set, so some region contains an
independent set of size at least (n/2)/2^k. Pull it out and repeat the
entire process on the graph remaining. This will give us an
O(2^k * log(n))-coloring.
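One round of this rounding can be sketched as follows (hypothetical
helper, not from lecture; `vecs` maps nodes to their SDP vectors):

```python
import random

def hyperplane_round(vecs, adj, k, seed=None):
    """Idea 1, one round: cut the sphere with k random hyperplanes and
    return the largest set of 'good' nodes (separated from all their
    neighbors) sharing one of the <= 2^k regions; that set is
    independent.  vecs: node -> vector, adj: node -> neighbor set."""
    rng = random.Random(seed)
    dim = len(next(iter(vecs.values())))
    # a random hyperplane through the origin = a Gaussian normal vector
    normals = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(k)]
    def region(v):
        # sign pattern of v against the k hyperplanes
        return tuple(sum(a*b for a, b in zip(nrm, v)) >= 0 for nrm in normals)
    reg = {u: region(v) for u, v in vecs.items()}
    good = [u for u in vecs if all(reg[w] != reg[u] for w in adj[u])]
    buckets = {}
    for u in good:
        buckets.setdefault(reg[u], []).append(u)
    return max(buckets.values(), key=len, default=[])

# tiny usage example: a triangle whose SDP vectors sit at 120 degrees
tri_vecs = {0: (1.0, 0.0), 1: (-0.5, 0.866), 2: (-0.5, -0.866)}
tri_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
ind = hyperplane_round(tri_vecs, tri_adj, k=2, seed=1)
```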
How good is this? If degrees are large, can 2-color neighborhoods like in the
sqrt(n) algorithm. Let's do that until max degree gets down to d
(which we'll solve for) and then switch over to SDP. We use at most
n/d colors before switching, and then we use basically 2^{log_3 d} =
d^{log_3 2} = d^{0.63}. Set that to equal n/d. Solves to d =
n^{0.613} or n/d = n^{0.387}.
So, better than n^{0.4} but not so good as n^{3/8}.
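The balancing arithmetic can be double-checked in a couple of lines:

```python
from math import log

# colors before switching: n/d.  colors after: ~ 2^{log_3 d} = d^{log_3 2}.
# balancing d^{log_3 2} = n/d means d^{1 + log_3 2} = n:
exp_d = 1 / (1 + log(2, 3))   # d = n^{exp_d}
exp_colors = 1 - exp_d        # n/d = n^{exp_colors}
```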
To do better, we need a better way of rounding the SDP.
=======================================================================
Rounding method #2:
Here's a more natural way to do the rounding: pick t random points on
the unit sphere. Assign each node to the closest point. Hope that
for t not-too-large, at least a constant fraction of the nodes are
"good" (separated from all their neighbors).
Unfortunately, this turns out to be painful to analyze, so we will
modify slightly: (1) pick the t random points using an n-dimensional
normal distribution. I.e., the probability density at some point of
distance r from the origin is proportional to e^{-r^2 / 2}. [the "/2"
part doesn't matter but that's how the "standard normal" is defined.]
(2) assign each node to the point that maximizes the dot product with
it. [(2) seems weird but makes things easier to analyze as we'll see
in a sec]
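A minimal sketch of this rounding (hypothetical helper name; the SDP
vectors are assumed given):

```python
import random

def gaussian_round(vecs, t, seed=None):
    """Rounding method #2: draw t centers from the n-dim standard
    normal (each coordinate an independent 1-d N(0,1)) and assign
    every node to the center maximizing the dot product."""
    rng = random.Random(seed)
    dim = len(next(iter(vecs.values())))
    centers = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(t)]
    def dot(a, b):
        return sum(x*y for x, y in zip(a, b))
    return {u: max(range(t), key=lambda i: dot(centers[i], v))
            for u, v in vecs.items()}

# usage example: the triangle again, with t = 4 centers
tri_vecs = {0: (1.0, 0.0), 1: (-0.5, 0.866), 2: (-0.5, -0.866)}
assign = gaussian_round(tri_vecs, t=4, seed=0)
```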
Claim: The probability u and v get assigned to the same center looks
like 1/t^3. So, union-bounding over the <= d neighbors of a node, we
need t = O(d^{1/3}) colors to likely separate it from all of them.
Setting this to n/d gives us a worst case of d = n^{3/4} and an
algorithm that uses n^{1/4} colors. [we're ignoring some log factors
here]
First, a few words about the n-dimensional gaussian distrib: the neat
thing about this distribution is that you can choose the coordinates
independently, each one using a 1-dim gaussian. Think about it: the
prob density at some point (x1,x2,...,xn) at distance r will be
proportional to e^{-x1^2 / 2}*...*e^{-xn^2 / 2} = e^{-r^2 / 2}.
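A quick empirical check of this (parameters are arbitrary): the dot
product of such a gaussian with any fixed unit vector should itself be
a 1-d standard normal.

```python
import random
from math import sqrt

rng = random.Random(0)
n, trials = 5, 20000
u = [1 / sqrt(n)] * n          # some fixed unit vector

# sample n-dim standard normals coordinate-by-coordinate, project on u
proj = []
for _ in range(trials):
    g = [rng.gauss(0, 1) for _ in range(n)]
    proj.append(sum(gi * ui for gi, ui in zip(g, u)))

# the projection should have mean ~0 and variance ~1
mean = sum(proj) / trials
var = sum(p * p for p in proj) / trials - mean * mean
```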
What's great about this becomes clear when we look at what we need to
prove.
Pick two vectors u,v corresponding to neighbors
in the graph, so they are 120 degrees apart. We want to calculate the
probability that u and v are assigned to the same center. Because of
the independence property of the n-dim gaussian, what happens in the
2-d space spanned by u and v is independent of what happens in the
orthogonal space. Also, the rest of the dimensions don't affect the
dot-product --- this is why "dot product" is easier to analyze than
"distance". So, it all boils down to proving our claim in a 2-d world.
Unfortunately, even in 2-d it still seems to be somewhat of a pain.
Would be great to have a really clean argument. Just for reference,
if u and v were 90 degrees apart, the analysis would be easier: if we
pick t centers, there's exactly a 1/t chance that u and v get assigned
to the same one. But, that's too big --- same as if we just randomly
colored the graph with t colors. We want to prove the probability is
more like 1/t^3.
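Since the whole question lives in the 2-d plane spanned by u and v,
the two cases are easy to compare by simulation (parameters are
arbitrary):

```python
import random
from math import pi, cos, sin

def collide_prob(angle, t, trials, seed=0):
    """Empirical Pr(u and v get the same center) for unit vectors u, v
    at the given angle, with t Gaussian centers; 2-d suffices by the
    independence property above."""
    rng = random.Random(seed)
    u, v = (1.0, 0.0), (cos(angle), sin(angle))
    hits = 0
    for _ in range(trials):
        cs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(t)]
        bu = max(range(t), key=lambda i: cs[i][0]*u[0] + cs[i][1]*u[1])
        bv = max(range(t), key=lambda i: cs[i][0]*v[0] + cs[i][1]*v[1])
        hits += (bu == bv)
    return hits / trials

p90 = collide_prob(pi / 2, t=10, trials=20000)
p120 = collide_prob(2 * pi / 3, t=10, trials=20000)
```

At 90 degrees the estimate should hover around exactly 1/t; at 120
degrees it should come out noticeably smaller.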
Here's the simplest argument I know. First of all, write N(x) for the
standard normal density; for a standard normal X, it's known that
Pr(X > r) is about N(r)/r for large r. Let's call this P(r). The
plan is to pick a value of r and prove that
w.h.p., some center has dot product > r with u (and similarly some
center has dot product > r with v) but it's unlikely that some center
has dot product > r with both. Draw the picture: u and v are unit
vectors 120 degrees apart, so u+v is itself a unit vector, and if a
center g has g.u > r and g.v > r then g.(u+v) > 2r --- that is, g's
component along a fixed unit direction (a standard normal) exceeds 2r.
So, we want r s.t. in t selections from the standard normal, whp we
*do* find one of value > r but we *don't* find any of value > 2r.
So, let's pick r so that P(r) = c(ln t)/t. Then Pr(we don't find a
center with dot product > r with u) = (1 - P(r))^t ~ e^{-tP(r)} =
1/t^c. Great - just pick c to be 3 or 4. Now, let's look at P(2r).
From our formula and the form of the normal distrib, P(2r) looks
roughly like P(r)^4, since e^{-(2r)^2/2} = (e^{-r^2/2})^4. [this is a
mess to calculate exactly, but just look at the main e^{-r^2/2} term].
So, this is like 1/t^4 (ignoring logs), and union-bounding over our t
draws, the prob that *any* of them is this large is at most
t * 1/t^4 = 1/t^3 (ignoring logs).
So, that's it!!
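For reference, the two estimates used above (the tail formula and
P(2r) ~ P(r)^4) can be sanity-checked numerically at a few arbitrary
values of r:

```python
from math import erfc, log, sqrt

def P(r):
    """Pr(X > r) for a standard normal X, via the complementary error
    function; this is about N(r)/r for large r."""
    return 0.5 * erfc(r / sqrt(2))

# P(2r) ~ P(r)^4 holds for the dominant density term, since
# e^{-(2r)^2/2} = (e^{-r^2/2})^4 exactly.  Compare log-tails; the
# polynomial factors keep the ratio just below 1, approaching 1 as r
# grows.
ratios = [log(P(2 * r)) / (4 * log(P(r))) for r in (8.0, 12.0, 15.0)]
```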