15-854 Approximation and Online Algorithms                        02/14/00

* SDP continued: graph coloring approx.
============================================================================

Last time we discussed the use of Semidefinite Programming for
approximating MAX-CUT.

SDP: each variable is a vector. We are allowed to specify linear
constraints (and an objective function) on the DOT PRODUCTS of the vectors.

[Under the hood: we have a matrix M of variables M_{ij}, where we think of
M_{ij} as v_i . v_j. The constraints are linear in these variables, and we
additionally require M to be positive semidefinite. This means we can
decompose M into V V^T using Cholesky factorization, which gives us the
vectors. The requirement that M be positive semidefinite turns out to be
convex and to have a separation oracle: given a proposed solution, we try
the Cholesky factorization, and either we succeed or else we can produce a
linear constraint that M violates. For LP algorithms like the Ellipsoid
algorithm, this is enough. In the end, what we really get is a
(1+epsilon)-approximation to the optimal solution...]

The MAX-CUT problem was to split the graph into two pieces so that as many
edges as possible cross the cut. The 3-coloring problem is similar: we want
to partition the graph into 3 pieces so that ALL edges go between the
pieces. So it's not surprising that SDPs might be useful here too. The
notion of approximation is a little different, though: we require the
pieces to be independent sets and try to minimize the number of them (i.e.,
minimize the number of colors).

Approximate 3-coloring
======================

History: for a while, the best known bound was the sqrt(n) result on hwk1.
Then this was improved to n^{0.4}, and then to n^{3/8}. Then, when SDPs
came out, it was realized they could be used to get n^{1/4}. Combining the
SDP with the idea used in the n^{3/8} result gives n^{0.21}, which is the
best known to date. (All of these ignore polylog factors.)

Recall how the sqrt(n) result works: if a node has high degree (> sqrt(n)),
you can 2-color its neighborhood [in a 3-colorable graph, any neighborhood
gets only 2 colors in a proper 3-coloring, so it induces a bipartite
graph], and then throw out the neighborhood and those two colors and
continue. Otherwise, if all nodes have low degree, you color everything
left greedily with d(G)+1 colors. (A code sketch appears at the end of this
part.)

The way the n^{0.4} and n^{3/8} results work is basically by doing
something better in the high-degree case: if there are a lot of nodes with
degree > n^{0.4}, we do the new thing; otherwise, we behave as in the above
algorithm. The SDP works the other way, doing something better when the
nodes have LOW degree.

=============================================================================

Here's the basic idea of the n^{0.4} result. Say v is a node that's Red in
the correct 3-coloring. Then its neighborhood is blue and green; say it's
about half blue and half green. Now let's look at the neighborhood of THAT
set. The blues have red and green nbrs and the greens have red and blue
nbrs. So, if things were relatively random, you'd expect N(N(v)) to be
about 1/2 red, 1/4 blue, and 1/4 green. This is great because remember how
we said it is possible to approximate the vertex-cover problem to a factor
of (2 - 1/log(n)). A set that is half red has a vertex cover of only half
its nodes (the non-red half), so when we run the approximation, what's left
over is an independent set containing an Omega(1/log n) fraction of the
set. The graph might not be "random" in this way, but what you can prove is
that if you look at the edges leaving N(v) and group them into polylog(n)
buckets appropriately, at least one of the buckets goes to a set that is
nearly half red.

To get the final result, you start by getting rid of nodes of degree
< n^{0.4}, and there's another preprocessing step that makes progress
whenever it finds 2 nodes that share > n^{0.2} nbrs. That makes sure the
set you end up with is big enough -- softO(n^{0.6}) -- to make progress
towards the desired approximation.
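As an aside, here is a minimal sketch of that basic sqrt(n) algorithm
(Wigderson's algorithm) in Python. The graph representation -- a dict
mapping each vertex to its set of neighbors -- is just an assumption for
illustration, and the sketch presumes the input really is 3-colorable (the
BFS 2-coloring of a neighborhood relies on it being bipartite).

    import math
    from collections import deque

    def wigderson_color(adj):
        """Properly color a 3-colorable graph with O(sqrt(n)) colors.
        adj: dict {vertex: set of neighbors} for an undirected graph."""
        adj = {v: set(ns) for v, ns in adj.items()}  # local copy we can shrink
        threshold = math.isqrt(len(adj))             # the sqrt(n) degree cutoff
        color, next_color = {}, 0

        # High-degree phase: while some vertex has > sqrt(n) neighbors,
        # 2-color its neighborhood with two fresh colors and throw it out.
        while True:
            v = next((u for u in adj if len(adj[u]) > threshold), None)
            if v is None:
                break
            nbrs = set(adj[v])
            side = {}                                # BFS bipartition of N(v)
            for s in nbrs:
                if s in side:
                    continue
                side[s] = 0
                queue = deque([s])
                while queue:
                    u = queue.popleft()
                    for w in adj[u] & nbrs:
                        if w not in side:
                            side[w] = 1 - side[u]
                            queue.append(w)
            for u in nbrs:
                color[u] = next_color + side[u]
            next_color += 2
            for u in nbrs:                           # remove N(v) from the graph
                for x in adj[u]:
                    adj[x].discard(u)
                del adj[u]

        # Low-degree phase: all remaining degrees are <= sqrt(n), so greedy
        # coloring needs at most sqrt(n)+1 fresh colors.
        palette = range(next_color, next_color + threshold + 1)
        for v in adj:
            taken = {color[u] for u in adj[v] if u in color}
            color[v] = next(c for c in palette if c not in taken)
        return color

The first phase uses at most 2n/sqrt(n) = 2*sqrt(n) colors (each round
removes more than sqrt(n) vertices), and the second at most sqrt(n)+1, for
O(sqrt(n)) total.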
======================================================================

Now, let's do the SDP method. Each variable is a vector, and ideally we
want these vectors to lie in 3 distinct clusters based on their color in
the correct 3-coloring. Here's what we'll try:

  * Each vertex is a unit vector:  v.v = 1.
  * If two vertices are neighbors, their vectors must be at least
    120 degrees apart:  v.w <= -0.5.

Then solve the SDP. Claim: if the graph is 3-colorable, then there is a
feasible solution. (In fact, there is a feasible solution in 2-d: just have
3 vectors at 120 degrees apart, corresponding to the red, blue, and green
nodes.) [Code sketches of this SDP and of the rounding schemes below appear
at the end of this part.]

Now, given a feasible solution, we need to round it back to a legal
coloring.

Idea 1: put k random hyperplanes through the origin. What is the
probability that a given edge is *not* cut? Each plane cuts it with
probability at least 2/3 [a random hyperplane separates two vectors at
angle theta with probability theta/pi, and here theta >= 120 degrees], so
the chance the edge survives all k planes is at most (1/3)^k. Let's use
k = 1 + log_3(d), where d is the max degree of any node in the graph, so
the probability that an edge survives is <= 1/(3d). Then, for any given
node v, taking a union bound over its <= d edges,
Pr(v is not separated from all of its neighbors) <= 1/3. So, the expected
number of these "bad" nodes is at most n/3, and with reasonable probability
there will be < n/2 of them. (If not, just keep trying.) The hyperplanes
partition the >= n/2 good nodes into at most 2^k classes, so there is an
independent set of size at least (n/2)/2^k. Pull it out and repeat the
entire process on the graph remaining. This gives us an
O(2^k * log(n))-coloring.

How good is this? If degrees are large, we can 2-color neighborhoods as in
the sqrt(n) algorithm. Let's do that until the max degree gets down to d
(which we'll solve for) and then switch over to the SDP. We use at most n/d
colors before switching, and then we use basically
2^{log_3 d} = d^{log_3 2} = d^{0.63} more. Setting n/d = d^{0.63} solves to
d = n^{0.613}, i.e., n/d = n^{0.387} colors. So, better than n^{0.4} but
not as good as n^{3/8}. To do better, we need a better way of rounding the
SDP.

=======================================================================

Rounding method #2: Here's a more natural way to do the rounding: pick t
random points on the unit sphere, assign each node to the closest point,
and hope that for t not-too-large, at least a constant fraction of the
nodes are "good" (separated from all their neighbors). Unfortunately, this
turns out to be painful to analyze, so we will modify it slightly:

(1) Pick the t random points using an n-dimensional normal distribution,
    i.e., the probability density at a point at distance r from the origin
    is proportional to e^{-r^2 / 2}. [The "/2" part doesn't matter, but
    that's how the "standard normal" is defined.]

(2) Assign each node to the point that maximizes the dot product with it.

[(2) seems weird, but it makes things easier to analyze, as we'll see in a
sec.]

Claim: the probability that neighbors u and v get assigned to the same
center looks like 1/t^3. This means we need only t = O(d^{1/3}) colors
[each node has <= d neighbors, so with t = O(d^{1/3}) the expected number
of non-separated nodes is a small constant fraction of n]. Setting this to
n/d gives a worst case of d = n^{3/4} and an algorithm that uses n^{1/4}
colors. [We're ignoring some log factors here.]
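Here is a minimal sketch of this SDP in Python, using cvxpy (an assumption
-- any SDP solver would do; cvxpy ships with the SCS solver, which handles
PSD constraints). Rather than a literal Cholesky factorization, it recovers
the vectors from M with an eigendecomposition, since the numerical solution
may be only approximately PSD.

    import cvxpy as cp
    import numpy as np

    def solve_coloring_sdp(n, edges):
        """Feasibility SDP for coloring: M is the Gram matrix, with M[i,j]
        standing for v_i . v_j. Returns the vectors as rows of an array."""
        M = cp.Variable((n, n), PSD=True)
        constraints = [cp.diag(M) == 1]                     # v.v = 1
        constraints += [M[i, j] <= -0.5 for i, j in edges]  # nbrs >= 120 apart
        cp.Problem(cp.Minimize(0), constraints).solve()     # pure feasibility
        # Factor M = V V^T; clip tiny negative eigenvalues from numerical error.
        w, Q = np.linalg.eigh(M.value)
        return Q * np.sqrt(np.clip(w, 0, None))

On a 3-colorable graph this is feasible (three unit vectors at 120 degrees
in the plane work), so solve() should succeed.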
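And here are numpy sketches of the two rounding schemes, assuming the SDP
solution is given as an (n, dim) array whose rows are the unit vectors.
These only produce the candidate classes; the outer loop (checking which
nodes are separated from all their neighbors, pulling out independent sets,
and repeating) is left out.

    import numpy as np

    def round_hyperplanes(vectors, k, seed=None):
        """Idea 1: k random hyperplanes through the origin; the sign pattern
        of each vector w.r.t. the k normals is its class in [0, 2^k)."""
        rng = np.random.default_rng(seed)
        normals = rng.standard_normal((k, vectors.shape[1]))
        bits = (vectors @ normals.T > 0).astype(int)   # (n, k) sign pattern
        return bits @ (1 << np.arange(k))              # pack bits into class ids

    def round_gaussian_centers(vectors, t, seed=None):
        """Method #2: t centers drawn from the n-dim standard normal; assign
        each node to the center maximizing the dot product."""
        rng = np.random.default_rng(seed)
        centers = rng.standard_normal((t, vectors.shape[1]))
        return np.argmax(vectors @ centers.T, axis=1)  # (n,) center indices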
First, a few words about the n-dimensional gaussian distribution. The neat
thing about this distribution is that you can choose the coordinates
independently, each one using a 1-dim gaussian. Think about it: the
probability density at a point (x_1, ..., x_n) at distance r from the
origin will be proportional to
e^{-x_1^2 / 2} * ... * e^{-x_n^2 / 2} = e^{-r^2 / 2}.

What's great about this is, now let's look at what we need to prove. Pick
two vectors u, v corresponding to neighbors in the graph, so they are 120
degrees apart. We want to calculate the probability that u and v are
assigned to the same center. Because of the independence property of the
n-dim gaussian, what happens in the 2-d space spanned by u and v is
independent of what happens in the orthogonal space. Also, the rest of the
dimensions don't affect the dot products --- this is why "dot product" is
easier to analyze than "distance". So, it all boils down to proving our
claim in a 2-d world. Unfortunately, even in 2-d it still seems to be
somewhat of a pain; it would be great to have a really clean argument.

Just for reference, if u and v were 90 degrees apart, the analysis would be
easy: if we pick t centers, there's exactly a 1/t chance that u and v get
assigned to the same one [u's and v's dot products with the centers are
completely independent, so whichever center u prefers, v prefers the same
one with probability exactly 1/t]. But that's too big --- the same as if we
just randomly colored the graph with t colors. We want to prove the
probability is more like 1/t^3.

Here's the simplest argument I know. First of all, letting N(x) denote the
standard normal density, it's known that the tail probability
Pr(a standard normal draw > r) is about N(r)/r for large r. Let's call this
P(r). The plan is to pick a value of r and prove that w.h.p. some center
has dot product > r with u (and similarly some center has dot product > r
with v), but it's unlikely that any center has dot product > r with BOTH.
Drawing out the picture, you can see that having dot product > r with both
means having length > 2r. [The shortest vector x with x.u >= r and
x.v >= r points along u+v, which is a unit vector here; for x = s(u+v) we
get x.u = s/2, so we need s >= 2r.] So, we want r such that in t selections
from the standard normal, w.h.p. we *do* find one of value > r but we
*don't* find any of value > 2r.

So, let's pick r so that P(r) = c(ln t)/t. Then
Pr(we don't find a center with dot product > r with u)
= (1 - P(r))^t ~ e^{-c ln t} = 1/t^c. Great -- just pick c to be 3 or 4.
Now, let's look at P(2r). From our formula and the form of the normal
density, P(2r) looks roughly like P(r)^4. [This is a mess to calculate
exactly, but just look at the main term: e^{-(2r)^2/2} = (e^{-r^2/2})^4.]
So, P(2r) is like 1/t^4 (ignoring logs), and the probability that *any* of
our t draws is this large is at most t * 1/t^4 = 1/t^3 (ignoring logs). So,
that's it!!
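Finally, a quick Monte Carlo sanity check of the 1/t^3 claim (a sketch with
made-up parameter values; since the claim ignores polylog factors, expect
the right ballpark rather than an exact match). By the independence
property above, we can work directly in the 2-d span of u and v:

    import numpy as np

    rng = np.random.default_rng(0)
    u = np.array([1.0, 0.0])
    v = np.array([-0.5, np.sqrt(3) / 2])       # 120 degrees from u

    def same_center_rate(t, trials=200_000):
        # trials experiments, each drawing t gaussian centers; only the
        # components in the span of u and v matter for the dot products.
        g = rng.standard_normal((trials, t, 2))
        return np.mean(np.argmax(g @ u, axis=1) == np.argmax(g @ v, axis=1))

    for t in (2, 4, 8, 16):
        print(t, same_center_rate(t), 1 / t**3)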