15-854 Approximation and Online Algorithms                          2/23/00

* Embedding into L_1 [Bourgain's thm] and implications for sparsest cut
  and balanced separators.
==============================================================================

[This is my last approx algs lecture. Ojas and Jochen will give one on
approx hardness on Mon, and if no other volunteers for early presentation,
we'll head on to online algs]

We've been looking at algs based on metric space approximation. Today we
will do one more, which is used for finding balanced separators in graphs.

-[probably skip most of recap below]-

Recap of last time: gave the randomized construction of Bartal that, given
any metric space, produces a hierarchical decomposition: it rewrites the
space as a bunch of clusters far apart from each other, where each cluster
is composed of subclusters far from each other, and the distance between
two nodes depends only on the innermost cluster they both belong to.
Cluster diameters drop by any desired factor k. The property is that for
any pair (u,v), d_new(u,v) >= d_old(u,v), and

   E[d_new(u,v)] <= O(k log^2 n) * d_old(u,v).

Idea of proof: pick an arbitrary start node and flip a coin of bias
p = 8 lg(n)/n until it comes up heads to determine the radius.

Can take the decomposition and convert it to a tree. Kind of like a
recursive FedEx network: for each cluster we designate a clearinghouse
node that collects all communication going out from its sub-clusters
(either to nodes in other subclusters or to nodes outside of this
cluster). E.g., if we look at the path between two nodes A,B in different
clusters at the top level, we can break it into two halves: the path from
A to the root and the path from B to the root. For a k-HST, each path has
length at most (D + D/k + D/k^2 + ...). So for k=2, each path has length
at most 2D, and the total from A to B is at most 4D.

Trees are great for a lot of algorithmic problems since they remove the
need to make path decisions.

Buy-at-bulk: purchasing bandwidth with a volume discount.
E.g., BellSouth 1996: DS0 line (64k) at $4/mile, DS1 line (1.5m) at
$23/mile, DS3 (45m) at $200/mile (as reported in [Andrews-Zhang '98]).
This is easy to solve on trees, so our metric space approximation gives
an O(log^2 n) approximation (or O(log n loglog n) for the latest
construction). If we have a single sink (e.g., connecting oil wells to a
given oil refinery; equivalent to saying we're given a central root that
everyone has to go through), then [AZ] give an O(k^2) bound, where
k = # of line types.

Minimum communication spanning tree: given a metric graph and a
probability distribution over pairs of nodes, find the spanning tree that
minimizes objective = expected distance between a random pair of nodes
drawn from that distribution. We get that for *any* given pair of nodes
(u,v), the expected distance has increased by only alpha:

   for all (u,v), E_{our alg}[blowup in d(u,v)] <= alpha

which means that for a random pair (u,v),

   E_{our alg}[blowup in d(u,v)] <= alpha

which means

   E_{our alg}[blowup in objective] <= alpha.

============================================================================
Balanced separators:

There's something very similar to what we have just been looking at that
has lots of applications: balanced separators. Given a graph G, we want
to split it into two pieces S, V-S minimizing the ratio

   r_S = (# edges between S and V-S) / (|S| * |V-S|)

This is sometimes called the "sparsest cut" or "minimum ratio cut". Can
also look at the case where edges have weights, and then the numerator is
the total weight of the edges in the cut.

[Leighton-Rao '88] give an O(log n) approximation algorithm for this
problem. An implication of this (and the way this is often stated) is
that you can get a 1/3--2/3 separator (i.e., S has between 1/3 and 2/3 of
the nodes) whose number of edges (or weight of the cut) is within a
factor of O(log n) of the best 1/2--1/2 separator. [give this implication
as a hwk problem?]

To do now: prove the Leighton-Rao result. We will do it via a metric
space approximation.
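To make the ratio r_S concrete, here is a minimal brute-force sketch in
Python (function names are mine, and it enumerates all subsets, so it is
exponential in n; illustration only, not part of the Leighton-Rao
algorithm):

```python
from itertools import combinations

def cut_ratio(edges, n, S):
    """r_S = (# edges between S and V-S) / (|S| * |V-S|)."""
    S = set(S)
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    return crossing / (len(S) * (n - len(S)))

def sparsest_cut_brute_force(edges, n):
    """Exact minimum ratio cut: try every subset S with 1 <= |S| <= n/2."""
    best_r, best_S = None, None
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            r = cut_ratio(edges, n, S)
            if best_r is None or r < best_r:
                best_r, best_S = r, set(S)
    return best_r, best_S

# 4-cycle 0-1-2-3-0: splitting it into two adjacent pairs cuts 2 edges,
# and |S| * |V-S| = 4, so the minimum ratio is 2/4 = 1/2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
r, S = sparsest_cut_brute_force(edges, 4)
```

Note that a single vertex of the 4-cycle gives the worse ratio
2/(1*3) = 2/3, so the balanced cut really does win here.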
Theorem [Bourgain '86]: An n-point metric space can be embedded into an
L_1 space (Manhattan distance) with distortion O(log n). The dimension of
the space will be O(log^2 n). Furthermore, there is an efficient
(randomized, and since derandomized by [LLR '95] and [Garg '95]) method
for doing it.

Let's first see how this solves our problem, then go back to prove the
theorem. Both parts are *tricky*.

Algorithm for getting a good separator
--------------------------------------

Step 1: Consider the following problem. Given a graph G, assign lengths
to the edges such that (1) if you look at all pairs of nodes and the
shortest-path distance between them, the sum of all of these distances
equals 1, and (2) the sum of all edge lengths is minimized subject to
(1). Claim: you can do this with linear programming. [have variables for
all pairwise distances, and force them to satisfy the triangle
inequality...]

Thing to notice: suppose we are given some cut (S, V-S) and we assign
distances according to: edges in the cut get length 1/(|S|*|V-S|), and
all others get 0. Then we satisfy property (1), and the value of our
objective function is r_S. So, this is a fractional relaxation of our
problem. Say the optimal fractional value is r'. [this is the sum of the
edge lengths]

Step 2: We would really like this to be an L_1 metric. Bourgain's theorem
tells us we can have that, but now the sum of the edge lengths might be
an O(log n) factor larger than r'. So, we have distances on the edges,
where the sum of all pairwise shortest paths is 1, the sum of edge
lengths is r'' = O(r' log n), AND the shortest-path metric is an L_1
metric.

Now, why is this nice? Let's look at the example of 4 points in 2-d with
the cut-diamond graph.

Claim: one of the axis-parallel cuts C (there are at most n * dimension
of them) has r_C <= r''.

Proof: Label the axis-parallel cuts C_1, C_2, ..., and their
corresponding gaps d_1, d_2, ....
First of all, for a given pair of nodes u,v, the distance in the graph
between them is:

   (do they cross C_1?)*d_1 + (do they cross C_2?)*d_2 + ...

So, the sum over all pairs of distances is:

   (|C_1|*|V-C_1|)*d_1 + (|C_2|*|V-C_2|)*d_2 + ...

If we instead add over just the neighbors in the graph, we get that the
sum of edge lengths is:

   (# edges crossing C_1)*d_1 + (# edges crossing C_2)*d_2 + ...

The first summation is equal to 1 and the second is equal to r''. So,
one of the ratios r_{C_i} has to be <= r''.

..whew.. OK, now on to Bourgain's theorem... this is also pretty tricky,
but a bit easier if you handwave a few details :-)

======================================================================
Proof of Bourgain's theorem:

Let's assume n is a power of 2 for simplicity. First of all, here is the
embedding. Given our metric space G, we're going to pick O(log^2 n) sets
of nodes S_1, S_2, .... Then we will convert node v to the point

   [d(v,S_1), d(v,S_2), ...]

where the distance between a point and a set is the distance to the
nearest point in the set.

Here is how we pick the sets. Let q = O(log n).

   We pick q random sets of size 1.
   We pick q random sets of size 2.
   We pick q random sets of size 4.
   ...
   We pick q random sets of size n/2.

Let k = q*lg(n) be the total number of sets we pick. The claim is that
whp, for every pair of nodes u,v, the new distance d_new satisfies:

   Omega(q)*d(u,v) <= d_new(u,v) <= k*d(u,v).

The upper bound is easy and in fact always holds. It just follows from
the fact that for any set S, |d(u,S) - d(v,S)| <= d(u,v), by the
triangle inequality. The hard part is the lower bound.

Let's do some examples to get a feel for what's going on.

Example 1: suppose that in the original space, all points except u and v
are halfway between u and v: at distance d(u,v)/2 from each. Then most
sets S will be the same distance from u as they are from v, so they
won't contribute anything to d_new(u,v).
But our sets of size n/2 have a reasonable (constant) chance of
including exactly one of u or v. In that case, they contribute d(u,v)/2
to the distance. So, using Chernoff, with high probability we get a
distance of Omega(q*d(u,v)).

Example 2: suppose we have a complete binary tree of depth lg(n/2)
rooted at u, and another rooted at v, and then we identify the leaves
together. All edges have length 1, so d(u,v) = 2 lg(n/2) = Theta(log n).
In this case, for every set size, the distance from u to the nearest
point in the set has a constant probability of being off by 1 from its
expectation. So now the contribution from the different sizes is more
equally distributed, but again we get (from Chernoff)
Omega(k) = Omega(q*d(u,v)).

Now let's argue in general. Make a plot, for node u, of the number of
nodes within distance <= r from u, for r = 0 up to d(u,v)/2. Mark the
values of r where this curve reaches 1, 2, 4, 8, 16, ... (some of these
marks may occur at the same value of r). To help us conceptually, put
the marks that would have occurred at positions > d(u,v)/2 at d(u,v)/2.

Now, a random set of size n/2^i has constant probability of having its
nearest point to u be >= the ith mark, and constant probability of
having it be <= the (i-1)st mark. [in a sense, this is like what we did
in hwk2]. Of course, for some values of i, these two marks could be at
the exact same place. BUT, since there are only lg(n) marks, if we pick
i at random, the expected distance between the (i-1)st and ith marks is
d(u,v)/[2 lg(n)].

The reason this is nice is that since we're only looking up to distance
d(u,v)/2 from u, what happens here is roughly independent of the same
game going on with respect to v. Modulo this (which technically one
would need to be more precise about), on average each set contributes an
expected Omega(d(u,v)/log n) to d_new(u,v), which gives us what we want.
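The embedding itself is short enough to write down. Here is a minimal
Python sketch under the assumptions above (the names `bourgain_embed`
and `l1` are mine; it takes a precomputed distance matrix rather than a
graph, and it is the plain randomized construction, no derandomization):

```python
import random

def bourgain_embed(D, q=None, seed=0):
    """Map node v to [d(v,S_1), d(v,S_2), ...], where the S_i are
    q random sets of each size 1, 2, 4, ..., n/2."""
    rng = random.Random(seed)
    n = len(D)
    if q is None:
        q = n.bit_length()          # q = O(log n)
    sets = []
    size = 1
    while size <= n // 2:
        for _ in range(q):
            sets.append(rng.sample(range(n), size))
        size *= 2
    # d(v, S) = distance from v to the nearest point of S
    return [[min(D[v][u] for u in S) for S in sets] for v in range(n)]

def l1(x, y):
    """Manhattan distance between two embedded points."""
    return sum(abs(a - b) for a, b in zip(x, y))

# Demo: the 8-point cycle metric.
n = 8
D = [[min(abs(i - j), n - abs(i - j)) for j in range(n)] for i in range(n)]
emb = bourgain_embed(D)
k = len(emb[0])                     # k = q * lg(n) coordinates
```

As in the proof, the upper bound d_new(u,v) <= k*d(u,v) holds for every
run, because each coordinate contributes at most |d(u,S) - d(v,S)| <=
d(u,v); only the Omega(q)*d(u,v) lower bound needs the whp argument.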