15-854 Approximation and Online Algorithms 2/23/00
* Embedding into L_1 [Bourgain's thm] and implications for sparsest
cut and balanced separators.
==============================================================================
[This is my last approx algs lecture. Ojas and Jochen will give one
on approx hardness on Mon, and if there are no other volunteers for early
presentations, we'll move on to online algs]
We've been looking at algs based on metric space approximation. Today
we will do one more, which is used for finding balanced separators in
graphs.
-[probably skip most of recap below]-
Recap last time: Gave a randomized construction of Bartal that, given any
metric space, produces a hierarchical decomposition: rewrites the
space as a bunch of clusters far apart from each other, where each
cluster is composed of subclusters far from each other, and the
distance between two nodes depends only on the innermost
cluster they both belong to. Cluster diameters drop by any desired factor
of k. The property is that for any (u,v), d_new(u,v) >= d_old(u,v), and
E[d_new(u,v)] <= O(k log^2 n)*d_old(u,v).
Idea of proof: pick an arbitrary start node and repeatedly flip a coin of
bias p = 8lg(n)/n until it comes up heads to determine the radius.
We can take the decomposition and convert it to a tree. Kind of like a
recursive FedEx network: for each cluster we designate a
clearinghouse node that collects all communication going out from
sub-clusters (either to nodes in other subclusters or to nodes outside of
this cluster). E.g., if we look at the path between two nodes A,B in
different clusters at the top level, we can break it into two halves: the
path from A to the root and the path from B to the root. For a k-HST,
each path has length at most (D + D/k + D/k^2 + ...). So for k=2, each
path has length at most 2D, so the total from A to B is at most 4D.
Trees are great for a lot of algorithm problems since they remove the
need to make path decisions.
Buy-at-bulk: purchasing bandwidth with a volume discount. E.g.,
BellSouth 1996: DS0 line (64k) at $4/mile, DS1 line (1.5m) at $23/mile,
DS3 (45m) at $200/mile (as reported in [Andrews-Zhang '98]). This is easy
to solve on trees, so our metric space approximation gives us an
O(log^2 n) approximation (or O(log n log log n) for the latest
construction). If we have a single sink (e.g., connecting oil wells to a
given oil refinery, or equivalently, we're given a central root that
everyone has to go through), then [AZ] give an O(k^2) bound where
k = # of line types.
Minimum communication Spanning tree: Given a metric graph and a
probability distribution over pairs of nodes, find the spanning tree
that minimizes objective = expected distance between a random pair of nodes
drawn from that distribution. We get that for *any* given pair of
nodes (u,v), the expected distance increases by only a factor of alpha:
for all (u,v), E_{our alg} [blowup in d(u,v)] <= alpha
which means
for random (u,v) E_{our alg} [blowup in d(u,v)] <= alpha
which means
E_{our alg} [blowup in objective] <= alpha.
============================================================================
Balanced separators:
There's something very similar to what we have just been looking at
that has lots of applications: balanced separators.
Given a graph G, want to split into two pieces S, V-S that minimizes
the ratio
r_S = (# edges between S and V-S) / (|S|*|V-S|)
This is sometimes called the "sparsest cut" or "minimum ratio cut".
Can also look at case where edges have weights, and then the numerator
is the total weight of edges in the cut. [Leighton-Rao '88] give an
O(log n) approximation algorithm for this problem. An implication of
this (and the way this is often stated) is that you can get a 1/3--2/3
separator (i.e., S has between 1/3 and 2/3 of the nodes) that has a
number of edges (or weight of the cut) that is within O(log n) of the
best 1/2--1/2 separator. [give this implication as a hwk problem?]
To do now: prove the Leighton-Rao result. We will do it via a metric
space approximation.
Theorem [Bourgain 86]: An n-point metric space can be embedded in an
L_1 space (manhattan distance) with distortion O(log n). The dimension
of the space will be O(log^2 n). Furthermore, there is an efficient
(randomized and now derandomized by LLR-95 and Garg-95) method for
doing it.
Let's first see how this solves our problem, then go back to prove
the theorem. Both parts are *tricky*.
Algorithm for getting a good separator
--------------------------------------
Step 1: Consider the following problem. Given a graph G, assign
lengths to the edges such that (1) if you look at all pairs of nodes
and the shortest path distance between them, the sum of all of these
is equal to 1, and (2) the sum of all edge lengths is minimum subject
to (1).
Claim: you can do this with linear programming. [have variables for
all pairwise distances, and force to have triangle inequality...]
Thing to notice: suppose we are given some cut (S, V-S) and we assign
distances according to: edges in the cut get length 1/(|S|*|V-S|), and all
others get 0. Then, we satisfy property (1) and the value of our
objective function is r_S. So, this is a fractional relaxation of our
problem. Say the fractional optimum is r'. [this is the sum of the edge lengths]
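As a concrete sketch of Step 1 (a hypothetical toy instance, a 4-cycle, and scipy assumed available): one standard way to write this as an LP is to have a variable for each pairwise distance, force the triangle inequality so the variables form a metric, normalize the sum of all pairwise distances to 1, and minimize the total edge length.

```python
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]      # toy example: a 4-cycle
pairs = list(combinations(range(n), 2))        # one LP variable per pair
idx = {p: i for i, p in enumerate(pairs)}
var = lambda u, v: idx[(min(u, v), max(u, v))]

# objective: minimize the sum of edge lengths (property (2))
c = np.zeros(len(pairs))
for u, v in edges:
    c[var(u, v)] += 1.0

# triangle inequalities d(u,v) <= d(u,w) + d(w,v) force the variables
# to be a metric, which captures "shortest path distances" in the LP
A_ub, b_ub = [], []
for u, v in pairs:
    for w in range(n):
        if w == u or w == v:
            continue
        row = np.zeros(len(pairs))
        row[var(u, v)] = 1.0
        row[var(u, w)] -= 1.0
        row[var(w, v)] -= 1.0
        A_ub.append(row)
        b_ub.append(0.0)

# normalization: all pairwise distances sum to 1 (property (1))
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.ones((1, len(pairs))), b_eq=[1.0],
              bounds=(0, None))
r_prime = res.fun   # the optimal fractional ratio r'
```

For the 4-cycle the fractional optimum comes out to 1/2, matching the best integral cut (two opposite edges, ratio 2/(2*2)), since this metric embeds into L_1 with no distortion.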
Step 2: We would really like this to be an L_1 metric. Bourgain's
theorem tells us we can have that, but now the sum of the edge lengths
might be an O(log n) factor larger than r'. So, we have distances on
the edges, where the sum of all pairwise shortest paths is 1, and the
sum of edge lengths is r'' = O(r' log n), AND the shortest path metric
is an L_1 metric. Now, why is this nice? Let's look at an example of 4
points in 2-d with the cut-diamond graph.
Claim: one of the axis-parallel cuts C (there are at most n*dimension
of them) has r_C <= r''.
Proof: Let's label the axis parallel cuts C_1, C_2, ..., and their
corresponding gaps d_1, d_2, .... First of all, for a given pair of
nodes u,v, the distance in the graph between them is:
(do they cross C_1?)*d_1 + (do they cross C_2?)*d_2 + ...
So, the sum of all pairs of distances is:
(|C_1|*|V-C_1|)*d_1 + (|C_2|*|V-C_2|)*d_2 + ...
If we instead sum only over edges of the graph, we get that the sum of edge lengths is
(# edges crossing C_1)*d_1 + (# edges crossing C_2)*d_2 + ...
The first summation is equal to 1 and the second is equal to r''. So,
one of the ratios has to be <= r'': if every cut C_i had
(# edges crossing C_i)/(|C_i|*|V-C_i|) > r'', then the second sum would
exceed r'' times the first, i.e., exceed r''. Contradiction.
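A quick numerical check of the claim (on a hypothetical 4-point "diamond" L_1 embedding, which stands in for the output of Step 2): enumerate all axis-parallel threshold cuts and verify that the best one has ratio at most r'' = (sum of edge lengths)/(sum of all pairwise distances).

```python
from itertools import combinations

# hypothetical 4-point L_1 embedding (a diamond) with the 4-cycle on it
points = [(0.0, 1.0), (1.0, 0.0), (0.0, -1.0), (-1.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, dim = len(points), 2

def l1(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

total_pairs = sum(l1(points[u], points[v]) for u, v in combinations(range(n), 2))
total_edges = sum(l1(points[u], points[v]) for u, v in edges)
r2 = total_edges / total_pairs   # plays the role of r'' (un-normalized)

# every axis-parallel cut: pick a coordinate and a threshold between
# consecutive distinct values of that coordinate
best = float('inf')
for a in range(dim):
    vals = sorted(set(p[a] for p in points))
    for lo, hi in zip(vals, vals[1:]):
        t = (lo + hi) / 2
        S = [i for i in range(n) if points[i][a] <= t]
        crossing = sum(1 for u, v in edges if (u in S) != (v in S))
        best = min(best, crossing / (len(S) * (n - len(S))))
# the averaging argument above guarantees best <= r2
```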
..whew..
OK, now on to Bourgain's theorem... this is also pretty tricky, but a
bit easier if you handwave a few details :-)
======================================================================
Proof of Bourgain's theorem: Let's assume n is a power of 2 for
simplicity.
First of all, here is the embedding. Given our metric space G, we're
going to pick O(log^2(n)) sets of nodes: S_1, S_2, .... Then we will
convert node v to the point:
[d(v,S_1), d(v,S_2), ... ]
where the distance between a point and a set is the distance to the
nearest point in the set. Here is how we pick the sets. Let q = O(log n).
We pick q random sets of size 1.
We pick q random sets of size 2.
We pick q random sets of size 4.
...
We pick q random sets of size n/2.
Let k = q*lg(n) be the number of sets we pick. The claim is that whp,
for every pair of nodes u,v, the new distance d_new satisfies:
Omega(q)*d(u,v) <= d_new(u,v) <= k*d(u,v).
The upper bound is easy and actually holds with certainty: it just
follows from the fact that for any set S, |d(u,S) - d(v,S)| <= d(u,v),
by the triangle inequality.
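The construction and the deterministic upper bound can be sketched directly (toy metric assumed, points on a line; q is a small constant here rather than O(log n), which is fine for the upper bound since that part holds with certainty):

```python
import random
from itertools import combinations

random.seed(0)
n = 16                         # toy metric: n points on a line
d = lambda u, v: abs(u - v)

q = 3                          # in the proof, q = O(log n)
sets = []
size = 1
while size <= n // 2:          # q random sets of each size 1, 2, 4, ..., n/2
    for _ in range(q):
        sets.append(random.sample(range(n), size))
    size *= 2
k = len(sets)                  # k = q * lg(n)

def embed(v):
    # coordinate i of the embedding is the distance from v to set S_i
    return [min(d(v, s) for s in S) for S in sets]

def d_new(u, v):
    # L_1 (manhattan) distance between the embedded points
    return sum(abs(a - b) for a, b in zip(embed(u), embed(v)))

# each coordinate changes by at most d(u,v) (triangle inequality),
# so d_new(u,v) <= k*d(u,v) for every pair, with certainty
ok = all(d_new(u, v) <= k * d(u, v) + 1e-9
         for u, v in combinations(range(n), 2))
```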
The hard part is the lower bound. Let's do some examples to get a
feel of what's going on.
Example 1: suppose in the original space, all points except u and v
are halfway between u and v: distance d(u,v)/2 from each. Then, most
sets S will be the same distance from u as they are from v, so they
won't contribute anything to d_new(u,v). But, our sets of size n/2
have a reasonable (constant) chance of including exactly one of u or
v. In that case, they contribute d(u,v)/2 to the distance. So, using
Chernoff, with high probability we get a distance of Omega(q*d(u,v)).
Example 2: suppose we have a complete binary tree of depth lg(n/2)
rooted at u, and another rooted at v, and then we identify the leaves
together. All edges have length 1, so d(u,v) = log n. In this case,
for every set size, the distance of the nearest point in the set to u
has a constant probability of being off by 1 from its expectation.
So, now the contribution from the different sizes is more equally
distributed, but again we get (from Chernoff) Omega(k) =
Omega(q*d(u,v)).
Now let's argue in general. Make a plot, for node u, of the number of
nodes within distance <= r from u, for r = 0 up to d(u,v)/2. Mark the
values of r where this curve reaches 1,2,4,8,16,... (some of these
marks may occur at the same value of r). To help us conceptually, put
marks that would have occurred at positions > d(u,v)/2 at d(u,v)/2.
Now, a random set of size n/2^i has constant probability of having its
nearest point to u be >= the ith mark or <= the (i-1)st mark. [in a
sense, this is like what we did in hwk2]. Of course, for some values
of i, these two marks could be at the exact same place. BUT, since
there are only log(n) marks, if we pick i at random, the expected
distance between the (i-1)st and ith marks is d(u,v)/[2log(n)]. The
reason this is nice is that since we're only looking up to distance
d(u,v)/2 from u, what happens here is roughly independent of the same
game going on with respect to v. Modulo this (which technically one
would need to be more precise about), on average, each set
contributes an expected Omega(d(u,v)/log n) to d_new(u,v), which summed
over all k = q*lg(n) sets gives us the Omega(q*d(u,v)) we want.