15-854 Approximation and Online Algorithms 04/03/00
* Online load balancing and virtual circuit routing.
========================================================================
Online Virtual-circuit routing (routing to minimize max congestion):
Online version of problem we looked at in approx side of course:
- Given (multi) graph G with m edges on n nodes.
- one at a time, given triple (x_t, y_t, b_t), which is a request for
a path from x_t to y_t (a virtual circuit) of bandwidth b_t.
- We are required to route each one as they come in.
- in the end, compare max edge congestion of our solution to OPT.
We'll show: can get ratio O(log m). What's neat about this is that
even for *offline* problem, all we could get with randomized rounding
of LP, which is the best known, is O(log(m) / loglog(m)).
Before looking at this problem, look at a special case: the load
balancing problem.
Load balancing on identical machines: (did this earlier in approx side
of course)
Given: m identical machines. Jobs arrive in order, each has
some known load b_t and must be scheduled immediately on one of the
machines. Goal is to minimize the max load on any machine.
This is a special case of the VCR problem where there are just two
nodes in the (multi)graph, with m edges between them.
Natural greedy strategy: put each job onto least loaded machine.
Classic analysis by Graham '66 shows this gets a competitive ratio of
2 - 1/m. (We did this earlier, but it's worth recapping.)
Proof: first of all, greedy does no better than this: take m(m-1) jobs
of load 1 followed by one job of load m. Greedy ends with max load
(m-1) + m = 2m - 1, while OPT = m. Upper bound: look at the machine
where greedy has largest load. Say the last job placed on it had load
w, and before that its load was s (so our cost is w + s). Since greedy
chose the least-loaded machine, every machine had load at least s at
that point, so the total sum of loads on all machines is at least
ms + w. So, OPT >= (ms + w)/m = s + w/m. Also, OPT >= w.
So, our cost = s + w <= OPT + w(1 - 1/m) <= OPT(2 - 1/m).
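A minimal sketch of greedy and of the bad instance from the proof, to
check the 2m - 1 vs m gap by hand. The function names are made up for
illustration.

```python
def greedy_max_load(m, jobs):
    """Put each job on the currently least-loaded of m identical machines.

    Returns the final maximum load.
    """
    loads = [0] * m
    for b in jobs:
        i = min(range(m), key=lambda j: loads[j])  # least-loaded machine
        loads[i] += b
    return max(loads)


def graham_bad_instance(m):
    """m(m-1) jobs of load 1, then a single job of load m."""
    return [1] * (m * (m - 1)) + [m]
```

For m = 5 the unit jobs fill every machine to load 4, and the last job
brings one machine to 9 = 2m - 1, while OPT = 5 (big job alone, unit
jobs spread over the other 4 machines).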
Surprisingly, it turns out this isn't optimal. You can do better by
explicitly maintaining a gap between the highest-loaded and
least-loaded machines, in case of an "emergency" (a large job
arriving). Deterministic algs: for large m, the best known upper bound
is 1.923, and the lower bound is 1.852 (for m sufficiently large).
Randomized: can get 4/3 for m=2 (rather than the deterministic 3/2).
The randomized lower bound is e/(e-1) = 1.58.... For large m, no
randomized upper bound better than the deterministic one is known.
Generalization: restricted machines model. Here, each job arrives
with a list of "legal machines" for that job, and you're only allowed
to choose one of them. Claim: no way to beat log(m) with det alg (can
prove a similar bound for rand algs). Proof: all jobs have unit load.
First come m/2 jobs that can be scheduled anywhere. (Actually, it will
be easier if we make the online alg put them at m/2 distinct locations
--- can do this by not allowing the kth job on the machines used by
jobs 1,...,k-1.)
Then come m/4 jobs that can only be scheduled on the m/2 machines used
by the first set of jobs (again, can force online to distribute), then
m/8, etc. So, in the end there's some machine with Omega(log m) jobs
(in fact log_2 m). But, offline can get max load of 1 by putting the
first set of m/2 on the machines *not* used by the online alg, etc.
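The adversary above can be sketched as a small simulation. This assumes
m is a power of two and that the online alg is given as a callback
online_choice(loads, legal) returning one machine from legal; both the
callback interface and the names are made up for illustration.

```python
def restricted_adversary(m, online_choice):
    """Run the halving adversary against an online alg.

    Returns (online max load, offline max load). Assumes m is a power of 2.
    """
    online = [0] * m
    offline = [0] * m
    allowed = list(range(m))          # machines legal in the current round
    while len(allowed) >= 2:
        used = []                     # machines online picked this round
        for _ in range(len(allowed) // 2):
            # kth job may not go where jobs 1..k-1 of this round went
            legal = [i for i in allowed if i not in used]
            i = online_choice(online, legal)
            if i not in legal:
                raise ValueError("online alg chose an illegal machine")
            online[i] += 1
            used.append(i)
        # offline uses the allowed machines that online did NOT use
        for i in allowed:
            if i not in used:
                offline[i] += 1
        allowed = used                # next round is restricted to these
    return max(online), max(offline)


def least_loaded(loads, legal):
    """Example online alg: least-loaded legal machine."""
    return min(legal, key=lambda i: loads[i])
```

With m = 16 there are 4 rounds (8, 4, 2, 1 jobs), so the machine that
survives every round ends with load 4 = log_2(16), while offline has
max load 1.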
Even more general: unrelated machines model. Here, each job comes
with vector of loads, where b(i) is the load of performing the job on
machine i. (E.g., can model restricted machines setting by having
loads be b or infinity)
Virtual Circuit Routing
=======================
Our alg will get O(log m) even for "unrelated machines version".
I.e., for each request, bandwidth is a function from edges to reals:
if we route the tth request on edges e_1, e_2, e_3 from x_t to y_t,
then usage of e_1 goes up by b_t(e_1), usage of e_2 goes up by
b_t(e_2), usage of e_3 goes up by b_t(e_3) etc. The special case
where there are only two nodes corresponds to the unrelated machines
model, so clearly can't do better than Omega(log m), and it's pretty surprising that we
can get it. Also nice since we can use these link-dependent usages
to model scenario where our links have different capacities and we're
counting the fraction used, or the case where some requests require a
kind of communication channel only available on parts of the network,
so some edges are just disallowed for those (like restricted machines
model).
Before giving alg, let's try two "strawman" algs.
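Strawman #1: route to greedily minimize max congestion. Claim: if
graph is a cycle, this can do as badly as Omega(n). All reqs have
bandwidth 1. Get (0,1) then (0,1) then (1,2) then (1,2) then ...
Greedy sends one request of each pair the long way around, so each
pair raises the load on every edge by 1: after going around the cycle,
greedy has congestion n while OPT (always use the direct edge) has 2.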
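A quick simulation of Strawman #1 on an n-cycle, under the request
sequence (0,1),(0,1),(1,2),(1,2),...,(n-1,0),(n-1,0). One assumption not
in the notes: ties are broken in favor of the direct edge. The function
name is made up for illustration.

```python
def greedy_min_max_on_cycle(n):
    """Strawman #1 on an n-cycle; load[i] is the load on edge (i, i+1 mod n).

    Each request (i, i+1) has two routes: the direct edge i, or the long
    way around using every other edge. Pick the one giving the smaller
    max load (ties go to the direct edge). Returns the final max load.
    """
    load = [0] * n
    for i in range(n):
        for _ in range(2):                  # request (i, i+1) arrives twice
            direct = list(load)
            direct[i] += 1
            around = [l + 1 for l in load]  # every edge except edge i
            around[i] -= 1
            load = around if max(around) < max(direct) else direct
    return max(load)
```

Routing every request on its direct edge instead gives load exactly 2
on every edge, so the ratio here is n/2.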
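Strawman #2: ignore congestion so far, and just route on shortest
path. Claim: can make this do really badly too (e.g., repeat the same
request many times: every copy goes on the same shortest path, even
if there are many edge-disjoint paths that are only slightly longer).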
Actual algorithm will combine aspects of each. Alg is a little weird
but simple.
First, to simplify, let's assume final OPT cost (max congestion) is
known. (Otherwise, can guess and double.) Alg will measure every
request's bandwidth in units of OPT. I.e., can think of OPT = 1.
Algorithm defines a "potential" which is sum_e (3/2)^{load on e}.
(so, initially, potential = m.) Given a request, alg greedily routes
it in order to minimize the increase in the potential. That's it!
First question: how to implement this? Ans: Given bandwidth request
b, give each edge e a "length" of
(3/2)^{load on e + b(e)} - (3/2)^{load on e}.
Then we just want to find the shortest path.
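A minimal sketch of this step, assuming an undirected multigraph given
as a list of (u, v) pairs on nodes 0..n-1, and each request's bandwidth
given as a list indexed by edge (already scaled so OPT = 1). Dijkstra
is one choice of shortest-path routine (edge lengths are nonnegative);
the notes only say "shortest path". All names are made up for
illustration.

```python
import heapq
import math

BASE = 1.5  # the 3/2 in the potential


def route_request(n, edges, load, x, y, bandwidth):
    """Edge ids of an x-y path minimizing the increase in potential."""
    length = [BASE ** (load[e] + bandwidth[e]) - BASE ** load[e]
              for e in range(len(edges))]
    adj = [[] for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        adj[u].append((v, e))
        adj[v].append((u, e))
    dist = [math.inf] * n
    prev = [None] * n               # prev[v] = (previous node, edge id)
    dist[x] = 0.0
    heap = [(0.0, x)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                # stale heap entry
        for v, e in adj[u]:
            nd = d + length[e]
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = (u, e)
                heapq.heappush(heap, (nd, v))
    if math.isinf(dist[y]):
        raise ValueError("no usable path from x to y")
    path, node = [], y
    while node != x:
        node, e = prev[node]
        path.append(e)
    return path[::-1]


def route_all(n, edges, requests):
    """Route requests (x, y, bandwidth) one at a time; return final loads."""
    load = [0.0] * len(edges)
    for x, y, bandwidth in requests:
        for e in route_request(n, edges, load, x, y, bandwidth):
            load[e] += bandwidth[e]
    return load


def potential(load):
    return sum(BASE ** l for l in load)
```

To try it, build an instance where some routing has congestion at most
1 (e.g., an 8-cycle with two requests of bandwidth 1/2 between each
adjacent pair) and compare potential(load) against 2m.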
What we'll prove is that the final potential at the end of all the
requests will be at most 2m. This means that the max congestion is
at most log_{3/2}(2m). Actually, it's even better than that since
the other m-1 edges each contribute at least 1 to the potential, so we
get congestion at most log_{3/2}(m+1). Remember OPT=1 so we get our
desired competitive ratio.
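Spelled out, with L the load on our most congested edge and \Phi the
final potential (taking the 2m bound as given):

```latex
\[
  (3/2)^{L} + (m-1) \;\le\; \Phi_{\mathrm{final}} \;\le\; 2m
  \quad\Longrightarrow\quad
  (3/2)^{L} \;\le\; m+1
  \quad\Longrightarrow\quad
  L \;\le\; \log_{3/2}(m+1).
\]
```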
We'll prove this by proving that the total *increase* in potential is
at most 1/2 of the final potential.
(I.e., PHI_final - m <= 1/2 PHI_final, which means PHI_final <= 2m)
Let's use t to index time, and say last request is at time T. Say
at time t, *we* use path P_t, and OPT uses P_t*. Let l_t(e) be our
load on edge e after the first t requests (so l_0(e) = 0).
Total increase in potential is sum_e[(3/2)^{l_T(e)}] - m.
Can rewrite in terms of increase per step as:
sum_t sum_{e in P_t} [(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].
Since we were greedy at each step, this is <=
sum_t sum_{e in P_t*}[(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].
[Note, this is NOT the same as the total increase by OPT]
Now, we can group by edges instead of grouping by time to get:
sum_e sum_{t:e in P_t*}[(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].
We can now simplify in two ways. First of all, l_{t-1}(e) <= l_T(e),
so we can replace l_{t-1}(e) by l_T(e) and pull that term out of the
inner sum. Second, since b_t(e) <= 1 (since e is on OPT's path and
OPT's max congestion is 1) we can use the inequality
(3/2)^x - 1 <= x/2 for x between 0 and 1 (the left side is convex in x
and the two sides agree at x=0 and x=1).
So, this is <=
(1/2) sum_e (3/2)^{l_T(e)} sum_{t:e in P_t*} b_t(e).
Now, we're done because the inner sum is OPT's congestion on edge e,
which is at most 1, and the remaining sum is just our final
potential. I.e., all this is <= (1/2) PHI_final. That's what we wanted!!
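Collecting the steps of the argument into one chain (same quantities,
writing \ell for l and \Phi for PHI):

```latex
\begin{align*}
\Phi_{\mathrm{final}} - m
  &= \sum_{t} \sum_{e \in P_t}
     (3/2)^{\ell_{t-1}(e)} \bigl( (3/2)^{b_t(e)} - 1 \bigr) \\
  &\le \sum_{t} \sum_{e \in P_t^*}
     (3/2)^{\ell_{t-1}(e)} \bigl( (3/2)^{b_t(e)} - 1 \bigr)
     && \text{(greedy choice at each step)} \\
  &= \sum_{e} \sum_{t \,:\, e \in P_t^*}
     (3/2)^{\ell_{t-1}(e)} \bigl( (3/2)^{b_t(e)} - 1 \bigr)
     && \text{(regroup by edge)} \\
  &\le \frac{1}{2} \sum_{e} (3/2)^{\ell_T(e)}
     \sum_{t \,:\, e \in P_t^*} b_t(e)
     && \text{($\ell_{t-1} \le \ell_T$ and $(3/2)^x - 1 \le x/2$)} \\
  &\le \frac{1}{2} \sum_{e} (3/2)^{\ell_T(e)}
   \;=\; \frac{1}{2}\, \Phi_{\mathrm{final}}
     && \text{(OPT's congestion on $e$ is at most 1)}
\end{align*}
```

Rearranging gives \Phi_final <= 2m.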