15-854 Approximation and Online Algorithms                      04/03/00

* Online load balancing and virtual circuit routing.
========================================================================

Online virtual-circuit routing (routing to minimize max congestion):

Online version of a problem we looked at in the approx side of the course:
 - Given a (multi)graph G with m edges on n nodes.
 - One at a time, we are given a triple (x_t, y_t, b_t), which is a
   request for a path from x_t to y_t (a virtual circuit) of bandwidth b_t.
 - We are required to route each request as it comes in.
 - In the end, compare the max edge congestion of our solution to OPT.

We'll show: can get ratio O(log m). What's neat about this is that even
for the *offline* problem, all we could get with randomized rounding of
the LP, which is the best known, is O(log(m) / loglog(m)).

Before looking at this problem, look at a special case: the load
balancing problem.

Load balancing on identical machines:
(did this earlier in approx side of course)

Given: m identical machines. Jobs arrive in order, each has some known
load b_t and must be scheduled immediately on one of the machines. Goal
is to minimize the max load on any machine. This is a special case of
the VCR problem where there are just two nodes in the (multi)graph, with
m edges between them.

Natural greedy strategy: put each job onto the least loaded machine.
Classic analysis by Graham '66 shows this gets a 2 - 1/m competitive
ratio. (We did this earlier, but it's worth recapping.)

Proof: first of all, it does no better than this: suppose we get m(m-1)
jobs of load 1 and then one of load m. Greedy spreads the unit jobs
evenly (m-1 per machine), so the last job brings some machine to load
2m - 1, while OPT = m (big job alone, m unit jobs on each other machine).

Upper bound: look at the machine where greedy has the largest load. Say
the last job placed on it had load w, and before that the total load was
s (so our cost is w + s). Since greedy picked this machine, every
machine had load at least s at that time. This means the total sum of
loads on all machines is at least ms + w. So, OPT >= (ms + w)/m =
s + w/m. Also, OPT >= w. So,

    our cost = s + w <= OPT + w(1 - 1/m) <= OPT(2 - 1/m).

Surprisingly, it turns out this isn't optimal.
You can do better by explicitly maintaining a gap between the
highest-loaded and least-loaded machines, in case of an "emergency" of a
large job arriving.

Deterministic algs: for large m, best known is 1.923. Lower bound is
1.852 (for m sufficiently large). Randomized: you can get 4/3 for m=2
(rather than 3/2). Rand lower bound is e/(e-1) = 1.58.... No better
upper bound than the deterministic one is known for large m.

Generalization: restricted machines model. Here, each job arrives with a
list of "legal machines" for that job, and you're only allowed to choose
one of them.

Claim: no way to beat log(m) with a det alg (can prove a similar bound
for rand algs).

Proof: all jobs of unit load. Get m/2 jobs that can be scheduled
anywhere. (Actually, it will be easier if we make the online alg put
them at m/2 distinct locations; can do this by not allowing the kth job
to be at the positions of jobs 1,...,k-1.) Then get m/4 jobs that can
only be scheduled on the m/2 spots used by the first set of jobs (again,
can force online to distribute), then m/8, etc. So, in the end there's
some machine with Omega(log m) jobs. But, offline can get max load of 1
by putting the first set of m/2 on the machines *not* used by the online
alg, etc.

Even more general: unrelated machines model. Here, each job comes with a
vector of loads, where b(i) is the load of performing the job on machine
i. (E.g., can model the restricted machines setting by having loads be b
or infinity.)

Virtual Circuit Routing
=======================

Our alg will get O(log m) even for the "unrelated machines version".
I.e., for each request, bandwidth is a function from edges to reals: if
we route the tth request on edges e_1, e_2, e_3 from x_t to y_t, then
usage of e_1 goes up by b_t(e_1), usage of e_2 goes up by b_t(e_2),
usage of e_3 goes up by b_t(e_3), etc. The special case where there are
only two nodes corresponds to the unrelated machines model, so clearly
we can't do better than log(m), and it's pretty surprising that we can
get O(log m).
Also nice since we can use these link-dependent usages to model the
scenario where our links have different capacities and we're counting
the fraction used, or the case where some requests require a kind of
communication channel only available on parts of the network, so some
edges are just disallowed for those (like the restricted machines model).

Before giving the alg, let's try two "strawman" algs.

Strawman #1: route to greedily minimize max congestion.
Claim: if the graph is a circle, this can do as badly as Omega(n).
All reqs of bandwidth 1. Get (0,1) then (0,1) then (1,2) then (1,2)
then ...

Strawman #2: ignore congestion so far, and just route on the shortest
path.
Claim: can make this do really badly too.

Actual algorithm will combine aspects of each.

Alg is a little weird but simple. First, to simplify, let's assume the
final OPT cost (max congestion) is known (else will be able to guess and
double). Alg will look at every request in units of OPT. I.e., can think
of OPT = 1.

Algorithm defines a "potential" which is

    sum_e (3/2)^{load on e}

(so, initially, potential = m). Given a request, alg greedily routes it
in order to minimize the increase in the potential. That's it!

First question: how to implement this?
Ans: Given bandwidth request b, give each edge e a "length" of

    (3/2)^{load on e + b(e)} - (3/2)^{load on e}.

Then we just want to find the shortest path.

What we'll prove is that the final potential at the end of all the
requests will be at most 2m. This means that the max congestion is at
most log_{3/2}(2m). Actually, it's even better than that since the other
m-1 edges contribute at least 1, so we get congestion at most
log_{3/2}(m+1). Remember OPT=1 so we get our desired C.R.

We'll prove this by proving that the total *increase* in potential is at
most 1/2 of the final potential. (I.e., PHI_final - m <= 1/2 PHI_final,
which means PHI_final <= 2m.)

Let's use t to index time, and say the last request is at time T.
Say at time t, *we* use path P_t, and OPT uses P_t*. Here l_t(e) is our
load on edge e after the first t requests.

Total increase in potential is sum_e[(3/2)^{l_T(e)}] - m. Can rewrite in
terms of increase per step as:

    sum_t sum_{e in P_t} [(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].

Since we were greedy at each step, this is

    <= sum_t sum_{e in P_t*} [(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].

[Note, this is NOT the same as the total increase by OPT]

Now, we can group by edges instead of grouping by time to get:

    sum_e sum_{t: e in P_t*} [(3/2)^{l_{t-1}(e)} ((3/2)^{b_t(e)} - 1)].

We can now simplify in two ways. First of all, l_{t-1}(e) <= l_T(e). So
if we use this inequality we can pull that term out of the inner sum.
Second, since b_t(e) <= 1 (since it's in OPT's path and OPT's total max
congestion is 1) we can use the inequality that (3/2)^x - 1 <= x/2 for x
between 0 and 1 (draw picture: the left side is convex in x and the two
sides agree at x = 0 and x = 1). So, this is

    <= (1/2) sum_e (3/2)^{l_T(e)} sum_{t: e in P_t*} b_t(e).

Now, we're done because the last term is OPT's congestion on edge e, and
this is at most 1, and the other term is just our final potential. I.e.,
all this is <= (1/2) PHI_final. That's what we wanted!!
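
To make the routing rule concrete, here is a minimal sketch in Python of
one step of the algorithm: Dijkstra's shortest path where each edge's
"length" is the increase in potential from using it. The names
(route_request, adj, load, bandwidth) and the adjacency-list
representation are made up for illustration and are not from the notes;
loads are assumed to be already scaled so that OPT = 1.

```python
import heapq

BASE = 1.5  # the (3/2) base of the potential in the notes


def route_request(adj, load, src, dst, bandwidth):
    """Route one request on the path minimizing the increase in potential.

    adj: dict node -> list of (neighbor, edge_id); an undirected edge is
         listed under both endpoints (illustrative representation).
    load: dict edge_id -> current load in units of OPT; updated in place.
    bandwidth: function edge_id -> b(e) for this request.
    Returns the list of edge ids on the chosen path.
    """
    dist = {src: 0.0}
    prev = {}  # node -> (previous node, edge id used to reach it)
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, e in adj[u]:
            # Edge "length" = increase in potential if the request uses e.
            length = BASE ** (load[e] + bandwidth(e)) - BASE ** load[e]
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, e)
                heapq.heappush(heap, (nd, v))
    if dst != src and dst not in prev:
        raise ValueError("no path from src to dst")
    # Walk back from dst to recover the edges of the path.
    path = []
    node = dst
    while node != src:
        node, e = prev[node]
        path.append(e)
    path.reverse()
    for e in path:
        load[e] += bandwidth(e)
    return path


# Two nodes joined by m = 4 parallel edges (the machines special case),
# with four unit-bandwidth requests, so OPT = 1.
m = 4
adj = {0: [(1, e) for e in range(m)], 1: [(0, e) for e in range(m)]}
load = {e: 0.0 for e in range(m)}
for _ in range(m):
    route_request(adj, load, 0, 1, lambda e: 1.0)
potential = sum(BASE ** l for l in load.values())
```

In this small instance each request lands on a different edge, since a
loaded edge has length (3/2)^2 - (3/2) = 0.75 versus 0.5 for an empty
one, and the final potential is within the 2m bound proved above. The
sketch assumes OPT is known; the guess-and-double wrapper mentioned in
the notes is not shown.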