15-854 4/19/00
The k-server problem
======================================
- Recall k-server problem: we control k "servers", each located at
some point in a metric space. Get a sequence of requests to points in
the space. Each time we get a request, we need to move one of our
servers there. Generalizes paging, weighted caching, and other problems.
- Will consider on finite metric space of n>>k points for simplicity
(results don't depend on space being finite, but it simplifies the
arguments)
- Can view as MTS problem on {n \choose k} states. In each task
vector, task costs are 0 or infinity. So, only paying movement costs.
- k-server conjecture: can you achieve C.R. of k? O(k)? poly(k)? f(k)?
- note, det MTS results no good since they scale like n^k. Actually,
the randomized MTS alg gets close: poly(k*log(n)).
First result by Fiat-Rabani-Ravid: k^O(k). [complicated alg]
Improvement by Grove: 2^k. [harmonic alg]
Koutsoupias-Papadimitriou: 2k-1 [work-function alg]
We will go through argument in more recent paper by Koutsoupias, that
simplifies things a bit. Big open question: can you use
randomization to get o(k)?
=========================================================================
Before going into WF alg, let's start with some simpler algs and see
why they don't work.
Strawman alg #1: greedy. Always move the closest server. What's a
bad example?
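Here is the classic bad example, sketched in Python (the numbers and the name `greedy_cost` are mine): two nearby points plus one far-away server. Greedy shuttles the nearby server back and forth forever, rather than pay once to bring in the far server.

```python
def greedy_cost(servers, requests):
    """Cost of the greedy k-server algorithm on the line metric
    (points are numbers, distance is absolute difference)."""
    servers = list(servers)
    cost = 0
    for r in requests:
        # always move the server closest to the request
        i = min(range(len(servers)), key=lambda j: abs(servers[j] - r))
        cost += abs(servers[i] - r)
        servers[i] = r
    return cost

# Servers at 0 and 1000; requests alternate between the nearby points
# 1 and 0.  Greedy pays 1 per request forever, while OPT pays 999 once
# to bring the far server over and then serves everything for free.
requests = [1, 0] * 5000
print(greedy_cost([0, 1000], requests))   # 10000, vs. OPT = 999
```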
Strawman alg #2: OPT-chasing. Compute opt cost in hindsight of ending
in each of the k possible configurations you get by moving one
server to the request. Move the server that leads you to the one
that's smallest. This can do badly too.
(k=3, 4 pts like this: **.................**)
Work-function algorithm: Say you're in state X and consider the
configurations you get by moving one server to the request. Go to
state Y that minimizes OPT(Y) + d(X,Y). I.e., you're minimizing the
sum of distance traveled and the OPT cost to end there.
[Note: we will use OPT(X) to denote the optimal cost of ending in
config X. The papers use w(X) and call w the "work-function". I'll
just call it OPT. I will also say "state" and "configuration"
interchangeably]
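As a concrete sketch (my code, brute force and exponential in k, for illustration only): maintain OPT_i(X) for every configuration X via the standard recurrence OPT_i(X) = min_{x in X} [OPT_{i-1}(X - x + r_i) + d(r_i, x)], then make the move described above.

```python
from itertools import combinations_with_replacement, permutations

def matching_dist(X, Y, dist):
    # cheapest way to move the servers from configuration X to Y
    # (min-cost matching, brute force over permutations; fine for small k)
    return min(sum(dist(a, b) for a, b in zip(X, p))
               for p in permutations(Y))

def wfa(points, dist, start, requests):
    """Work-function algorithm on a finite metric, by brute force.
    Configurations are sorted k-tuples (multisets of points);
    requests must be points of the space."""
    k = len(start)
    cur = tuple(sorted(start))
    configs = list(combinations_with_replacement(sorted(points), k))
    # OPT_0(X) = cost of moving from the start configuration to X
    w = {X: matching_dist(cur, X, dist) for X in configs}
    cost = 0
    for r in requests:
        # work-function update:
        #   OPT_i(X) = min_{x in X} OPT_{i-1}(X - x + r) + d(r, x)
        w = {X: min(w[tuple(sorted(X[:j] + X[j+1:] + (r,)))] + dist(r, X[j])
                    for j in range(k))
             for X in configs}
        # WF move: go to Y = cur - x + r minimizing OPT_i(Y) + d(x, r)
        j = min(range(k),
                key=lambda m: w[tuple(sorted(cur[:m] + cur[m+1:] + (r,)))]
                              + dist(cur[m], r))
        cost += dist(cur[j], r)
        cur = tuple(sorted(cur[:j] + cur[j+1:] + (r,)))
    return cur, cost

d = lambda a, b: abs(a - b)                       # line metric
print(wfa([0, 5, 10], d, (0,), [5, 10, 5]))       # -> ((5,), 15)
```

(For k=1 the algorithm has no choice but to chase each request, which is what the example shows.)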
Let's recall MTS setting. There we looked at the WF alg a little
differently. Let's just check that they are the same thing. There we
said "if the current task causes your state to become pinned, move to
a state that's pinning you and is not pinned by anyone else".
X is "pinned" by Y if OPT(X) = OPT(Y) + d(X,Y). Our motivation was
that adversary could penalize X without increasing OPT values, so we'd
better not be there, and if we move in this way, then we can charge
off our distance moved to the increase in OPT.
Why they are the same: point is that since cost of servicing task in
X is infinity, we know X becomes pinned, and in fact it is pinned by
one of those k configurations. [why?] The argmin is the one pinning us.
The MTS view actually gives us a nice fact about the WF alg which will
be the starting point of the analysis. Say at time i-1, the alg is in
state X. Then we get request i, OPT(X) goes up by some amount
up_i, so we move to some new state Y. Let "UP" be the sum of the up_i.
Since every time we move some distance d, we decrease OPT at our state
by d (by def of the algorithm), we know the alg's cost is <= UP. In
fact, we can never get below the global OPT, so this means that
UP >= alg + OPT. So, all we need to do is prove that UP <= 2k*OPT + const:
then alg <= UP - OPT <= (2k-1)*OPT + const, which is the 2k-1 bound.
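Spelled out (writing X_i for the algorithm's state after request i, with X_0 the start and OPT_0(X_0) = 0):

```latex
\begin{align*}
\mathrm{OPT}_i(X_i)
  &= \mathrm{OPT}_i(X_{i-1}) - d(X_{i-1}, X_i)
     && \text{($X_{i-1}$ is pinned by $X_i$)} \\
  &= \mathrm{OPT}_{i-1}(X_{i-1}) + \mathit{up}_i - d(X_{i-1}, X_i)
     && \text{(definition of $\mathit{up}_i$)}
\end{align*}
```

Summing over i telescopes to OPT_T(X_T) = UP - alg, and since OPT_T(X_T) >= OPT, we get UP >= alg + OPT.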
----------
Simplification #1. Instead of keeping track of the state of
the online alg, let's prove something harder, namely that
sum_i max_X(OPT_i(X) - OPT_{i-1}(X)) <= 2k*OPT + const.
I.e., instead of using the increase of OPT at the algorithm's current
state, we're using the largest increase of OPT (the "work function") in
that time step. The LHS here is sometimes called the "pseudocost" of
the algorithm. This simplifies things since we no longer need to reason
about the algorithm's state at all.
----------
Simplification #2. Which state X gives the max in the above
expression? It turns out that by augmenting the space a bit, we can
get a handle on which state it is, as a function of just the current
request.
Idea: we'll augment the metric space so that every point has an
"antipode". Let D be the diameter of the space. The antipode \bar{a}
of a point a has the property that:
1. d(a, \bar{a}) = D
2. for all points b, d(a,b) + d(b,\bar{a}) = D.
[yes, 2 implies 1]
For instance, think of a circle.
How to augment? Just create a new space \bar{M} that looks exactly
like the original space M, and set d(b,\bar{a}) as in the above
definition.
Let's just verify that we haven't violated anything. Distances > 0.
Triangle inequality: d(a, \bar{b}) <= d(a,c) + d(c, \bar{b})?
RHS = d(a,c) + (D - d(c,b)) >= D - d(a,b) = LHS
[since d(a,c) + d(a,b) >= d(c,b), by the triangle inequality in M]
Also, make sure things are symmetric: d(a,b) = d(\bar{a},\bar{b}) by
design. d(a,\bar{b}) = d(\bar{a},b) too.
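A quick brute-force check of these axioms on a small instance (a line metric, where D is realized at the endpoints; `augmented` is my name for the construction):

```python
from itertools import product

def augmented(points, d, D):
    """Augment a finite metric (points, d) of diameter D with an
    antipode bar(a) for every point a: distances within each copy are
    unchanged, and d(a, bar(b)) = D - d(a, b) across copies."""
    P = [(p, False) for p in points] + [(p, True) for p in points]
    def dist(u, v):
        (a, sa), (b, sb) = u, v
        return d(a, b) if sa == sb else D - d(a, b)
    return P, dist

points = [0, 2, 7, 10]                 # a line metric; diameter D = 10
d = lambda a, b: abs(a - b)
P, dist = augmented(points, d, 10)

# symmetry, triangle inequality, and the two antipode properties
assert all(dist(u, v) == dist(v, u) for u, v in product(P, P))
assert all(dist(u, w) <= dist(u, v) + dist(v, w)
           for u, v, w in product(P, P, P))
assert all(dist((a, False), (a, True)) == 10 for a in points)
assert all(dist((a, False), b) + dist(b, (a, True)) == 10
           for a in points for b in P)
print("all axioms hold")
```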
Here's the claim: if the ith request is to some point r, then the max
increase of OPT occurs at the state X consisting of all k servers at
\bar{r}.
[We'll get back to proving or at least verifying some small examples
of this at the end. For now, let's believe it and go on.]
So, let's use r_i to denote the ith request, and define X_i to be the
state consisting of all k servers at the point \bar{r_i}. So, we've
simplified the problem to proving that:
sum_i (OPT_i(X_i) - OPT_{i-1}(X_i)) <= 2k*OPT + const.
We'll prove for const = k^2 * D.
[note: the proof can be made to work without needing a finite number of
points or even a bounded diameter, but finiteness simplifies our lives]
---------
Let's prove it. Easier to view our goal this way:
sum_i OPT_i(X_i) <= sum_i OPT_{i-1}(X_i) + 2k*OPT + (k^2 * D)
It's actually now pretty simple. Look at each OPT_i(X_i) separately.
Let's look at how the optimal in hindsight for the entire sequence
actually serviced requests. Where did the server that served r_i go
next? Two possibilities:
Possibility #1: nowhere. r_i was a terminal point for that server.
In that case, we'll just use the simple fact that OPT_i(X_i) can't be
much more than the overall OPT in the end. At worst, it's OPT + k*D.
Since there are at most k of these terminal values of i, this costs us
a total of k*OPT + k^2 * D. (That's where the additive constant and
half of the 2k come from.)
Possibility #2: it goes to r_j for some j>i. In that case, we'll
bound OPT_i(X_i) by using the fact that
OPT_i(X_i) <= OPT_{j-1}(X_j) + k*d(r_i, r_j)
This is just because one way of ending at X_i after serving the first
i requests is to actually serve even more requests (the first j-1,
where j-1 >= i) and end at X_j, and then move the state from X_j to
X_i. Notice that d(X_j,X_i) = k*d(r_i,r_j).
This actually finishes it off since we've matched each OPT term on the
LHS to a different OPT term on the RHS, and since OPT's server actually
travels from r_i to r_j, the additive portions satisfy
sum_i k*d(r_i,r_j) <= k*OPT.
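In symbols, with T the set of (at most k) terminal indices and j(i) the next request served by r_i's server:

```latex
\sum_i \mathrm{OPT}_i(X_i)
  \;\le\; \sum_{i \in T} \bigl(\mathrm{OPT} + kD\bigr)
  \;+\; \sum_{i \notin T} \Bigl(\mathrm{OPT}_{j(i)-1}\bigl(X_{j(i)}\bigr)
          + k\, d\bigl(r_i, r_{j(i)}\bigr)\Bigr)
  \;\le\; \sum_i \mathrm{OPT}_{i-1}(X_i) + 2k \cdot \mathrm{OPT} + k^2 D,
```

using that the j(i) are distinct (so each RHS term is used once), |T| <= k, and sum_{i not in T} d(r_i, r_{j(i)}) <= OPT (these are moves the optimal schedule actually makes).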
So, we're all done modulo that simplification we made.
[Note: seems like ought to be room for improvement with possibility #1
since we were so sloppy]
----------
Last part: why was simplification #2 OK?
--> do some examples.
--> comes from "quasiconvexity". For any time i, any states X and Y,
we have: for any point x in X, there exists y in Y such that
OPT_i(X) + OPT_i(Y) >= OPT_i(X - x + y) + OPT_i(Y + x - y).
--> why it's true
- find an alternating path
--> how to use quasiconvexity
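As a sanity check, quasiconvexity can be verified by brute force on small instances. This recomputes the work function via the standard recurrence OPT_i(X) = min_{x in X} [OPT_{i-1}(X - x + r_i) + d(r_i, x)]; all names here are mine.

```python
from itertools import combinations_with_replacement, permutations

def work_function(points, dist, start, requests):
    """OPT_i(X) for every configuration X (sorted k-tuples), brute force,
    via OPT_i(X) = min_{x in X} [OPT_{i-1}(X - x + r_i) + d(r_i, x)]."""
    k = len(start)
    configs = list(combinations_with_replacement(sorted(points), k))
    # OPT_0(X) = min-cost matching from the start configuration to X
    w = {X: min(sum(dist(a, b) for a, b in zip(start, p))
                for p in permutations(X))
         for X in configs}
    for r in requests:
        w = {X: min(w[tuple(sorted(X[:j] + X[j+1:] + (r,)))] + dist(r, X[j])
                    for j in range(k))
             for X in configs}
    return w

def quasiconvex(w):
    """Check: for all configs X, Y and every x in X, is there y in Y
    with w(X) + w(Y) >= w(X - x + y) + w(Y - y + x)?"""
    for X in w:
        for Y in w:
            for i, x in enumerate(X):
                if not any(w[X] + w[Y] >=
                           w[tuple(sorted(X[:i] + X[i+1:] + (y,)))] +
                           w[tuple(sorted(Y[:j] + Y[j+1:] + (x,)))]
                           for j, y in enumerate(Y)):
                    return False
    return True

d = lambda a, b: abs(a - b)
w = work_function([0, 3, 7], d, (0, 7), [3, 0, 7])
print(quasiconvex(w))   # True on this instance
```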