15-451 Algorithms                                              11/16/04

* Online algorithms
   - rent-or-buy?
   - elevator problem
   - picking a winner

Quiz Thurs (up through last class)
==========================================================================

Today's topic: Online Algorithms

Online algorithms: algorithms for settings where the inputs/data
arrive over time, and we need to make decisions on the fly, without
knowing what will happen in the future. (As opposed to standard
problems like sorting, where you have all the inputs at the start.)
Data structures are one example of this. We'll talk about a few other
examples today.

Rent-or-buy?
============
Here is a simple online problem that captures a common issue in
online decision-making, called the rent-or-buy problem.

Say you are just starting to go skiing. You can either rent skis for
$50 or buy them for $500. You don't know if you'll like it, so you
decide to rent. Then you decide to go again, and again, and after a
while you realize you have shelled out a lot of money renting and you
wish you had bought right at the start.

The optimal strategy in hindsight is: if you're going to end up
skiing more than 10 times, you should buy right at the beginning; if
you're going to go fewer than 10 times, you should just rent. But
what if you don't know?

To talk about the quality of an online algorithm, we look at what's
called the "competitive ratio": the worst case (maximum) over
possible event sequences of the ratio (alg cost)/OPT, where OPT is
the optimal cost in hindsight, and "cost" means total cost over all
time.

E.g., what is the CR of the algorithm that says "buy right away"?
The worst case is that you only go once. Ratio = 500/50 = 10.

What about the algorithm that says "rent forever"? The worst case is
that you keep going skiing. The ratio is infinite.

Here's a nice strategy: rent until you realize you should have
bought, then buy. (In our case: rent 9 times, then buy.) If you went
skiing 9 or fewer times, you're optimal. If you went 10+ times, you
paid $450 + $500 while OPT paid $500, for a ratio of 950/500 =
2 - 1/10. (In general, if the purchase cost is a multiple of the
rental cost, the ratio is 2 - rent/buy.)

The elevator problem
====================
You go up to the elevator and press the button. But who knows how
long it's going to take to come, if ever? How long should you wait
until you give up and take the stairs?

Say it takes time E to get to your floor by elevator (once it comes)
and time S by the stairs. E.g., maybe E = 15 sec, S = 45 sec. What
strategy has the best competitive ratio?

   - Wait 30 sec, then take the stairs. (In general, wait for S-E
     time. I.e., take the stairs once you realize you should have
     taken them at the start.)
   - If the elevator comes in < 30 sec, we're optimal.
   - Otherwise, OPT = 45. We took 30+45 sec, so the ratio is
     (30+45)/45 = 5/3.
   - Or, in general, ratio = (S-E+S)/S = 2 - E/S.

This is really the same as rent-or-buy, where stairs=buy, waiting for
E time steps is like renting, and the elevator arriving is like the
last time you ever ski.

Claim: the above strategy has the best possible competitive ratio for
deterministic algorithms.

Proof: It's silly to wait longer: if the elevator never comes, then
our cost only gets worse while OPT doesn't change. If we wait less,
say (S-E) - alpha seconds, then the numerator goes down by alpha, but
if the elevator comes right after we leave, the denominator goes down
by alpha too, so the ratio gets worse.

Other problems like this: whether it's worth optimizing code, when
your laptop should stop the disk from spinning between accesses, and
many others. (A code sketch of the threshold strategy follows below.)
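To make the threshold strategy concrete, here is a minimal sketch in
Python (my illustration; the notes themselves give no code) that
computes its cost against the optimal cost in hindsight:

   def alg_cost(trips, rent=50, buy=500):
       """Threshold strategy: rent while the total rent paid stays
       below the purchase price, then buy."""
       threshold = buy // rent              # 10 in the lecture's example
       if trips < threshold:
           return trips * rent              # we never bought
       return (threshold - 1) * rent + buy  # 9 rentals, then buy

   def opt_cost(trips, rent=50, buy=500):
       """Optimal cost in hindsight: rent every time, or buy at the start."""
       return min(trips * rent, buy)

   # Competitive ratio = worst case of alg/OPT over all possible futures.
   print(max(alg_cost(t) / opt_cost(t) for t in range(1, 1000)))
   # prints 1.9, i.e., 2 - rent/buy

The elevator problem is the same computation, with "renting" playing
the role of waiting a bit longer and "buying" playing the role of
taking the stairs.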
PICKING A WINNER (COMBINING EXPERT ADVICE)
==========================================
Here is a fun problem called the "picking a winner" problem. It is
also called the problem of "combining expert advice".

THE SETTING: Say you have n buckets, and you are standing in one of
them. At each time step,
   - a ball falls into one of the buckets;
   - if it's *your* bucket, then you get $1;
   - then, you can choose to move if you want to before the next
     time step.
The game ends when the fullest bucket has d balls (i.e., d dollars).

For instance, you can think of the buckets as companies you might
join, which might be doing well or not. Note that you only get money
for balls that fall into your bucket *while you are there*. Just
because company X did well in the past doesn't guarantee it will do
well in the future.

One problem with this game: if we are a deterministic algorithm, then
an adversary throwing the balls can make sure we never get any money
at all, by always putting the next ball into one of the buckets we're
not in. In other words, for any deterministic strategy, the
worst-case gain is 0. So, as we did for quicksort and hashing, we'll
look at *randomized* strategies.

Question: can we find a randomized strategy such that for any
sequence of events, our expected gain is high?

NOTE: You can think of this question as imagining that the future is
predetermined but we just don't know what it is. This is sometimes
called the "oblivious adversary model". Or, in other words, the
performance of a company doesn't depend on whether we are there or
not. Ideally, the fact that you are at the company should only
*help*...

The goal will be to get as close to $d as we can, i.e., to do nearly
as well as the best company in hindsight. If you want to think of
these as stocks or investment advisors, you can also look at a more
general case where in each time step some go up and some go down, but
we'll just look at the basic game as described above.

Let's look at some simple strategies first.

1. Pick a random company and stay there. What is the worst-case
   situation for this strategy? How much money does it make in
   expectation? [d/n: e.g., the adversary drops all d balls into one
   bucket, and we picked that bucket with probability only 1/n.]

2. Go to the company that's done best so far, breaking ties at
   random. What is a bad situation for this strategy? [Again d/n:
   e.g., the adversary cycles the balls through the buckets in order,
   so each ball lands in a bucket that is currently behind the
   leaders, where we almost certainly aren't standing.]

Interestingly, it turns out there's a simple strategy with the
following amazing guarantee: for any sequence of events, its expected
gain is at least d - sqrt(2d ln n). E.g., for d=100, n=100 it gets
about $70, and for d=1000, n=100 it gets about $904. So, it does
nearly as well as the best company in hindsight!

Motivation: in the bad case for algorithm #2 above, it would have
been better to stay at a random bucket. More generally, if the best
bucket is close to the average, then you might as well stay at a
random one. But if the best ones are a lot better than the average
(as in the bad case for algorithm #1), you want to be randomizing
over the best ones. The idea is to make a smooth transition between
these two regimes.

EXPONENTIAL-WEIGHTS ALGORITHM
=============================
Give each bucket a "weight" w_i, starting at 1. Let W = w_1 + ... + w_n.
   - Choose a bucket with probability proportional to its weight
     (i.e., w_i/W).
   - When a ball falls into some bucket, multiply that bucket's
     weight by (1 + epsilon). (We'll solve for epsilon later; for
     now, think of it as something like 0.1.)
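Before the analysis, here is what the strategy looks like in code, as
a minimal Python sketch (my illustration, not from the notes; the
sequence "balls" plays the role of the oblivious adversary's
predetermined choices):

   import random

   def exponential_weights(n, balls, epsilon=0.1):
       weights = [1.0] * n                  # every bucket starts at weight 1
       gain = 0
       for i in balls:                      # bucket i receives the next ball
           # Stand in a bucket chosen with probability w_j / W.
           # (Re-sampling each step is the same as choosing to move.)
           ours = random.choices(range(n), weights=weights)[0]
           if ours == i:                    # the ball fell into our bucket
               gain += 1
           weights[i] *= 1 + epsilon        # reward the bucket that got the ball
       return gain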
Analysis
========
Here's how the analysis is going to go. The total weights initially
sum to n. At the end of the game, they sum to at *least*
(1+epsilon)^d, since that's the value of the largest weight alone.
What we'll show is that the increase in the weights is related to our
expected gain, and we'll use this to solve for our gain.

   - Let W_t be the total weight at time t, and let w_{i,t} be the
     weight of bucket i at time t.
   - Let p_t be our probability of being in the bucket that the ball
     fell into at time t. So, p_t is our expected gain at time t (our
     probability of getting $1). If the ball fell into bucket i, then
     p_t = w_{i,t}/W_t.

Remember what the algorithm does: w_{i,t} increases by
epsilon*w_{i,t}. So,

   W_{t+1} = W_t + epsilon*w_{i,t} = W_t(1 + epsilon*p_t),

and therefore

   W_final = n*(1 + epsilon*p_1)(1 + epsilon*p_2)(1 + epsilon*p_3)...
           >= (1+epsilon)^d.

Let's take ln (natural log) of both sides and do a little magic:

   ln(n) + sum_t[ ln(1 + epsilon*p_t) ] >= d*ln(1+epsilon).

Now, use the fact that ln(1+x) <= x [draw the curve, or remember the
Taylor expansion]:

   ln(n) + sum_t[ epsilon*p_t ] >= d*ln(1+epsilon).

And using the fact that our expected gain is sum_t p_t, we have:

   expected gain >= [d*ln(1+epsilon) - ln(n)]/epsilon.

Using the fact that ln(1+epsilon) >= epsilon - epsilon^2/2 (from the
Taylor expansion), we can simplify this to:

   expected gain >= [d(epsilon - epsilon^2/2) - ln(n)]/epsilon
                 =  d - d*epsilon/2 - ln(n)/epsilon.

Now, if we plug in the epsilon that balances the last two terms,
namely epsilon = sqrt(2*ln(n)/d), we get:

   expected gain >= d - 2*sqrt(d*ln(n)/2) = d - sqrt(2d*ln(n)).

So, this is pretty neat!
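As a quick numeric check of the guarantee (my sketch, not part of the
notes), we can plug in the numbers quoted earlier:

   import math

   def guarantee(d, n):
       # the expected-gain lower bound d - sqrt(2d ln n) derived above
       return d - math.sqrt(2 * d * math.log(n))

   print(guarantee(100, 100))                 # ~69.7: "about $70"
   print(guarantee(1000, 100))                # ~904.0: "about $904"
   print(math.sqrt(2 * math.log(100) / 100))  # balancing epsilon, ~0.30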