15-451 Algorithms 11/15/05
* Online algorithms
- rent-or-buy?
- elevator problem
- picking a winner
Quiz Thurs (up through last class)
==========================================================================
Today's topic: Online Algorithms
Last time: looked at algorithms for finding approximately-optimal
solutions for NP-hard problems. Today: finding approximately-optimal
solutions for problems where difficulty is that the algorithm doesn't
have all the information up front.
Online algorithms: Algorithms for settings where inputs/data arriving
over time. Need to make decisions on the fly, without knowing what
will happen in the future. (As opposed to standard problems like
sorting where you have all inputs at the start.) Data structures are
one example of this. We'll talk about a few other examples today.
Rent-or-buy?
============
Simple online problem that captures a common issue in online
decision-making, called the rent-or-buy problem. Say you are just
starting to go skiing. Can either rent skis for $50 or buy for $500.
You don't know if you'll like it, so you decide to rent. Then you
decide to go again, and again, and after a while you realize you have
shelled out a lot of money renting and you wish you had bought right at the
start. Optimal strategy is: if you know you're going to end up skiing more
than 10 times, you should buy right at the beginning. If you know
you're going to go < 10 times, you should just rent. But, what if you
don't know?
To talk about quality of an online algorithm, we can look at what's
called the "competitive ratio":
Competitive ratio is worst case (maximum) over possible events of
the ratio: (alg cost)/OPT, where OPT = optimal cost in hindsight.
"cost" means total cost over all time.
E.g., what is CR of algorithm that says "buy right away"?
Worst case is: only go once. Ratio is 500/50 = 10.
What about algorithm that says "Rent forever"?
Worst case is: keep going skiing. Ratio is infinite.
Here's a nice strategy: rent until you realize you should have bought,
then buy. (In our case: rent 9 times, then buy).
If you went skiing 9 or fewer times, you're optimal.
If you went 10+ times, you paid $450 + $500. Opt paid $500.
Ratio = 2 - 1/10. (in general, if purchase cost is a multiple
of rental cost, the ratio is 2 - rent/buy)
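The rent-or-buy arithmetic above can be checked with a small sketch (the function names here are mine, not part of the notes):

```python
RENT, BUY = 50, 500

def strategy_cost(trips, rentals_before_buying=9):
    """Cost of 'rent until you realize you should have bought, then buy'."""
    if trips <= rentals_before_buying:
        return trips * RENT
    return rentals_before_buying * RENT + BUY

def opt_cost(trips):
    """Optimal cost in hindsight: rent every time, or buy at the start."""
    return min(trips * RENT, BUY)

# Worst case over possible numbers of trips: attained at trips = 10,
# where we pay $450 + $500 = $950 and OPT pays $500.
ratio = max(strategy_cost(t) / opt_cost(t) for t in range(1, 1000))
print(ratio)  # 1.9, i.e., 2 - RENT/BUY
```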
The elevator problem
====================
You go up to the elevator and press the button. But who knows how
long it's going to take to come, if ever? How long should you wait
until you give up and take the stairs?
Say it takes time E to get to your floor by elevator (once it comes)
and it takes time S by stairs. E.g., maybe E = 15 sec, S = 45 sec.
What strategy has the best competitive ratio?
- wait 30 sec, then take stairs. (in general, wait for S-E time)
(I.e., take the stairs once you realize you should have taken
them at the start)
- if elevator comes in < 30 sec, we're optimal.
- otherwise, OPT = 45. We took 30+45 sec, so ratio = (30+45)/45 = 5/3
- Or, in general, ratio = (S-E+S)/S = 2 - E/S.
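A quick sketch of this calculation (the helper name is mine), evaluating the worst case of "wait t seconds, then take the stairs" for E = 15, S = 45:

```python
E, S = 15, 45  # elevator ride time, stairs time (seconds)

def competitive_ratio(wait):
    """Worst-case ratio for 'wait `wait` seconds, then take the stairs'."""
    # Case 1: elevator never comes.  We pay wait + S; OPT pays S.
    never_comes = (wait + S) / S
    # Case 2: elevator arrives just after we give up.  We pay wait + S;
    # OPT waited and paid (just over) wait + E.
    just_missed = (wait + S) / (wait + E)
    return max(never_comes, just_missed)

best = min(range(100), key=competitive_ratio)
print(best, competitive_ratio(best))  # 30 sec, ratio 5/3 = 2 - E/S
```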
This is really the same as rent-or-buy where stairs=buy, waiting for E
time steps is like renting, and the elevator arriving is like the last
time you ever ski.
Claim: above strategy has the best possible competitive ratio for
deterministic algorithms.
Proof: if you wait longer, then if the elevator never comes, the
numerator goes up but the denominator stays the same, so your ratio is
worse. If you wait less (say (S-E) - alpha seconds), then if the elevator
comes right after you give up, the numerator goes down by alpha, but
the denominator goes down by alpha too; since the numerator is larger
than the denominator, subtracting alpha from both makes the ratio worse.
Other problems like this: whether it's worth optimizing code, when
your laptop should stop spinning the disk between accesses, and many others.
PICKING A WINNER (COMBINING EXPERT ADVICE)
=========================================
Here is a fun problem called the "picking a winner" problem. Also
called the problem of "combining expert advice".
THE SETTING: Say you have n buckets. You are standing in one of
them. At each time step,
- a ball falls into one of the buckets.
- If it's *your* bucket, then you get $1.
- Then, you can choose to move if you want to before the next time step.
Game ends when the fullest bucket has d balls (i.e., d dollars).
For instance, you can think of the buckets as companies you might
join, that might be doing well or not. Note that you only get money for
balls that fall into your bucket *while you are there*. Just because
company X did well in the past doesn't guarantee it will do well in
the future.
One problem with this game: if we are a deterministic algorithm, then
an adversary throwing the balls could make sure we never get any money
at all, by always putting the next ball into one of the buckets we're
not in. In other words, for any deterministic strategy, the worst
case is to get a gain of 0. So, as we did for quicksort and
hashing, we'll look at *randomized* strategies. Question:
can we find a randomized strategy such that for any sequence of
events, our expected gain is high?
NOTE: You can think of this question as imagining that the future is
predetermined but we just don't know what it is. Sometimes called the
"oblivious adversary model". Or, in other words, the performance of a
company doesn't depend on whether we are there or not. Ideally, the
fact that you are at the company should only *help*...
Goal will be to get as close to $d as we can: do as close as we can to
best company in hindsight.
If you want to think of these as stocks, or investment advisors, you can
also look at a more general case, where in each time step some go up
and some go down, but we'll just look at the basic game as described above.
Let's look at some simple strategies first.
1. Pick a random company and stay there. What is the worst case
situation for that strategy? How much money does it make in
expectation? [d/n]
2. Go to the company that's done best so far. Break ties at random.
What is a bad situation for this strategy? [again, d/n]
Interestingly, it turns out there's a simple strategy with the
following amazing guarantee: for any sequence of events, its expected
gain is at least d - sqrt(2d log n).
E.g., d=100, n=100, get about $70. d=1000, n=100, get about $904.
So, nearly as good as best company in hindsight!
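Plugging the examples into d - sqrt(2d ln n) (natural log throughout), to the nearest dollar:

```python
from math import log, sqrt

def guarantee(d, n):
    """Lower bound on expected gain: d - sqrt(2 d ln n)."""
    return d - sqrt(2 * d * log(n))

print(round(guarantee(100, 100)))   # ~70
print(round(guarantee(1000, 100)))  # ~904
```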
Motivation: in the bad case for alg #2 above, would have been better
to stay random. In fact, more generally, if the best one is close to
the average, then you might as well stay at a random one. But if the
best ones are a lot better than the average (like in bad case for alg
#1) you want to be randomizing over the best ones. Idea is to try to
make a smooth transition.
EXPONENTIAL-WEIGHTS ALGORITHM
=============================
Give each bucket a "weight" w_i starting at 1. Let W = w_1 + ... + w_n.
- Choose bucket with probability proportional to weight (i.e., w_i/W).
- When a ball falls into some bucket, multiply its weight by 1 + epsilon.
(we'll solve for epsilon later, but for now think of it as something like 0.1).
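Here is a minimal sketch of the algorithm (names are mine). Since our coin flips never affect the weights, we can track the exact expected gain sum_t p_t instead of sampling where we stand:

```python
import math

def exp_weights(ball_sequence, n, epsilon):
    """Run the exponential-weights strategy on a fixed sequence of balls.

    ball_sequence[t] is the bucket (0..n-1) the ball falls into at step t.
    Returns the exact expected gain sum_t p_t, where p_t = w_i/W is our
    probability of standing in the right bucket at step t.
    """
    w = [1.0] * n
    expected_gain = 0.0
    for i in ball_sequence:
        W = sum(w)
        expected_gain += w[i] / W   # p_t: chance we're in bucket i
        w[i] *= 1 + epsilon         # reward the bucket that got the ball
    return expected_gain

# Example: n=10 buckets, every ball into bucket 0, game ends at d=100.
d, n = 100, 10
eps = math.sqrt(2 * math.log(n) / d)  # the value the analysis will suggest
gain = exp_weights([0] * d, n, eps)
```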
Analysis
========
Here's how the analysis is going to go. The total weights initially
sum to n. At the end of the game, they sum to at *least* (1+epsilon)^d
since that's the value of just the largest weight. What we'll show is
that the increase in the weights is tied to the amount of probability we
had on the bucket the ball fell into. We'll use this to solve for our expected gain.
- Let W_t be total weight at time t. w_{i,t} = weight of bucket i at time t.
- Let p_t be our probability of being in the bucket that the ball fell
into at time t. So, p_t is our expected gain at time t (our probability
of getting $1). If ball fell into bucket i, p_t = w_{i,t}/W_t.
Remember now what the algorithm does: w_{i,t} increases by epsilon*w_{i,t}.
So, W_{t+1} = W_t + epsilon * w_{i,t} = W_t(1 + epsilon * p_t).
Notice: we've related the increase in weights to our probability of
having been in the right bucket. The only way the total weight can go
up by a large percentage is if this probability is large.
W_final = n*(1 + epsilon * p_1)(1 + epsilon * p_2)(1 + epsilon *p_3)...
>= (1+epsilon)^d.
Let's take ln (natural log) of both sides and do a little magic:
ln(n) + sum_t[ ln(1 + epsilon*p_t)] >= d*ln(1+epsilon).
Now, use the fact that ln(1+x) <= x [draw curve or remember Taylor]
ln(n) + sum_t[epsilon*p_t] >= d*ln(1+epsilon)
And using the fact that our expected gain = sum_t p_t, we have:
Our expected gain >= [d * ln(1+epsilon) - ln(n)]/epsilon.
Using the fact that ln(1+epsilon) >= epsilon - epsilon^2/2 (from the
Taylor expansion), we can simplify this to:
Our expected gain >= [d(epsilon - epsilon^2/2) - ln(n)]/epsilon
>= d - d*epsilon/2 - ln(n)/epsilon.
Now, if we plug in epsilon to balance out the last two terms, we get
epsilon = sqrt(2*ln(n)/d),
which gives us:
Our expected gain >= d - 2*sqrt(d*ln(n)/2)
= d - sqrt(2d ln(n)).
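As a sanity check on the bound (a sketch, not from the notes): since the weight trajectory doesn't depend on our coin flips, even an adversary that always drops the ball into the currently lightest bucket is effectively a fixed-in-advance sequence, so the guarantee should still hold:

```python
import math

n, d = 10, 50
eps = math.sqrt(2 * math.log(n) / d)

w = [1.0] * n
counts = [0] * n
gain = 0.0
# "Adversary": always drop the ball into the lightest bucket.  The weights
# evolve deterministically, so this is a legal oblivious adversary.
while max(counts) < d:
    i = min(range(n), key=lambda j: w[j])
    gain += w[i] / sum(w)          # p_t, our expected $1 this step
    w[i] *= 1 + eps
    counts[i] += 1

bound = d - math.sqrt(2 * d * math.log(n))
print(gain >= bound)  # True
```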
So, this is pretty neat! In fact, a small generalization of this
allows you to handle the case that more than one bucket can get balls
in any given time step, or you have gains in [0,1] rather than just 0
or 1. What's interesting about this is that you can view this as an
adaptive algorithm for repeated play of a 2-player zero-sum game: the
buckets are the different rows, and at each time step the column
chosen by your opponent determines how much you would have won or lost
had you chosen each of the different rows. The conclusion we get here
is that your adaptive algorithm will do nearly as well as the best
fixed row (bucket) in hindsight. This in turn is at least as good as
the minimax value of the game. In fact, viewed in the right way, this
result gives a *proof* of the minimax theorem. We'll see this later...