15-451 Algorithms 11/16/06
* Online algorithms
- rent-or-buy?
- elevator problem
- repeated play of matrix games
Quiz Tues (up through last class)
==========================================================================
Today's topic: Online Algorithms
Last time: looked at algorithms for finding approximately-optimal
solutions for NP-hard problems. Today: finding approximately-optimal
solutions for problems where difficulty is that the algorithm doesn't
have all the information up front.
Online algorithms: Algorithms for settings where inputs/data arriving
over time. Need to make decisions on the fly, without knowing what
will happen in the future. (As opposed to standard problems like
sorting where you have all inputs at the start.) Data structures are
one example of this. We'll talk about a few other examples today.
Rent-or-buy?
============
Simple online problem that captures a common issue in online
decision-making, called the rent-or-buy problem. Say you are just
starting to go skiing. Can either rent skis for $50 or buy for $500.
You don't know if you'll like it, so you decide to rent. Then you
decide to go again, and again, and after a while you realize you have
shelled out a lot of money renting and you wish you had bought right at the
start. Optimal strategy is: if you know you're going to end up skiing more
than 10 times, you should buy right at the beginning. If you know
you're going to go < 10 times, you should just rent. But, what if you
don't know?
To talk about quality of an online algorithm, we can look at what's
called the "competitive ratio":
Competitive ratio is worst case (maximum) over possible events of
the ratio: (alg cost)/OPT, where OPT = optimal cost in hindsight.
"cost" means total cost over all time.
E.g., what is CR of algorithm that says "buy right away"?
Worst case is: only go once. Ratio is 500/50 = 10.
What about algorithm that says "Rent forever"?
Worst case is: keep going skiing. Ratio is infinite.
Here's a nice strategy: rent until you realize you should have bought,
then buy. (In our case: rent 9 times, then buy).
Case 1: If you went skiing 9 or fewer times, you're optimal.
Case 2: If you went 10+ times, you paid $450 + $500. Opt paid $500.
Ratio = 2 - 1/10. In general, if purchase cost p is a multiple
of rental cost r, the ratio is ((p-r)+p)/p = 2 - r/p.
Worst of these is case 2, so competitive ratio is 2 - r/p.
Claim: above strategy has the best possible competitive ratio for
deterministic algorithms.
Proof: Consider the event that the day you purchase is the last day
you go skiing. If you rent longer than the above strategy, then the
numerator goes up but the denominator stays the same, so your ratio is
worse. If you rent fewer times, then the numerator goes down by r but
so does the denominator; since the ratio is above 1, subtracting the
same amount from both makes the ratio worse.
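To make the case analysis concrete, here is a minimal Python sketch
(mine, not from the lecture) that brute-forces the worst case of the
rent-until-you-would-have-bought strategy; the function name and the
assumption that the purchase price p is a multiple of the rental price
r are my own.

    def rent_or_buy_ratio(r=50, p=500):
        """Worst-case ratio of 'rent until you would have been better off
        buying, then buy', checked by brute force over how many times you
        end up going skiing.  Assumes p is a multiple of r."""
        k = p // r - 1                      # number of times we rent before buying
        worst = 0.0
        for trips in range(1, 10 * k + 2):  # try many possible totals
            alg = trips * r if trips <= k else k * r + p
            opt = min(trips * r, p)         # best choice in hindsight
            worst = max(worst, alg / opt)
        return worst

    print(rent_or_buy_ratio())              # 1.9 = 2 - r/p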
The elevator problem
====================
You go up to the elevator and press the button. But who knows how
long it's going to take to come, if ever? How long should you wait
until you give up and take the stairs?
Say it takes time E to get to your floor by elevator (once it comes)
and it takes time S by stairs. E.g., maybe E = 15 sec, S = 45 sec.
What strategy has the best competitive ratio?
- wait 30 sec, then take stairs. (in general, wait for S-E time)
(I.e., take the stairs once you realize you should have taken
them at the start)
- if elevator comes in < 30 sec, we're optimal.
- otherwise, OPT = 45. We took 30+45 sec, so ratio = (30+45)/45 = 5/3
- Or, in general, ratio = (S-E+S)/S = 2 - E/S.
This is really the same as rent-or-buy: taking the stairs is like
buying, each E seconds of waiting is like paying for one rental, and
the elevator arriving is like the last time you ever ski.
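The same kind of brute-force check works here. A small Python sketch
(again mine, using an arbitrary grid of possible elevator arrival
times) that confirms the 2 - E/S ratio for the wait-then-take-the-stairs
strategy:

    def elevator_ratio(E=15, S=45):
        """Worst-case ratio of 'wait S - E, then take the stairs', checked
        over a grid of elevator arrival times (including 'never')."""
        wait = S - E
        worst = 0.0
        arrivals = [t / 10 for t in range(0, 20 * S)] + [float('inf')]
        for t in arrivals:
            alg = t + E if t <= wait else wait + S   # ride only if it comes in time
            opt = min(t + E, S)                      # best choice in hindsight
            worst = max(worst, alg / opt)
        return worst

    print(elevator_ratio())   # 5/3 = 2 - E/S for E=15, S=45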
Other problems like this: whether it's worth optimizing code, when
your laptop should stop spinning the disk between accesses, and many others.
REPEATED PLAY OF MATRIX GAMES
=============================
We talked earlier in class about matrix games, and the notion of
minimax optimal strategies, and how linear programming can be used to
solve for them. But what if we are playing against an opponent who
is not optimal? Can we find a way to take advantage? What we're
going to look at now is an adaptive algorithm for repeatedly playing
some matrix game against an opponent, that is guaranteed to approach (or
exceed) the performance of the best fixed strategy in hindsight given
the series of plays of the opponent.
Given a series of rounds of us (row player) playing against some
opponent (column player), define OPT to be performance of best fixed
row in hindsight, given how the opponent played. Claim is we can get
an algorithm whose performance will approach OPT. In game-theory
terminology, we are doing nearly as well as the "best response to the
opponent's empirical distribution".
What's pretty impressive about this is: suppose we didn't know the
minimax theorem was true. So, we imagine there might be some game
where if the opponent had to pick his randomized strategy first and
tell us what it was, we could guarantee getting v, but if we had to
pick our randomized strategy first and tell him then we could only get
some w = v - delta. Now, we run this adaptive algorithm in
repeatedly playing against some opponent. OPT makes at least v per
round on average, so we are approaching v. But, I've committed to my
algorithm, so the opponent should be able to make sure I make at most
w on average per round. This is a contradiction. So, the theorem we
get actually gives a *proof* of the minimax theorem as a corollary.
A couple applications: say each week you go food shopping and at the
end you have to pick a line to stand in at the checkout counter, and
maybe you have n strategies for picking which line will go fastest (or
each week you bet on a football game and have n strategies for picking
the winner). Each week you pick one of the strategies, and then you
find out how you did and compare it to how well you would have done
with each of the other strategies that day. Want to do nearly as well
as best of those strategies in hindsight.
To simplify the presentation, let's assume all entries in the matrix
are 1 or 0 (win or lose). You can generalize but then the
calculations get messier so let's not. We will think of our algorithm
as the row-player. When the opponent plays some column, then we'll
call rows with a 1 entry the "winning rows" and the others the "losing
rows". E.g., some games to consider:
1 0
0 1
- or the n-by-n version (all zeros except for 1s on the diagonal).
To get a feel for this, let's consider some strategies that don't work
very well.
Strategy 1: play 50/50. This is minimax optimal. But if the opponent
always plays column 1, then when you look back in hindsight you see
that row 1 would have won every time whereas you only won half the
time. So, not horrible but you're off by a factor of 2. In the
n-by-n version, the gap is worse (factor of n).
Strategy 2: always choose the row that's done best in the past (break
ties by picking the lowest index). Q: That does well against the
previous opponent, but how could you as the column player totally
defeat this strategy? A: alternate columns and make it lose every
time. So, in T rounds, best row won T/2 times, but this algorithm won
zero times. So that's even worse!
Strategy 3: same as (2) but break ties at random. This does better in
the bad example for strategy 2, but still not great. Every odd round
it has a 1/2 chance of winning, and every even round it has a 0 chance
of winning, so it's off by a factor of 2 (factor of n in n-by-n version).
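To see just how badly strategy 2 can do, here is a small Python
simulation (mine, not from the notes) of the 2x2 diagonal game, where
the opponent knows the deterministic rule and always plays a column
that the chosen row loses -- which here amounts to alternating columns:

    def follow_the_leader(game, T):
        """Strategy 2: play the row that has won most often in hindsight,
        breaking ties toward the lowest index, for T rounds against an
        opponent who always plays a column our chosen row loses."""
        n = len(game)
        hindsight = [0] * n          # how often each row would have won so far
        our_wins = 0
        for _ in range(T):
            our_row = max(range(n), key=lambda i: (hindsight[i], -i))
            col = min(c for c in range(len(game[0])) if game[our_row][c] == 0)
            our_wins += game[our_row][col]
            for i in range(n):
                hindsight[i] += game[i][col]
        return our_wins, max(hindsight)

    print(follow_the_leader([[1, 0], [0, 1]], 100))
    # (0, 50): we never win, while the best row in hindsight won half the time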
We will give an algorithm that does much better: for any n-by-n game,
given a value epsilon ("learning rate"), it will get in expectation at
least:
OPT(1 - epsilon/2) - ln(n)/epsilon.
E.g., n=100 rows, epsilon = 0.1. When OPT=1000, you get at least 904.
Algorithm idea: just randomizing over the rows that did best in
hindsight like #3 above is not enough randomness. Instead, want
probability of choosing some row to degrade gracefully with distance
from optimum.
EXPONENTIAL-WEIGHTS ALGORITHM
=============================
Give each row a "weight" w_i starting at 1. Let W = w_1 + ... + w_n.
- Choose row with probability proportional to weight (i.e., w_i/W).
- After the opponent chooses his column, take all the winning rows and
multiply their weight by 1 + epsilon to determine the probability
distribution for the next round.
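Here is a minimal Python sketch of this rule for 0/1 games (the
function name is mine, and I track the expected gain rather than
actually sampling rows):

    def exp_weights_gain(game, opponent_cols, epsilon=0.1):
        """Expected gain of the exponential-weights rule played against a
        fixed sequence of opponent columns.  game[i][j] is 1 if row i
        beats column j, else 0."""
        n = len(game)
        w = [1.0] * n                   # every row starts with weight 1
        expected_gain = 0.0
        for col in opponent_cols:
            W = sum(w)
            # chance we win this round = weight mass sitting on winning rows
            p_t = sum(w[i] for i in range(n) if game[i][col] == 1) / W
            expected_gain += p_t
            # multiply each winning row's weight by (1 + epsilon)
            for i in range(n):
                if game[i][col] == 1:
                    w[i] *= 1.0 + epsilon
        return expected_gain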
Analysis
========
Here's how the analysis is going to go. The total weights initially
sum to n. At the end of the game, they sum to at *least* (1+epsilon)^OPT,
where OPT is total score of the best row in hindsight, since that's
the value of just the largest weight. What we'll show is that the
increase in the weights is tied to the probability we have of winning.
- Let W_t be total weight at time t. w_{i,t} = weight of row i at time t.
- Let p_t be our probability mass on the winning rows at time t.
All winning rows had their weights multiplied by 1+epsilon, and together
they hold a p_t fraction of the total weight W_t. So, the total
weight went up by epsilon * p_t * W_t.
So, W_{t+1} = W_t + epsilon * p_t * W_t = W_t(1 + epsilon * p_t).
Notice: we've related the increase in weights to our probability of
winning. The only way the total weight can go up by a large
percentage is if this probability is large.
W_final = n*(1 + epsilon * p_1)(1 + epsilon * p_2)(1 + epsilon * p_3)...
>= (1+epsilon)^OPT.
Let's take ln (natural log) of both sides and do a little magic:
ln(n) + sum_t[ ln(1 + epsilon*p_t)] >= OPT*ln(1+epsilon).
Now, use the fact that ln(1+x) <= x [draw curve or remember Taylor]
ln(n) + sum_t[epsilon*p_t] >= OPT*ln(1+epsilon)
And using the fact that our expected gain = sum_t p_t, we have:
Our expected gain >= [OPT * ln(1+epsilon) - ln(n)]/epsilon.
Using the fact that ln(1+epsilon) >= epsilon - epsilon^2/2 (from the
Taylor expansion), we can simplify this to:
Our expected gain >= [OPT(epsilon - epsilon^2/2) - ln(n)]/epsilon
== OPT(1 - epsilon/2) - ln(n)/epsilon.
QED (ta-da)
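As a sanity check on this bound (assuming the exp_weights_gain sketch
from above), one can compare the algorithm's expected gain with
OPT(1 - epsilon/2) - ln(n)/epsilon on the n-by-n diagonal game; the
particular parameters and the random opponent here are just for
illustration:

    import math, random

    n, T, eps = 10, 5000, 0.1
    game = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

    rng = random.Random(0)
    cols = [rng.randrange(n) for _ in range(T)]          # some opponent sequence

    gain = exp_weights_gain(game, cols, eps)
    opt = max(sum(game[i][c] for c in cols) for i in range(n))   # best fixed row
    bound = opt * (1 - eps / 2) - math.log(n) / eps
    print(round(gain), opt, round(bound))    # expected gain is at least the bound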