15-451 Algorithms 11/16/06
* Online algorithms
   - rent-or-buy?
   - elevator problem
   - repeated play of matrix games
Quiz Tues (up through last class)
==========================================================================
Today's topic: Online Algorithms

Last time: we looked at algorithms for finding approximately-optimal
solutions to NP-hard problems. Today: finding approximately-optimal
solutions for problems where the difficulty is that the algorithm
doesn't have all the information up front.

Online algorithms: algorithms for settings where inputs/data arrive
over time, and we need to make decisions on the fly, without knowing
what will happen in the future. (As opposed to standard problems like
sorting, where you have all the inputs at the start.) Data structures
are one example of this. We'll talk about a few other examples today.

Rent-or-buy?
============
Here is a simple online problem that captures a common issue in online
decision-making, called the rent-or-buy problem.

Say you are just starting to go skiing. You can either rent skis for
$50 or buy them for $500. You don't know if you'll like it, so you
decide to rent. Then you decide to go again, and again, and after a
while you realize you have shelled out a lot of money renting and you
wish you had bought right at the start.

The optimal strategy is: if you know you're going to end up skiing
more than 10 times, you should buy right at the beginning. If you know
you're going to go fewer than 10 times, you should just rent. But what
if you don't know?

To talk about the quality of an online algorithm, we can look at
what's called the "competitive ratio":

   The competitive ratio of an algorithm is the worst case (maximum)
   over possible events of the ratio (alg cost)/OPT, where OPT is the
   optimal cost in hindsight, and "cost" means total cost over all
   time.

E.g., what is the competitive ratio of the algorithm that says "buy
right away"? The worst case is that you go only once, so the ratio is
500/50 = 10. What about the algorithm that says "rent forever"? The
worst case is that you keep going skiing, so the ratio is infinite.

Here's a nice strategy: rent until you realize you should have bought,
then buy. (In our case: rent 9 times, then buy.)
 - Case 1: you went skiing 9 or fewer times. Then you're optimal.
 - Case 2: you went 10+ times. Then you paid $450 + $500 = $950, while
   OPT paid $500. Ratio = 950/500 = 2 - 1/10.

In general, if the purchase cost p is a multiple of the rental cost r,
the ratio in case 2 is ((p-r)+p)/p = 2 - r/p. Case 2 is the worse of
the two, so the competitive ratio of this strategy is 2 - r/p.

Claim: the above strategy has the best possible competitive ratio for
deterministic algorithms.

Proof: Consider the event that the day you purchase is the last day
you ever go skiing. If you rent longer than the above strategy does,
then the numerator (your cost) goes up while the denominator (OPT)
stays the same, so your ratio gets worse. If you rent fewer times,
then the numerator goes down by r, but so does the denominator; since
the ratio is greater than 1, subtracting the same amount from both
also makes the ratio worse.

The elevator problem
====================
You go up to the elevator and press the button. But who knows how long
it's going to take to come, if ever? How long should you wait before
you give up and take the stairs? Say it takes time E to get to your
floor by elevator (once it comes) and time S by the stairs. E.g.,
maybe E = 15 sec and S = 45 sec.

What strategy has the best competitive ratio? Wait 30 sec, then take
the stairs (in general, wait for S-E time). I.e., take the stairs once
you realize you should have taken them at the start.
 - If the elevator comes in < 30 sec, we're optimal.
 - Otherwise, OPT = 45 and we took 30+45 sec, so the ratio is
   (30+45)/45 = 5/3.
 - In general, the ratio is (S-E+S)/S = 2 - E/S.
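To make the rent-or-buy analysis concrete, here is a minimal Python
sketch (the function names are just for illustration) of the "rent
until you should have bought" strategy, with a brute-force check of
its competitive ratio:

   def strategy_cost(num_trips, rent=50, buy=500):
       # Rent until you've spent buy - rent on rentals, then buy:
       # with rent=50 and buy=500, that means 9 rentals, then purchase.
       switch_trip = buy // rent            # buy on this trip (trip 10)
       if num_trips < switch_trip:
           return num_trips * rent          # we rented every time
       return (switch_trip - 1) * rent + buy

   def opt_cost(num_trips, rent=50, buy=500):
       # Best choice in hindsight: rent every time, or buy on day one.
       return min(num_trips * rent, buy)

   # Worst case over possible numbers of trips:
   print(max(strategy_cost(k) / opt_cost(k) for k in range(1, 1000)))
   # prints 1.9, i.e., 2 - r/p with r=50, p=500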
This is really the same as rent-or-buy, where taking the stairs is
like buying, waiting for E time steps is like renting, and the
elevator arriving is like the last time you ever go skiing. Other
problems like this: deciding whether it's worth optimizing code, when
your laptop should stop spinning the disk between accesses, and many
others.

REPEATED PLAY OF MATRIX GAMES
=============================
We talked earlier in class about matrix games, the notion of minimax
optimal strategies, and how linear programming can be used to solve
for them. But what if we are playing against an opponent who is not
optimal? Can we find a way to take advantage?

What we're going to look at now is an adaptive algorithm for
repeatedly playing some matrix game against an opponent, one that is
guaranteed to approach (or exceed) the performance of the best fixed
strategy in hindsight, given the series of plays of the opponent.
Specifically, given a series of rounds of us (the row player) playing
against some opponent (the column player), define OPT to be the
performance of the best fixed row in hindsight, given how the opponent
played. The claim is that we can get an algorithm whose performance
approaches OPT. In game-theory terminology, we are doing nearly as
well as the "best response to the opponent's empirical distribution".

What's pretty impressive about this is the following. Suppose we
didn't know the minimax theorem was true. Then we might imagine there
is some game where, if the opponent had to pick his randomized
strategy first and tell us what it was, we could guarantee getting v,
but if we had to pick our randomized strategy first and tell him, we
could only guarantee some w = v - delta. Now run the adaptive
algorithm repeatedly against some opponent. Whatever the opponent
does, the best fixed row in hindsight makes at least v per round on
average, so we approach v per round. But we committed to our algorithm
up front, so the opponent should be able to hold us to at most w per
round on average. This is a contradiction. So the theorem we'll prove
actually gives a *proof* of the minimax theorem as a corollary.

A couple of applications: say each week you go food shopping, and at
the end you have to pick a line to stand in at the checkout counter,
and maybe you have n strategies for picking which line will go fastest
(or each week you bet on a football game and have n strategies for
picking the winner). Each week you pick one of the strategies, and
then you find out how you did and compare it to how well you would
have done with each of the other strategies that day. You want to do
nearly as well as the best of those strategies in hindsight.

To simplify the presentation, let's assume all entries in the matrix
are 1 or 0 (win or lose). You can generalize, but then the
calculations get messier, so let's not. We will think of our algorithm
as the row player. When the opponent plays some column, we'll call the
rows with a 1 entry in that column the "winning rows" and the others
the "losing rows". E.g., some games to consider:

   1 0
   0 1

or the n-by-n version (all zeros except for 1s on the diagonal).

To get a feel for this, let's consider some strategies that don't work
very well.

Strategy 1: play 50/50. This is minimax optimal. But if the opponent
always plays column 1, then when you look back in hindsight you see
that row 1 would have won every time, whereas you only won half the
time. So, not horrible, but you're off by a factor of 2. In the n-by-n
version, the gap is worse: playing uniformly at random, you win only a
1/n fraction of the rounds, so you're off by a factor of n. (A small
simulation illustrating this gap follows below.)
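To see the factor-n gap concretely, here is a tiny Python simulation
sketch (the parameters and names are just for illustration) of uniform
random play on the n-by-n diagonal game against an opponent who always
plays column 1:

   import random

   n, T = 4, 10000
   # the n-by-n diagonal game: row i wins exactly when column i is played
   game = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

   our_wins = 0
   col_counts = [0] * n
   for t in range(T):
       col = 0                     # opponent always plays column 1
       col_counts[col] += 1
       row = random.randrange(n)   # Strategy 1: uniform random row
       our_wins += game[row][col]

   # The best fixed row in hindsight is the row of the most-played column.
   best_row_wins = max(col_counts)
   print(our_wins, best_row_wins)  # roughly T/n versus T: off by a factor of n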
Strategy 2: always choose the row that has done best in the past
(breaking ties by picking the lowest index).

Q: That does well against the previous opponent, but how could you, as
the column player, totally defeat this strategy?

A: Alternate columns, making it lose every time. So, in T rounds, the
best row won T/2 times, but this algorithm won zero times. That's even
worse!

Strategy 3: same as (2), but break ties at random. This does better in
the bad example for strategy 2, but it's still not great: every odd
round it has a 1/2 chance of winning, and every even round it has a 0
chance of winning, so it's off by a factor of 2 (a factor of n in the
n-by-n version).

We will give an algorithm that does much better: for any n-by-n game,
given a value epsilon (the "learning rate"), it will get in
expectation at least

   OPT(1 - epsilon/2) - ln(n)/epsilon.

E.g., with n = 100 rows and epsilon = 0.1, when OPT = 1000 you get at
least 950 - ln(100)/0.1, which is about 904.

Algorithm idea: just randomizing over the rows that did best in
hindsight, like strategy 3 above, is not enough randomness. Instead,
we want the probability of choosing a row to degrade gracefully with
its distance from the optimum.

EXPONENTIAL-WEIGHTS ALGORITHM
=============================
Give each row a "weight" w_i, starting at 1, and let
W = w_1 + ... + w_n.
 - Choose each row with probability proportional to its weight
   (i.e., w_i/W).
 - After the opponent chooses his column, take all the winning rows
   and multiply their weights by (1+epsilon); the updated weights
   determine the probability distribution for the next round.

(A short runnable sketch of this algorithm appears after the
analysis.)

Analysis
========
Here's how the analysis is going to go. The total weight is initially
n. At the end of the game, the total weight is at *least*
(1+epsilon)^OPT, where OPT is the total score of the best row in
hindsight, since that is just the final weight of that one row. What
we'll show is that the increase in the weights is tied to the
probability we have of winning.

 - Let W_t be the total weight at time t, and w_{i,t} the weight of
   row i at time t.
 - Let p_t be our probability mass on the winning rows at time t.

All the winning rows have their weights multiplied by (1+epsilon), so
the total weight goes up by epsilon * p_t * W_t:

   W_{t+1} = W_t + epsilon * p_t * W_t = W_t(1 + epsilon * p_t).

Notice: we've related the increase in the weights to our probability
of winning. The only way the total weight can go up by a large
percentage is if this probability is large. So,

   W_final = n * (1 + epsilon*p_1)(1 + epsilon*p_2)(1 + epsilon*p_3)...
           >= (1+epsilon)^OPT.

Let's take ln (natural log) of both sides and do a little magic:

   ln(n) + sum_t[ln(1 + epsilon*p_t)] >= OPT*ln(1+epsilon).

Now use the fact that ln(1+x) <= x [draw the curve or remember the
Taylor expansion]:

   ln(n) + sum_t[epsilon*p_t] >= OPT*ln(1+epsilon).

Since our expected gain is sum_t p_t, this rearranges to:

   our expected gain >= [OPT*ln(1+epsilon) - ln(n)]/epsilon.

Finally, using the fact that ln(1+epsilon) >= epsilon - epsilon^2/2
(from the Taylor expansion), we can simplify this to:

   our expected gain >= [OPT(epsilon - epsilon^2/2) - ln(n)]/epsilon
                      = OPT(1 - epsilon/2) - ln(n)/epsilon.

QED (ta-da)
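Here is a short runnable sketch of the exponential-weights algorithm
in Python, for 0/1 games as above (the example game and opponent at
the bottom are just for illustration):

   import random

   def exp_weights_play(game, opponent_cols, epsilon=0.1):
       # game[i][j] is 1 if row i beats column j, else 0.
       # opponent_cols is the sequence of columns the opponent plays.
       n = len(game)
       w = [1.0] * n                      # all weights start at 1
       wins = 0
       for col in opponent_cols:
           # choose row i with probability w_i / W
           row = random.choices(range(n), weights=w)[0]
           wins += game[row][col]
           # multiply the weights of the winning rows by (1 + epsilon)
           for i in range(n):
               if game[i][col] == 1:
                   w[i] *= 1 + epsilon
       return wins

   # Example: the 4-by-4 diagonal game against an alternating opponent.
   n = 4
   game = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
   cols = [t % 2 for t in range(5000)]    # opponent alternates columns 1, 2
   print(exp_weights_play(game, cols))
   # OPT = 2500 here, so the guarantee says we win in expectation at
   # least 2500*(1 - 0.05) - ln(4)/0.1, which is about 2361.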