15-451 Algorithms                                              11/16/04

* Online algorithms
   - rent-or-buy?
   - elevator problem
   - picking a winner

Quiz Thurs (up through last class)
==========================================================================

Today's topic: Online Algorithms

Online algorithms: algorithms for settings where the inputs/data
arrive over time, and we need to make decisions on the fly, without
knowing what will happen in the future. (As opposed to standard
problems like sorting, where you have all the inputs at the start.)
Data structures are one example of this. We'll talk about a few other
examples today.

Rent-or-buy?
============
Here is a simple online problem that captures a common issue in
online decision-making, called the rent-or-buy problem.

Say you are just starting to go skiing. You can either rent skis for
$50 or buy them for $500. You don't know if you'll like it, so you
decide to rent. Then you decide to go again, and again, and after a
while you realize you have shelled out a lot of money renting and you
wish you had bought right at the start.

The optimal strategy in hindsight is: if you're going to end up
skiing more than 10 times, you should buy right at the beginning; if
you're going to go fewer than 10 times, you should just rent. But
what if you don't know?

To talk about the quality of an online algorithm, we look at what's
called the "competitive ratio": the worst case (maximum) over
possible event sequences of the ratio (alg cost)/OPT, where OPT is
the optimal cost in hindsight, and "cost" means total cost over all
time.

E.g., what is the CR of the algorithm that says "buy right away"?
The worst case is that you only go once. Ratio = 500/50 = 10.

What about the algorithm that says "rent forever"? The worst case is
that you keep going skiing. The ratio is infinite.

Here's a nice strategy: rent until you realize you should have
bought, then buy. (In our case: rent 9 times, then buy.) If you went
skiing 9 or fewer times, you're optimal. If you went 10+ times, you
paid $450 + $500 while OPT paid $500, for a ratio of 950/500 =
2 - 1/10. (In general, if the purchase cost is a multiple of the
rental cost, the ratio is 2 - rent/buy.)

The elevator problem
====================
You go up to the elevator and press the button. But who knows how
long it's going to take to come, if ever? How long should you wait
until you give up and take the stairs?

Say it takes time E to get to your floor by elevator (once it comes)
and time S by the stairs. E.g., maybe E = 15 sec, S = 45 sec. What
strategy has the best competitive ratio?

   - Wait 30 sec, then take the stairs. (In general, wait for S-E
     time. I.e., take the stairs once you realize you should have
     taken them at the start.)
   - If the elevator comes in < 30 sec, we're optimal.
   - Otherwise, OPT = 45. We took 30+45 sec, so the ratio is
     (30+45)/45 = 5/3.
   - Or, in general, ratio = (S-E+S)/S = 2 - E/S.

This is really the same as rent-or-buy, where stairs=buy, waiting for
E time steps is like renting, and the elevator arriving is like the
last time you ever ski.

Claim: the above strategy has the best possible competitive ratio for
deterministic algorithms.

Proof: It's silly to wait longer: if the elevator never comes, then
our cost only gets worse while OPT doesn't change. If we wait less,
say (S-E) - alpha seconds, then the numerator goes down by alpha, but
if the elevator comes right after we leave, the denominator goes down
by alpha too, so the ratio gets worse.

Other problems like this: whether it's worth optimizing code, when
your laptop should stop the disk from spinning between accesses, and
many others. (A code sketch of the threshold strategy follows below.)
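To make the threshold strategy concrete, here is a minimal sketch in
Python (my illustration; the notes themselves give no code) that
computes its cost against the optimal cost in hindsight:

   def alg_cost(trips, rent=50, buy=500):
       """Threshold strategy: rent while the total rent paid stays
       below the purchase price, then buy."""
       threshold = buy // rent              # 10 in the lecture's example
       if trips < threshold:
           return trips * rent              # we never bought
       return (threshold - 1) * rent + buy  # 9 rentals, then buy

   def opt_cost(trips, rent=50, buy=500):
       """Optimal cost in hindsight: rent every time, or buy at the start."""
       return min(trips * rent, buy)

   # Competitive ratio = worst case of alg/OPT over all possible futures.
   print(max(alg_cost(t) / opt_cost(t) for t in range(1, 1000)))
   # prints 1.9, i.e., 2 - rent/buy

The elevator problem is the same computation, with "renting" playing
the role of waiting a bit longer and "buying" playing the role of
taking the stairs.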
PICKING A WINNER (COMBINING EXPERT ADVICE)
==========================================
Here is a fun problem called the "picking a winner" problem. It is
also called the problem of "combining expert advice".

THE SETTING: Say you have n buckets, and you are standing in one of
them. At each time step,
   - a ball falls into one of the buckets;
   - if it's *your* bucket, then you get $1;
   - then, you can choose to move if you want to before the next
     time step.
The game ends when the fullest bucket has d balls (i.e., d dollars).

For instance, you can think of the buckets as companies you might
join, which might be doing well or not. Note that you only get money
for balls that fall into your bucket *while you are there*. Just
because company X did well in the past doesn't guarantee it will do
well in the future.

One problem with this game: if we are a deterministic algorithm, then
an adversary throwing the balls can make sure we never get any money
at all, by always putting the next ball into one of the buckets we're
not in. In other words, for any deterministic strategy, the
worst-case gain is 0. So, as we did for quicksort and hashing, we'll
look at *randomized* strategies.

Question: can we find a randomized strategy such that for any
sequence of events, our expected gain is high?

NOTE: You can think of this question as imagining that the future is
predetermined but we just don't know what it is. This is sometimes
called the "oblivious adversary model". Or, in other words, the
performance of a company doesn't depend on whether we are there or
not. Ideally, the fact that you are at the company should only
*help*...

The goal will be to get as close to $d as we can, i.e., to do nearly
as well as the best company in hindsight. If you want to think of
these as stocks or investment advisors, you can also look at a more
general case where in each time step some go up and some go down, but
we'll just look at the basic game as described above.

Let's look at some simple strategies first.

1. Pick a random company and stay there. What is the worst-case
   situation for this strategy? How much money does it make in
   expectation? [d/n: e.g., the adversary drops all d balls into one
   bucket, and we picked that bucket with probability only 1/n.]

2. Go to the company that's done best so far, breaking ties at
   random. What is a bad situation for this strategy? [Again d/n:
   e.g., the adversary cycles the balls through the buckets in order,
   so each ball lands in a bucket that is currently behind the
   leaders, where we almost certainly aren't standing.]

Interestingly, it turns out there's a simple strategy with the
following amazing guarantee: for any sequence of events, its expected
gain is at least d - sqrt(2d ln n). E.g., for d=100, n=100 it gets
about $70, and for d=1000, n=100 it gets about $904. So, it does
nearly as well as the best company in hindsight!

Motivation: in the bad case for algorithm #2 above, it would have
been better to stay at a random bucket. More generally, if the best
bucket is close to the average, then you might as well stay at a
random one. But if the best ones are a lot better than the average
(as in the bad case for algorithm #1), you want to be randomizing
over the best ones. The idea is to make a smooth transition between
these two regimes.

EXPONENTIAL-WEIGHTS ALGORITHM
=============================
Give each bucket a "weight" w_i, starting at 1. Let W = w_1 + ... + w_n.
   - Choose a bucket with probability proportional to its weight
     (i.e., w_i/W).
   - When a ball falls into some bucket, multiply that bucket's
     weight by (1 + epsilon). (We'll solve for epsilon later; for
     now, think of it as something like 0.1.)
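Before the analysis, here is what the strategy looks like in code, as
a minimal Python sketch (my illustration, not from the notes; the
sequence "balls" plays the role of the oblivious adversary's
predetermined choices):

   import random

   def exponential_weights(n, balls, epsilon=0.1):
       weights = [1.0] * n                  # every bucket starts at weight 1
       gain = 0
       for i in balls:                      # bucket i receives the next ball
           # Stand in a bucket chosen with probability w_j / W.
           # (Re-sampling each step is the same as choosing to move.)
           ours = random.choices(range(n), weights=weights)[0]
           if ours == i:                    # the ball fell into our bucket
               gain += 1
           weights[i] *= 1 + epsilon        # reward the bucket that got the ball
       return gain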
Analysis
========
Here's how the analysis is going to go. The total weights initially
sum to n. At the end of the game, they sum to at *least*
(1+epsilon)^d, since that's the value of the largest weight alone.
What we'll show is that the increase in the weights is related to our
expected gain, and we'll use this to solve for our gain.

   - Let W_t be the total weight at time t, and let w_{i,t} be the
     weight of bucket i at time t.
   - Let p_t be our probability of being in the bucket that the ball
     fell into at time t. So, p_t is our expected gain at time t (our
     probability of getting $1). If the ball fell into bucket i, then
     p_t = w_{i,t}/W_t.

Remember what the algorithm does: w_{i,t} increases by
epsilon*w_{i,t}. So,

   W_{t+1} = W_t + epsilon*w_{i,t} = W_t(1 + epsilon*p_t),

and therefore

   W_final = n*(1 + epsilon*p_1)(1 + epsilon*p_2)(1 + epsilon*p_3)...
           >= (1+epsilon)^d.

Let's take ln (natural log) of both sides and do a little magic:

   ln(n) + sum_t[ ln(1 + epsilon*p_t) ] >= d*ln(1+epsilon).

Now, use the fact that ln(1+x) <= x [draw the curve, or remember the
Taylor expansion]:

   ln(n) + sum_t[ epsilon*p_t ] >= d*ln(1+epsilon).

And using the fact that our expected gain is sum_t p_t, we have:

   expected gain >= [d*ln(1+epsilon) - ln(n)]/epsilon.

Using the fact that ln(1+epsilon) >= epsilon - epsilon^2/2 (from the
Taylor expansion), we can simplify this to:

   expected gain >= [d(epsilon - epsilon^2/2) - ln(n)]/epsilon
                 =  d - d*epsilon/2 - ln(n)/epsilon.

Now, if we plug in the epsilon that balances the last two terms,
namely epsilon = sqrt(2*ln(n)/d), we get:

   expected gain >= d - 2*sqrt(d*ln(n)/2) = d - sqrt(2d*ln(n)).

So, this is pretty neat!
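As a quick numeric check of the guarantee (my sketch, not part of the
notes), we can plug in the numbers quoted earlier:

   import math

   def guarantee(d, n):
       # the expected-gain lower bound d - sqrt(2d ln n) derived above
       return d - math.sqrt(2 * d * math.log(n))

   print(guarantee(100, 100))                 # ~69.7: "about $70"
   print(guarantee(1000, 100))                # ~904.0: "about $904"
   print(math.sqrt(2 * math.log(100) / 100))  # balancing epsilon, ~0.30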