4/23                       15-682 Randomized Algorithms
                 Randomized algorithms for on-line problems
=============================================================================

On-line problems: we get a sequence of requests and must process each one
as it arrives.  Current decisions can affect future costs.

Example (the rent-or-buy problem): from time to time you get invited to
some formal occasion and need to either rent or buy a tuxedo.  Say a
rental costs $60 and a purchase costs $600.  If you knew you would need a
tux fewer than 10 times (before the styles change or you gain weight,
etc.), it would make sense to rent every time.  If you knew you would
need it 10 or more times, it would make sense to buy right away.  But
what if you don't know?

Competitive ratio of an algorithm: the worst-case ratio of the
algorithm's cost to the optimal cost in hindsight.

A deterministic algorithm with the best competitive ratio for the
rent-or-buy problem (assume the purchase cost is an integer multiple of
the rental cost) is: "rent so long as the total spent so far is less than
the purchase cost, then buy."  (For the tuxedo problem, this is "rent 9
times, then buy.")  Its competitive ratio is < 2: if you end up needing
the tux fewer than 10 times, you are optimal.  If you end up needing it
10 or more times, then OPT is $600 and your cost is $540 + $600.

Formally, we allow an additive constant in our definition of competitive
ratio.  A deterministic algorithm A has competitive ratio r if there
exists a constant b such that for all request sequences sigma,

    Cost_A(sigma) < r*(optimal cost on sigma) + b.

Here is a nice framework for many online problems, the Metrical Task
System (MTS) problem: you are controlling a machine that can be in one of
n states, with some distance metric d among the states.  You are given,
online, a sequence of tasks.  A task is a vector of costs, specifying how
much it costs to process the task in each state of the machine.
Given a task T, the algorithm may either process it in its current state
i, paying T_i, or move to some other state j and process it there, paying
a total of T_j + d_{ij}.

Some results:

* For any n-state metric space and any deterministic algorithm, there
  exist task sequences forcing that algorithm to have competitive ratio
  at least n.

  Proof: scale so that the minimum inter-state distance in the metric
  space is 1.  Each task is a vector of the form (0,...,0,1,0,...,0),
  where the "1" is in the algorithm's current state.  So the algorithm
  pays at least 1 per task, for a total cost of at least t over t tasks.
  But some state must have at most t/n 1's in it, so the optimal cost is
  at most t/n plus the diameter of the metric space.  As t goes to
  infinity, the ratio goes to n.

Now let's look at randomized algorithms.  To do this, we need to define
the competitive ratio of a randomized algorithm.  Algorithm A has
competitive ratio r if there exists b such that for all request
sequences sigma,

    E[Cost_A(sigma)] < r*(optimal cost on sigma) + b.

This is often called the "oblivious adversary" model, since it
corresponds to a setting where the request sequence is chosen FIRST, and
then the algorithm makes its random choices.

* For the uniform metric space (d_{ij} = 1 for all i != j), you can
  achieve a ratio of O(log n) using the randomized "Marking" algorithm.
  The algorithm runs in phases.  In each phase:

  - begin at a random state, with all states "unmarked";
  - given a task T, mark every state whose total cost in this phase is
    at least 1;
  - if our current state is marked, move to a random unmarked state; if
    there are none, then clear all the marks, move to a random state,
    and end the phase.

  If all task vectors are {0,1}^n-valued, the competitive ratio of this
  algorithm is H_n.  More generally, the competitive ratio is at most
  2*H_n.  (Here H_n = 1 + 1/2 + 1/3 + ... + 1/n.)

  Proof: for simplicity, let's just look at task vectors of the form
  (0,...,0,1,0,...,0).
It is easy to generalize from here.  Initially our probability
distribution over states is (1/n, 1/n, ..., 1/n).  Given a task such as
(0,...,0,1), our distribution becomes (1/(n-1), ..., 1/(n-1), 0).  More
generally, when the ith state gets marked, our probability of having to
move is 1/(n-i+1), and afterwards we have probability 1/(n-i) on each of
the unmarked states.  So our total expected movement cost per phase is
at most 1/n + 1/(n-1) + ... + 1/2 + 1 = H_n.  Meanwhile, the optimal
offline cost per phase is at least 1: by the end of the phase every
state has accumulated cost at least 1, so the offline algorithm either
pays at least 1 to move or at least 1 to stay put.

* For the uniform metric space, every randomized algorithm has
  competitive ratio at least Omega(log n).

  Proof: suppose each task is a uniformly random vector of the form
  (0,...,0,1,0,...,0).  Then the expected cost of any online algorithm
  is at least 1/n per task.  But the offline algorithm, on each move,
  can go to the state whose next "expensive" task is farthest in the
  future.  I.e., we are throwing balls randomly into bins, and the
  offline algorithm moves only when every bin finally has a ball.  This
  is the "coupon collector's problem", and we have seen that the
  expected time for that is Theta(n log n).  So the offline expected
  cost per task is Theta(1/(n log n)), giving a ratio of Omega(log n).

In the paging problem, we have n pages and a cache of size k.  Given a
page request, we incur a fault if the page is not in the cache, and we
then have to pay 1 unit to bring the page into the cache and throw out
one of the current cache occupants.  The uniform MTS problem can model
this problem when n = k+1.

* More generally, the Marking algorithm achieves a ratio of 2*H_k for
  the paging problem, no matter what n is.  (The marking algorithm
  becomes: mark each page in the cache as "recently used" or "not
  recently used".  When a page is requested, mark it as recently used.
  When we incur a page fault, throw out a random "not recently used"
  page from the cache.  If all pages are marked as recently used, then
  end the phase and clear the marks.)

  Proof: see the book...
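As a sanity check on the H_n movement-cost argument for the uniform
metric space, here is a small Monte-Carlo sketch.  (The function names
and the choice of adversary, which marks states in a random order, are
ours for illustration; by the symmetry argument in the proof, the
expected number of moves per phase is H_n regardless of the order.)

```python
import random
from fractions import Fraction

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n, computed exactly.
    return sum(Fraction(1, i) for i in range(1, n + 1))

def simulate_phase(n, rng):
    """One phase of the Marking algorithm on the uniform metric space.

    The adversary marks states one at a time; whenever our current state
    is marked we move to a random unmarked state (cost 1 per move), and
    the final move when no unmarked state remains ends the phase.
    Returns the number of unit-cost moves made in the phase.
    """
    states = list(range(n))
    cur = rng.choice(states)          # start at a random state
    unmarked = set(states)
    order = states[:]
    rng.shuffle(order)                # order in which states get marked
    moves = 0
    for s in order:
        unmarked.discard(s)
        if s == cur:                  # our state just got marked: move
            moves += 1
            cur = rng.choice(sorted(unmarked)) if unmarked \
                  else rng.choice(states)   # phase ends on the last move
    return moves
```

Averaging simulate_phase over many trials should come out close to
harmonic(n), matching the 1/n + 1/(n-1) + ... + 1 = H_n calculation in
the proof.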
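The paging version of the Marking algorithm described above can be
sketched in a few lines, together with Belady's classical offline rule
(evict the page whose next use is farthest in the future) for
comparison.  This is a minimal illustration, not an efficient
implementation, and the function names are ours.

```python
import random

def marking_paging(requests, k, rng):
    """Marking algorithm: on a fault, evict a random unmarked page."""
    cache, marked = set(), set()
    faults = 0
    for p in requests:
        if p not in cache:
            faults += 1
            if len(cache) == k:
                unmarked = cache - marked
                if not unmarked:          # all pages marked: end phase
                    marked.clear()
                    unmarked = set(cache)
                victim = rng.choice(sorted(unmarked))
                cache.remove(victim)
            cache.add(p)
        marked.add(p)                      # mark on every request
    return faults

def belady(requests, k):
    """Offline optimal: evict the page used farthest in the future."""
    cache, faults = set(), 0
    for i, p in enumerate(requests):
        if p not in cache:
            faults += 1
            if len(cache) == k:
                def next_use(q):
                    try:
                        return requests.index(q, i + 1)
                    except ValueError:
                        return float('inf')   # never used again
                cache.remove(max(cache, key=next_use))
            cache.add(p)
    return faults
```

On the worst-case cyclic sequence over k+1 pages (the n = k+1 case
above), marking_paging faults roughly once per H_k requests in
expectation, while belady faults only about once per k requests.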