15-854 Approximation and Online Algorithms                        01/19/00
http://www.cs.cmu.edu/~avrim/Approx00/

 * Overview
 * Admin
 * Approx example: Vertex Cover
 * Online examples: rent or buy, pursuer-evader

Handouts:
 - Course info
 - these notes
==========================================================================

Overview:

 - What are approximation algorithms, what are online algorithms, and what
   do they have in common?

 - Approx: we have a problem that is too hard computationally to solve
   exactly. Can we efficiently (e.g., in polynomial time) guarantee a good
   approximation? E.g., Vertex Cover: get <= 2 * optimal. Or, MAX-SAT: get
   >= 3/4 * optimal.

 - Online: for problems where we only see the inputs one at a time, can we
   get close to the best possible in hindsight? E.g., paging:
   # page faults <= O(log n) * optimal.

 - Both are concerned with how close we can get to an ideal optimum, for
   problems where (for different reasons) getting the optimum may not be
   possible.

 - Historically, approx algs are older. Online algs came more recently, out
   of the study of adaptive data structures and amortized analysis.

 - Not so much overlap historically in terms of techniques, but this is
   changing. E.g., Bartal's probabilistic approximation of metric spaces was
   developed in the online setting but is also useful for approx algs.

Admin:

 - See course info sheet.

VERTEX COVER
============

 - What is the fewest # of guards we need to place in a museum to cover all
   the corridors? (Think of intersections as vertices and corridors as
   edges; a guard at an intersection covers every corridor meeting there.)

 - Formally, given a graph G, we are looking for the smallest set of
   vertices such that every edge is incident to at least one vertex in the
   set.

 - Example:

      +----+----+
      |    |    |
      +----+----+

 - This problem is NP-hard. But it turns out it's easy to get within a
   factor of 2.

Alg1: Pick an arbitrary edge. We know any VC must contain at least one of
its endpoints, so let's take both. Then throw out all edges covered and
repeat. Equivalently: find a maximal matching (it doesn't have to be
*maximum*, just *maximal*), then for each edge in it, put both endpoints
into our cover. Note that this is a VC (otherwise the matching wouldn't be
maximal).
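As a minimal sketch of Alg1 (Python, not from the original notes), the function below scans the edges once and greedily builds a maximal matching, putting both endpoints of every matched edge into the cover. The names `matching_vertex_cover` and `is_vertex_cover`, and the labeling of the example figure's corners as a, b, c (top row) and d, e, f (bottom row), are illustrative assumptions.

```python
def matching_vertex_cover(edges):
    """Alg1 sketch: greedily build a maximal matching and take both endpoints
    of every matched edge. Returns (cover, matching)."""
    cover = set()
    matching = []
    for u, v in edges:
        # Neither endpoint is in the cover yet: the edge is uncovered and both
        # endpoints are unmatched, so the edge can join the matching.
        if u not in cover and v not in cover:
            matching.append((u, v))
            cover.update((u, v))
    return cover, matching


def is_vertex_cover(cover, edges):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


# The example figure, with corners labeled (an assumption) a-b-c on top, d-e-f below.
grid = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f"),
        ("a", "d"), ("b", "e"), ("c", "f")]
cover, matching = matching_vertex_cover(grid)
print(sorted(cover), matching)
# prints ['a', 'b', 'c', 'd', 'e', 'f'] [('a', 'b'), ('d', 'e'), ('c', 'f')]
```

With this edge order the matching is (a,b), (d,e), (c,f), so all six corners go into the cover. Any VC needs at least one endpoint of each of those three disjoint edges, and {b, d, f} is a cover of size 3, so on this instance Alg1 is off by exactly a factor of 2.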
Why is Alg1 within a factor of 2? Any VC must contain at least one endpoint
of each matching edge, and no two matching edges share an endpoint, so
OPT >= (# matching edges), while our cover has exactly 2 * (# matching
edges) vertices. Therefore, we're off by at most a factor of 2. So, Alg1 is
a 2-approximation.

This is a little tricky: why take both endpoints? How about if we just take
one? This seems less wasteful. Equivalently: pick an arbitrary vertex, put
it into the VC, delete the edges covered, and repeat. Does this guarantee a
factor of 2? What's a bad example? (E.g., a star: if we keep picking
leaves, we use n-1 vertices, while the center alone is a VC.)

GENERAL NOTATION
================

In general, for minimization problems (e.g., VC, TSP), we look at the ratio

     (size of solution we find) / (optimal size).

For maximization problems (like clique, independent set), we look at

     (optimal size) / (size of solution we find).

Either way the ratio is >= 1. An algorithm achieves approximation ratio
alpha if, for all instances of the problem, it finds a solution with
ratio <= alpha.

Interesting thing: even though all NP-complete problems are equivalent in
terms of the difficulty of finding an optimal solution, the hardness or
easiness of getting a good approximation varies all over the map.

VERTEX COVER contd
==================

Here's another 2-approximation algorithm for Vertex Cover:

Alg2:
  Step 1: Solve a *fractional* version of the problem. Have a variable x_i
  for each vertex, with constraint 0 <= x_i <= 1. Each edge should be
  "fractionally covered": for each edge (i,j) we want x_i + x_j >= 1. Then
  our goal is to minimize sum_i x_i. We can solve this using linear
  programming. E.g., for the triangle graph the LP optimum sets every
  x_i = 1/2, for a value of 3/2, while the smallest VC has size 2.

  Step 2: Now pick each vertex i such that x_i >= 1/2.

Claim 1: This is a VC. Why? For each edge (i,j), x_i + x_j >= 1 forces
x_i >= 1/2 or x_j >= 1/2, so at least one endpoint gets picked.

Claim 2: The size of this VC is at most twice the size of the optimal VC.
Why? Each picked vertex has x_i >= 1/2, i.e., 2*x_i >= 1, so the size is at
most 2 * sum_i x_i: twice the value of the *fractional* solution we found.
And the value of the optimal fractional solution is <= the size of the
optimal integral solution (i.e., the optimal VC), since every VC gives a
feasible 0/1 setting of the x_i.

Open problem: Can you get an approximation ratio of 1.99 for Vertex Cover?
Best known is 2 - O((log log n)/(log n)), which is 2 - o(1). Current best
hardness result: Hastad shows that approximating Vertex Cover within
7/6 - epsilon is NP-hard.

Rent-or-buy?
============

Simple online problem called the rent-or-buy problem, aka the
tuxedo-rental problem. Say you get invited to some formal event and need to
wear a tuxedo. You can either rent one for $r (e.g., $50) or purchase one
for $p (e.g., $300), where p > r. (Say p is a multiple of r, for
convenience later.) So, maybe you decide to rent. Then you get invited to
another one. Pretty soon, if you get invited to more than 6, you're wishing
you had bought right at the start.

Optimal strategy: if you know you will be invited to more than 6 (more than
p/r in general) before the styles or your waistline change, then you should
buy. Otherwise you should rent. But what if you don't know?

Look at the Competitive Ratio measure. The competitive ratio is the worst
case (maximum), over sequences of events, of the ratio

     (our cost) / OPT,

where OPT = optimal cost in hindsight. "Cost" means total cost over all
time.

What is the CR of the algorithm that says "buy right away"? If you only go
once, you pay $p, but OPT was $r. The ratio is p/r (e.g., 300/50 = 6).

What about the algorithm that says "rent forever"? The ratio is infinite.

Optimal deterministic algorithm: "Rent until one more rental would put you
at the purchase cost, then buy." (E.g., here it is "rent 5 times, then
buy.") If you end up never buying, then you were optimal. If you buy, you
spent (p-r) + p = 2p-r (e.g., 250 + 300), but the optimal was p. The ratio
is (2p-r)/p = 2 - r/p.

You can't beat a ratio of (2 - r/p) with a deterministic algorithm.

Actually, you can do better with randomization, depending on how you set
things up. We need to assume that the number of times you will need a tux
is predetermined before you flip your coins. I.e., the competitive ratio is
the max over sequences of events s of E[Algcost(s)] / OPT(s).

More CS-like situations where this comes up: on a laptop, when to stop the
disk from spinning between data accesses; to optimize or not to optimize;
etc.

Pursuer/evader game [if have time]
===================

You're a mouse in one of n hiding places. At each time step, the cat comes
to one of the places.
If it's the one you're hiding in, you need to move. Say your cost is the
number of times you've moved. OPT is the fewest number of times you could
have moved in hindsight.

If you want to think of paging: think of a world with n pages total and a
cache of size n-1. "You" are the page that is NOT in the cache. Each probe
by the cat is a page request. The cat finding you = a page fault.

Let's look at minimizing the competitive ratio. Note: this is not a fun game
for deterministic algorithms. For any deterministic algorithm, there exists
a bad sequence of requests s such that Algcost(s) >= (n-1) * OPTcost(s):
the cat simply always probes wherever the mouse is, so the mouse moves every
step, while OPT can always hide at the spot probed furthest in the future
and so moves at most once every n-1 probes.

How about randomized algorithms?

Claim: there is a randomized algorithm such that for any sequence s,
E[Algcost(s)] <= O(log n) * OPTcost(s).

Marking Algorithm (begin running this when we receive the first page
fault):
 * start at a random place
 * when a place is probed, mark it
 * when your spot gets probed, move to a random unmarked spot
 * if you're at the last unmarked spot, clear the marks and restart

This is like a 1-bit randomized version of LRU.

Claim 1: The randomized competitive ratio for Marking is O(log n).

Proof: Initially our prob. distribution is 1/n at each point. When the cat
probes the first point, we have 1/n prob. of being hit. After this probe,
our distribution is 1/(n-1) at each unmarked point. After the next probe to
an unmarked point we have 1/(n-1) prob. of being hit, and our distribution
is 1/(n-2) at each unmarked point. And so on. (Probes to already-marked
points can't hit us.) So, the total expected number of moves per phase is
at most 1/n + 1/(n-1) + ... + 1/1 = H_n = O(log n). But OPT in hindsight
moves at least once per phase, since every point got hit during the phase.
So, the ratio is O(log n).

(Note: an algorithm that is NOT so good: whenever we are found, move to a
random place.)

Claim 2: No algorithm can get o(log n).

Proof: What if the cat probes randomly? Then, no matter what the mouse does,
it has 1/n prob. of being forced to move at each step. So, in t time steps,
the expected cost to the mouse is t/n. But how long does it take the cat to
hit every point? This is the "coupon collector's problem".
The expected time to hit all of them is n/n + n/(n-1) + ... + n/1 =
n * H_n = Theta(n log n). So, in hindsight, OPT was to move to that last
point and only pay once every Theta(n log n) probes, i.e., about
t / (n H_n) times in t steps. So, the ratio is (t/n) / (t/(n H_n)) = H_n =
Omega(log n).

Note: there's a game-theory-like thing going on. We gave a randomized
algorithm for the mouse such that for all cat strategies, the ratio is
O(log n). Then we gave a randomized strategy for the cat such that for all
mouse strategies, the ratio is Omega(log n).
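To see Claims 1 and 2 side by side, here is a small simulation sketch (Python; not part of the original notes). It runs the Marking algorithm against the random cat from the proof of Claim 2 and compares the mouse's number of moves with the offline optimum. As an assumption for computing OPT, it uses the furthest-in-the-future rule (Belady's rule from paging): at the start, and whenever found, hide in the spot whose next probe is furthest away. For simplicity the mouse runs Marking from the first probe rather than from the first fault. The function names, n = 16, t = 100,000, and the seeds are all illustrative choices.

```python
import random


def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))


def marking_moves(n, probes, rng):
    """Number of moves made by the Marking algorithm on a fixed probe sequence."""
    spot = rng.randrange(n)          # start at a random place
    marked = set()
    moves = 0
    for p in probes:
        marked.add(p)                # mark the probed place
        if len(marked) == n:
            # p was the last unmarked spot (so the mouse was there):
            # clear the marks and start a new phase with only p marked
            marked = {p}
        if p == spot:
            # found: move to a uniformly random unmarked spot
            spot = rng.choice([s for s in range(n) if s not in marked])
            moves += 1
    return moves


def furthest_next_probe(probes, start, candidates):
    """Among `candidates`, the spot whose next probe at index >= start is furthest
    away; a spot that is never probed again counts as infinitely far."""
    remaining = set(candidates)
    for i in range(start, len(probes)):
        q = probes[i]
        if q in remaining:
            if len(remaining) == 1:
                return q             # the last candidate to be probed again
            remaining.remove(q)
    return min(remaining)            # some candidate is never probed again


def opt_moves(n, probes):
    """Offline optimum via the furthest-in-the-future rule (Belady's rule)."""
    spot = furthest_next_probe(probes, 0, range(n))
    moves = 0
    for t, p in enumerate(probes):
        if p == spot:
            others = [s for s in range(n) if s != p]
            spot = furthest_next_probe(probes, t + 1, others)
            moves += 1
    return moves


n, t = 16, 100_000
cat_rng, mouse_rng = random.Random(1), random.Random(2)
probes = [cat_rng.randrange(n) for _ in range(t)]   # random cat, fixed in advance
alg = marking_moves(n, probes, mouse_rng)
opt = opt_moves(n, probes)
print(f"Marking: {alg}  OPT: {opt}  ratio: {alg / opt:.2f}  H_n: {harmonic(n):.2f}")
```

The probe sequence is generated before the mouse flips any coins, matching the setup in which the sequence is fixed in advance. Against a random cat the mouse pays about t/n whatever it does, while OPT pays roughly once per coupon-collector run, so the printed ratio should land near H_n, consistent with both claims.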