15-750 Graduate Algorithms 3/20/01 * Approximation Algorithms ====================================================================== General Question: For problems that are NP-hard, can we perhaps in polynomial time guarantee a good *approximate* solution? E.g., can we guarantee to find a solution that's within 10% of optimal? Or how about within a factor of 2 of optimal? Or, anything non-trivial? Interesting thing: even though all NP-complete problems are equivalent in terms of difficulty of finding optimal solution, the hardness or easyness of getting a good approximation varies all over the map. VERTEX COVER ============ - Given a graph G, looking for smallest set of vertices such that every edge is incident to at least one vertex in the set. - Example: +----+----+ | | | +----+----+ - Can think of as: what is the fewest # of guards we need to place in museum to cover all the corridors. - This problem is NP-hard. But it turns out it's easy to get within a factor of 2. Let's start with some strategies that *don't* work. Strawman #1: Pick arbitrary vertex with at least one uncovered edge incident, put into cover, and repeat. What would be a bad example? [A: how about a star graph] Strawman #2: how about picking the vertex that covers the *most* uncovered edges. Turns out this doesn't work either. [see hwk] How to get factor of 2. Two easy algorithms: Alg1: Pick arbitrary edge. We know any VC must have at least 1 endpt, so lets take both. Then throw out all edges covered and repeat. Keep going until no uncovered edges left. What we've found in the end is a matching (a set of edges no two of which share an endpoint) that's "maximal" (meaning that you can't add any more edges to it and keep it a matching). So, if we picked k edges, our VC has size 2k, but any VC must have size at least k since you need to have at least one endpoint of each of these edges. [notice that we do betterif we find a smaller maximal matching] Here's another 2-approximation algorithm for Vertex Cover: Alg2: Step1: Solve a *fractional* version of the problem. Have a variable x_i for each vertex. Constraint 0<= x_i <= 1. Each edge should be "fractionally covered": for each edge (i,j) we want x_i+x_j >= 1. Then our goal is to minimize sum_i x_i. We can solve this using linear programming. E.g., triangle-graph. Step2: now pick each vertex i such that x_i >= 1/2. Claim 1: this is a VC. Why? Claim 2: The size of this VC is at most twice the size of the optimal VC. Why? Because it's at most twice the value of the *fractional* solution we found. And, the size of the optimal fractional solution is <= the size of the optimal integral solution (i.e., the optimal VC). Interesting fact: nobody knows any algorithm with approximation ratio 1.9. Best known is 2 - O((loglog n)/(log n)), which is 2 - o(1). Current best hardness result: Hastad shows 7/6 - epsilon is NP-hard. SET-COVER --------- Given a domain X of n points, and m subsets S_1, S_2, ..., S_m of these points. Goal: find the fewest number of these subsets needed to cover all the points. Set-cover is NP-hard. It's a generalization of vertex cover (can anyone see how?). In fact, it's been proven that getting an o(log n) approximation to the optimal solution is also NP-hard. Luckily, getting O(log n) is not so bad: a simple greedy algorithm works. Greedy Algorithm: Pick the set that covers the most points. Throw out all the points covered. Repeat. What's an example where this *doesn't* find the best solution? Theorem: If the optimal solution uses k sets, the greedy algorithm finds a solution with at most k(ln(n/k) + 1) sets. So, it is an O(log n) approximation -- or better if k is large. Proof: Since the optimal solution uses k sets, there must some set that covers at least a 1/k fraction of the points. Therefore, after the first iteration of the algorithm, there are at most n(1 - 1/k) points left. After the second, there are at most n(1 - 1/k)^2 points left, etc. After k rounds, there are at most n(1 - 1/k)^k < n*(1/e) points left. After k*ln(n/k) rounds there are at most k points left. So there are at most k iterations to go before we're done. Here's another algorithm: first solve the fractional version --- have variable x_i for each set S_i which is allowed to be in range [0,1]. For each point p, have constraint that sum of x_i's such that S_i contains p must be at least 1. Then solve. Get a solution whose total sum is some value k' (value of the optimal fractional solution) which is <= k (value of the optimal integral solution). Unfortunately, unlike vertex-cover, we can't just take all values > x, since some point might be covered by lots of small fractions that happen to sum to 1. But, one thing we *can* do is treat fractions as probabilities -- this is called randomized rounding. There are a couple ways to do randomized rounding. Here's the one I think is cleanest. First, divide all the numbers by k' so that they now all sum to 1. Now, view this as a probability distribution and make independent draws from it. Let's fix some point p. Sum of values on sets covering it is at least 1/k'. So, after k' draws, the probability it *hasn't* been covered is at most (1 - 1/k')^k' <= 1/e. So, expected number of points still uncovered is <= n/e. After k'*ln(n/k') draws, Pr(p is uncovered) <= k'/n, so expected number of points uncovered is <= k'. Once we have <= k' points remaining, can just be greedy and cover the rest with <= k' more sets. So, the algorithm is: Draw k'*ln(n/k') sets from the distribution. If > ceiling{k'} points uncovered, repeat. Else, cover remaining greedily. [not hard to show that we don't expect to have to repeat too many times] ==> Total number of sets is at most k'*ln(n/k') + ceiling{k'}. So, this bound is in some sense better than the greedy algorithm bound, since we replace k by k'. Aside: the maximum ratio (OPT integral)/(OPT fractional) is often called the "integrality gap". We have shown here that the integrality gap is at most log(n) since we actually *find* an integral solution within log(n) of the optimal fractional. In these problems so far, rounding an LP is just an alternative way to get something we can get by a greedy-type algorithm. For the problem below, randomized rounding of LP is the best alg we know in terms of apx guarantee.