15-750 Graduate Algorithms		3/20/01
* Approximation Algorithms

======================================================================
General Question: For problems that are NP-hard, can we perhaps in
polynomial time guarantee a good *approximate* solution?  E.g., can we
guarantee to find a solution that's within 10% of optimal?  Or how
about within a factor of 2 of optimal?  Or, anything non-trivial?

Interesting thing: even though all NP-complete problems are equivalent
in terms of difficulty of finding optimal solution, the hardness or
easyness of getting a good approximation varies all over the map.

VERTEX COVER
============
   - Given a graph G, looking for smallest set of vertices such that
	every edge is incident to at least one vertex in the set. 

   - Example:                            +----+----+
                                         |    |    |
                                         +----+----+

   - Can think of as: what is the fewest # of guards we need to place
	in museum to cover all the corridors.

   - This problem is NP-hard.  But it turns out it's easy to get
        within a factor of 2.

Let's start with some strategies that *don't* work.

  Strawman #1: Pick arbitrary vertex with at least one uncovered edge
	incident, put into cover, and repeat.  What would be a bad example?
	[A: how about a star graph]

  Strawman #2: how about picking the vertex that covers the *most*
	uncovered edges.  Turns out this doesn't work either. [see hwk]

How to get factor of 2.  Two easy algorithms:

 Alg1:
        Pick arbitrary edge.  We know any VC must have at least 1
        endpt, so lets take both.  Then throw out all edges covered
        and repeat.   Keep going until no uncovered edges left.  What
	we've found in the end is a matching (a set of edges no two of
	which share an endpoint) that's "maximal" (meaning that you can't add
	any more edges to it and keep it a matching).  So, if we picked k
	edges, our VC has size 2k, but any VC must have size at least k since
	you need to have at least one endpoint of each of these edges.
	[notice that we do betterif we find a smaller maximal matching]

Here's another 2-approximation algorithm for Vertex Cover:

 Alg2: 
        Step1: Solve a *fractional* version of the problem.  Have a variable
        x_i for each vertex.  Constraint 0<= x_i <= 1.  Each edge should be
        "fractionally covered": for each edge (i,j) we want x_i+x_j >= 1.
        Then our goal is to minimize sum_i x_i.  We can solve this using
        linear programming.

        E.g., triangle-graph.

        Step2: now pick each vertex i such that x_i >= 1/2.

   Claim 1: this is a VC.  Why?

   Claim 2: The size of this VC is at most twice the size of the optimal
        VC.  Why?  Because it's at most twice the value of the *fractional*
        solution we found.  And, the size of the optimal fractional solution
        is <= the size of the optimal integral solution (i.e., the
        optimal VC).

   Interesting fact: nobody knows any algorithm with approximation
   ratio 1.9.  Best known is 2 - O((loglog n)/(log n)), which is 2 - o(1).

   Current best hardness result: Hastad shows 7/6 - epsilon is NP-hard.


SET-COVER
---------
	Given a domain X of n points, and m subsets S_1, S_2, ..., S_m of
	these points.  Goal: find the fewest number of these subsets needed to
	cover all the points.

Set-cover is NP-hard.  It's a generalization of vertex cover (can
anyone see how?).  In fact, it's been proven that getting an o(log n)
approximation to the optimal solution is also NP-hard.  Luckily,
getting O(log n) is not so bad: a simple greedy algorithm works.

	Greedy Algorithm: Pick the set that covers the most points.  Throw out
	all the points covered.  Repeat.
	
	What's an example where this *doesn't* find the best solution?
	
	Theorem: If the optimal solution uses k sets, the greedy algorithm
	finds a solution with at most k(ln(n/k) + 1) sets. So, it is an 
	O(log n) approximation -- or better if k is large.
	
Proof: Since the optimal solution uses k sets, there must some set
	that covers at least a 1/k fraction of the points. Therefore, after
	the first iteration of the algorithm, there are at most n(1 - 1/k)
	points left.  After the second, there are at most n(1 - 1/k)^2 points
	left, etc.  After k rounds, there are at most n(1 - 1/k)^k < n*(1/e)
	points left.  After k*ln(n/k) rounds there are at most k points left.
	So there are at most k iterations to go before we're done.

Here's another algorithm: first solve the fractional version --- have
	variable x_i for each set S_i which is allowed to be in range
	[0,1]. For each point p, have constraint that sum of x_i's such that
	S_i contains p must be at least 1.  Then solve.  Get a solution whose
	total sum is some value k' (value of the optimal fractional solution)
	which is <= k (value of the optimal integral solution).
	Unfortunately, unlike vertex-cover, we can't just take all values > x,
	since some point might be covered by lots of small fractions that
	happen to sum to 1.  But, one thing we *can* do is treat fractions as
	probabilities -- this is called randomized rounding.

There are a couple ways to do randomized rounding.  Here's the one I
	think is cleanest.  First, divide all the numbers by k' so
	that they now all sum to 1.  Now, view this as a probability
	distribution and make independent draws from it.   

Let's fix some point p.  Sum of values on sets covering it is at
	least 1/k'.  So, after k' draws, the probability it *hasn't* been
	covered is at most (1 - 1/k')^k' <= 1/e.  So, expected number of
	points still uncovered is <= n/e.  After k'*ln(n/k') draws, 
	Pr(p is uncovered) <= k'/n, so expected number of points uncovered is
	<= k'.  Once we have <= k' points remaining, can just be greedy and
	cover the rest with <= k' more sets.  So, the algorithm is:

	Draw k'*ln(n/k') sets from the distribution.  If > ceiling{k'} points
	uncovered, repeat.  Else, cover remaining greedily.
	[not hard to show that we don't expect to have to repeat too
	many times]

	==> Total number of sets is at most k'*ln(n/k') + ceiling{k'}.

	So, this bound is in some sense better than the greedy algorithm
	bound, since we replace k by k'.  

Aside: the maximum ratio (OPT integral)/(OPT fractional) is often
	called the "integrality gap".  We have shown here that the integrality
	gap is at most log(n) since we actually *find* an integral solution
	within log(n) of the optimal fractional.

In these problems so far, rounding an LP is just an alternative way to
get something we can get by a greedy-type algorithm.  For the problem
below, randomized rounding of LP is the best alg we know in terms of
apx guarantee.