15-854 Approximation and Online Algorithms 01/19/00
http://www.cs.cmu.edu/~avrim/Approx00/
* Overview                                        Handouts:
* Admin                                            - Course info
* Approx example: Vertex Cover                     - these notes
* Online examples: rent or buy, pursuer-evader
==========================================================================
Overview:
- What are approximation algorithms, what are online algorithms,
and what do they have in common?
- Approx: have a problem that is too hard computationally to solve
exactly. Can we efficiently (e.g., in polynomial time)
guarantee a good approximation? E.g., Vertex cover <= 2 *
optimal. Or, MAX-SAT: get >= 3/4 * optimal.
- Online: For problems where only see inputs one at a time, can we
get close to best possible in hindsight?
E.g., # page faults <= log(n) * optimal.
- Both concerned with how close can we get to an ideal optimum, for
problems where (for different reasons) getting the optimum may
not be possible.
- Historically, approx algs are older. Online algs came more
recently out of study of adaptive data structures and
amortized analysis.
- Not so much overlap historically in terms of techniques, but this
is changing. E.g., Bartal's metric space approx - developed in online
setting but useful also for approx algs.
Admin:
- See course info sheet.
VERTEX COVER
============
- What is the fewest # of guards we need to place in a museum to
cover all the corridors?
- Formally, given a graph G, looking for smallest set of vertices
such that every edge is incident to at least one vertex in the set.
- Example:   +----+----+
             |    |    |
             +----+----+
- This problem is NP-hard. But it turns out it's easy to get
within a factor of 2.
Alg1:
Pick arbitrary edge. We know any VC must have at least 1
endpt, so let's take both. Then throw out all edges covered
and repeat. Equivalently: find a maximal matching (doesn't
have to be *maximum*, just *maximal*), then for each edge, put
both endpoints into our cover. Note that this is a VC
(otherwise matching wouldn't be maximal). Also any VC must
have at least one endpoint of each of these edges. Therefore,
we're off by at most a factor of 2.
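A minimal sketch of Alg1 in Python, assuming the graph is given as a
list of edges with no self-loops. The function name and the vertex
numbering of the grid (0..5, row by row) are made up for illustration.
Scanning the edges once and taking both endpoints of every edge that
is still uncovered builds a maximal matching greedily.

```python
def vertex_cover_via_matching(edges):
    """Return (cover, matching) for an undirected graph given as a list of edges."""
    cover = set()
    matching = []
    for u, v in edges:
        # The edge is still uncovered only if neither endpoint was taken yet.
        if u not in cover and v not in cover:
            matching.append((u, v))
            cover.update((u, v))
    return cover, matching


# The 2-by-3 grid from the example, vertices numbered 0..5 row by row.
grid_edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
cover, matching = vertex_cover_via_matching(grid_edges)
print(len(matching), "matching edges,", len(cover), "vertices in the cover")
```

With this particular edge order the sketch takes all 6 vertices, while
{1, 3, 5} is a cover of size 3, so the factor of 2 can be tight.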
This is a little tricky: why take both endpoints? How about if we just
take one? This seems less wasteful. Equivalently: pick
arbitrary vertex, put into VC, delete edges covered, and repeat. Does
this guarantee a factor of 2? What's a bad example?
So, Alg1 (taking both endpoints) is a 2-approximation.
GENERAL NOTATION
================
In general, for minimization problems (e.g. VC, TSP), look at ratio:
(size of solution we find)/(optimal size).
For maximization problems (like clique, independent set), we'll look at
(optimal size)/(size of solution we find).
In general, alg achieves approximation ratio alpha if for all instances
of the problem, it finds a solution with ratio <= alpha. Interesting
thing: even though all NP-complete problems are equivalent in terms of
difficulty of finding optimal solution, the hardness or easiness of
getting a good approximation varies all over the map.
VERTEX COVER contd
==================
Here's another 2-approximation algorithm for Vertex Cover:
Alg2:
Step1: Solve a *fractional* version of the problem. Have a variable
x_i for each vertex. Constraint 0<= x_i <= 1. Each edge should be
"fractionally covered": for each edge (i,j) we want x_i+x_j >= 1.
Then our goal is to minimize sum_i x_i. We can solve this using
linear programming.
E.g., triangle-graph.
Step2: now pick each vertex i such that x_i >= 1/2.
Claim 1: this is a VC. Why? For each edge (i,j) we have x_i+x_j >= 1,
so at least one of x_i, x_j is >= 1/2 and gets picked.
Claim 2: The size of this VC is at most twice the size of the optimal
VC. Why? Because it's at most twice the value of the *fractional*
solution we found. And, the size of the optimal fractional solution
is <= the size of the optimal integral solution (i.e., the
optimal VC).
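A small sketch of Step2 on the triangle graph, in Python. It does not
solve the LP: the fractional solution x_i = 1/2 is supplied by hand
(it is optimal for the triangle, since adding the three edge
constraints gives 2*(x_0+x_1+x_2) >= 3). The function names are
hypothetical.

```python
from fractions import Fraction


def is_fractional_cover(edges, x):
    """Check 0 <= x_i <= 1 for all i and x_i + x_j >= 1 for every edge (i, j)."""
    in_range = all(0 <= value <= 1 for value in x.values())
    return in_range and all(x[u] + x[v] >= 1 for u, v in edges)


def round_fractional_cover(x):
    """Step2: keep every vertex whose fractional value is at least 1/2."""
    half = Fraction(1, 2)
    return {v for v, value in x.items() if value >= half}


triangle = [(0, 1), (1, 2), (0, 2)]
x = {v: Fraction(1, 2) for v in range(3)}   # hand-supplied LP solution
lp_value = sum(x.values())                  # fractional optimum, 3/2
cover = round_fractional_cover(x)           # rounds up to all 3 vertices
print("fractional value:", lp_value, "rounded cover size:", len(cover))
```

Here the rounded cover has size 3 = 2 * (3/2), while the optimal
integral cover has size 2.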
Open problem: Can you get an approximation ratio of 1.99 for Vertex
Cover? Best known is 2 - O((loglog n)/(log n)), which is 2 - o(1).
Current best hardness result: Hastad shows that approximating within
7/6 - epsilon is NP-hard.
Rent-or-buy?
============
Simple online problem called the rent-or-buy problem, aka the
tuxedo-rental problem. Say you get invited to some formal event and
need to wear a tuxedo. Can either rent for $r (e.g., $50) or purchase
for $p (e.g., $300) where p > r. (Say p is a multiple of r for
convenience later). So, maybe you decide to rent. Then get invited
to one again. Pretty soon, if you get invited to more than 6, you're
wishing you had bought right at the start. Optimal strategy is: if
you know you will be invited to more than 6 before the styles or your
waistline changes then you should buy. Otherwise you should rent.
But, what if you don't know?
Look at Competitive Ratio measure.
Competitive ratio is worst case (maximum) over sequences of events of
the ratio: (our cost)/OPT, where OPT = optimal cost in hindsight.
"cost" means total cost over all time.
What is CR of algorithm that says "buy right away"? If only go
once, you pay $p, but OPT was $r. Ratio is p/r. (e.g., 300/50 = 6)
What about algorithm that says "Rent forever"? Ratio is infinite.
Optimal deterministic algorithm:
"Rent until one more rental would put you at the purchase
cost, then buy." (e.g., here it is "rent 5 times then buy")
If end up never buying, then you were optimal. If you buy, you
spent p + (p-r) = 2p-r. (e.g., 300 + 250), but optimal was p.
Ratio is 2 - r/p.
Can't beat ratio of (2 - r/p) with a deterministic alg.
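To check the arithmetic, here is a sketch in Python that computes the
worst-case ratio of any "rent m times, then buy" strategy by brute
force over the number of events. The function names and the cap
max_events are assumptions for illustration; exact fractions avoid
rounding.

```python
from fractions import Fraction


def threshold_cost(events, r, p, rentals_before_buying):
    """Total cost of renting the first m times and buying on event m+1."""
    if events <= rentals_before_buying:
        return events * r
    return rentals_before_buying * r + p


def opt_cost(events, r, p):
    """Optimal cost in hindsight: rent every time or buy at the start."""
    return min(events * r, p)


def worst_ratio(r, p, rentals_before_buying, max_events):
    """Max over 1..max_events events of (our cost)/OPT."""
    return max(
        Fraction(threshold_cost(k, r, p, rentals_before_buying), opt_cost(k, r, p))
        for k in range(1, max_events + 1)
    )


r, p = 50, 300
print("rent 5 times then buy:", worst_ratio(r, p, p // r - 1, 50))
print("buy right away:", worst_ratio(r, p, 0, 50))
```

For r = 50 and p = 300 this gives 11/6 = 2 - r/p for "rent 5 times
then buy" and 6 = p/r for "buy right away".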
Actually, can do better with randomization depending on how you
set things up. Need to assume that number of times you will need tux
is predetermined before you flip your coins. I.e., competitive ratio
is max over sequence of events s, of E[Algcost(s)] / OPT(s).
More CS-like situations where this comes up:
laptop: when to stop disk from spinning between data accesses.
To optimize or not to optimize.
etc.
Pursuer/evader game [if have time]
===================
You're a mouse in one of n hiding places. At each time step, cat
comes to one of the places. If it's the one you're hiding in, you
need to move. Say your cost is the number of times you've moved. OPT
is the fewest number of times you could have moved in hindsight.
If want to think of paging: think of world with n pages total with a
cache of size n-1. "you" are the page that is NOT in the cache. Each
probe by cat is a page request. Cat finding you = page fault.
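Let's look at minimizing competitive ratio. Note: this is not a fun
game for deterministic algorithms. For any deterministic algorithm,
there exists a bad sequence of requests such that
Algcost(s) >= (n-1) * OPTcost(s).
(The cat always probes the spot the mouse is on, so the alg moves at
every step, while in hindsight one move per n-1 probes suffices.)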
How about randomized algorithms? Claim: there is a randomized alg
such that for any sequence s, E[Algcost(s)] <= O(log n) * OPTcost(s).
Marking Algorithm (begin running this when receive first page fault)
* start at random place
* when place is probed, mark it.
* when your spot gets probed, move to random unmarked spot
* if you're at last unmarked spot, clear marks and restart.
This is like a 1-bit randomized version of LRU.
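The four rules above can be sketched in Python as follows. This is one
reading of the rules, not a definitive implementation: in particular
it assumes that after "clear marks and restart" the spot just probed
counts as marked in the new phase. The function name, the sweeping
probe sequence, and the number of trials are made up for illustration.

```python
import random


def marking_moves(n, probes, rng):
    """Count the mouse's moves under the marking rule for a probe sequence."""
    if n < 2:
        raise ValueError("need at least 2 hiding places")
    marked = set()
    pos = rng.randrange(n)          # start at a random place
    moves = 0
    for q in probes:
        marked.add(q)               # when a place is probed, mark it
        if q != pos:
            continue
        if len(marked) == n:
            # We sat on the last unmarked spot: clear the marks and restart.
            # Assumption: the spot just probed stays marked in the new phase.
            marked = {q}
        unmarked = [v for v in range(n) if v not in marked]
        pos = rng.choice(unmarked)  # move to a random unmarked spot
        moves += 1
    return moves


n = 8
probes = [t % n for t in range(10 * n)]   # hypothetical cat: sweeps 0..n-1 repeatedly
trials = 2000
rng = random.Random(0)
average = sum(marking_moves(n, probes, rng) for _ in range(trials)) / trials
print("average moves over", len(probes), "probes:", average)
```

The mouse always sits on an unmarked spot, so the marks can only
become full at the moment the mouse's own spot is probed.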
Claim 1: Randomized competitive ratio for Marking is O(log n)
Proof: Initially our prob dist is 1/n at each point. When cat probes
first point, we have 1/n prob of being hit. After this probe, our
distrib is 1/(n-1) at each unmarked point. After next probe to
unmarked point we have 1/(n-1) prob of being hit and our distrib is
1/(n-2) at each unmarked. And so on. So, total expected number of
moves per phase is 1/n + 1/(n-1) + ... + 1/1 = H_n = O(log n).
But, OPT in hindsight moves at least once per phase since every point
got hit. So, ratio is O(log n).
(Note: Algorithm that is NOT so good: Whenever we are found out,
move to a random place.)
Claim 2: no algorithm can get o(log n)
Proof: What if cat probes randomly. Then, no matter what mouse does,
mouse has 1/n prob of being forced to move. So, in t time steps,
expected cost to mouse is t/n. But, how long does it take cat to hit
every point? This is "coupon collector's problem". Expected time to
hit all is n/n + n/(n-1) + ... + n/1 = n*H_n, about n*log(n). So, in
hindsight, OPT was to move to that last point and only pay once every
n*log(n) probes, while the mouse's expected cost over those probes is
n*log(n)/n = log(n). So, ratio is Omega(log n).
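The coupon-collector sum in this proof can be checked exactly with a
few lines of Python. The function names are hypothetical; the only
assumption is the one in the proof, that each probe is uniform over
the n places.

```python
from fractions import Fraction


def harmonic(n):
    """H_n = 1/1 + 1/2 + ... + 1/n, as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def expected_cover_time(n):
    """Expected number of uniform random probes until all n places are hit.

    With j places still unhit, a probe hits a new one with probability j/n,
    so the expected wait for the next new place is n/j probes.
    """
    return sum(Fraction(n, j) for j in range(1, n + 1))


n = 4
cover_time = expected_cover_time(n)        # n * H_n
mouse_cost = cover_time / n                # t/n expected moves in t probes
print("expected cover time:", cover_time, "expected mouse cost:", mouse_cost)
```

So over one expected cover time the mouse pays H_n in expectation,
which is where the log(n) in the lower bound comes from.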
Note: there's a game-theory like thing going on. We gave rand alg for
mouse s.t. for all cat strategies, value of ratio = O(log n). Then
gave rand alg for cat s.t. for all mouse strategies, value of ratio is
Omega(log n).