Sometimes, the best way to approach a problem is to greedily take the best-looking item until the goal is met.
We examine greedy algorithms for the following problem classes.
Problem class MinimumSpanningTree
application: find the cheapest way to connect offices with a fiber optic network.

          500
        o --- o
        |    /|
    200 | 100 | 2000
        |  /  |
        | /   |
        o --- o
          5000

Algorithm Kruskal
    E' <- empty set
    while E' does not connect all vertices do
        e <- next lightest edge
        if e connects vertices E' does not connect then
            E' <- E' union { e }
        fi
    od
    return E'
example:

          500
        o --- o
        |    /|
    200 | 100 | 2000
        |  /  |
        | /   |
        o --- o
          5000

We look at the 100-weight edge first. It connects two unconnected vertices, so we include it.

        o     o
             /
        100 /
           /
        o     o

Look at the 200-weight edge. It connects two unconnected vertices, so we include it.

        o     o
        |    /
    200 |   /
        |  /
        | /
        o     o

Now try the 500-weight edge. Its endpoints are already connected; ignore it. We look at the 2000-weight edge, and include it.

        o     o
        |    /|
        |   / | 2000
        |  /  |
        | /   |
        o     o

All vertices are now connected, so the algorithm stops without considering the 5000-weight edge.
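The walkthrough above can be reproduced in a few lines of Python. This is a minimal sketch, not the notes' own implementation: the vertex labels TL, TR, BL, BR (top-left, top-right, bottom-left, bottom-right) are made up for illustration, and the "does E' already connect these vertices?" test is done with a small union-find structure, which the pseudocode does not specify. It also assumes the input graph is connected.

```python
def kruskal(vertices, edges):
    """Return the edges chosen by Kruskal's algorithm.

    edges is a list of (weight, u, v) tuples; the graph is assumed connected.
    """
    # parent[v] points toward the representative of v's component.
    parent = {v: v for v in vertices}

    def find(v):
        # Walk up to the representative, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    chosen = []
    for weight, u, v in sorted(edges):      # lightest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # e connects vertices E' does not connect
            parent[root_u] = root_v
            chosen.append((weight, u, v))
        if len(chosen) == len(vertices) - 1:
            break                           # E' now connects all vertices
    return chosen


# The example graph; labels are hypothetical names for the four corners.
vertices = ["TL", "TR", "BL", "BR"]
edges = [
    (500, "TL", "TR"),
    (200, "TL", "BL"),
    (100, "TR", "BL"),
    (2000, "TR", "BR"),
    (5000, "BL", "BR"),
]
tree = kruskal(vertices, edges)
total = sum(weight for weight, _, _ in tree)
print(tree)    # [(100, 'TR', 'BL'), (200, 'TL', 'BL'), (2000, 'TR', 'BR')]
print(total)   # 2300
```

The edges come out in the same order as in the example: 100, 200, then 2000, with the 500-weight edge skipped.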
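Why is this optimal? Say Kruskal returns E'. Take any set F connecting all vertices. We show that F can be changed, one edge at a time, to look more like E' without ever increasing its cost. This will imply that E' is an optimal solution.
Let e be an edge in F but not in E'. Remove it. If this disconnects nothing, F has become cheaper and this step is done. Otherwise, removing e splits the vertices into two groups, A and B, which e alone connected: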
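
    /----\
    |A   |
    |  o |
    \--|-/
       |e
    /--|-\
    |  o |
    |B   |
    \----/

Let e' be the cheapest edge between A and B (assume for simplicity that edge weights are distinct). E' includes e': when Kruskal reaches e', every edge already in E' is lighter than e', so none of them runs between A and B, and the endpoints of e' are therefore not yet connected. Since e is not in E', e is not e', so e costs more than e', and F is better with e replaced by e'.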
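The optimality claim can be sanity-checked on the four-office example by brute force. The sketch below is only an illustration, not part of the proof: it tries every set of edges, keeps those that connect all vertices, and reports the cheapest total. The labels TL, TR, BL, BR are hypothetical names for the four corners of the example graph.

```python
from itertools import combinations


def connects_all(vertices, edge_subset):
    """Return True if edge_subset links every vertex into one component."""
    adjacent = {v: set() for v in vertices}
    for _, u, v in edge_subset:
        adjacent[u].add(v)
        adjacent[v].add(u)
    # Depth-first search from an arbitrary start vertex.
    seen, stack = {vertices[0]}, [vertices[0]]
    while stack:
        for nxt in adjacent[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(vertices)


def cheapest_connecting_weight(vertices, edges):
    """Try every edge set F and return the least total weight that connects all vertices."""
    best = None
    # Fewer than len(vertices) - 1 edges can never connect everything.
    for size in range(len(vertices) - 1, len(edges) + 1):
        for subset in combinations(edges, size):
            if connects_all(vertices, subset):
                total = sum(weight for weight, _, _ in subset)
                if best is None or total < best:
                    best = total
    return best


vertices = ["TL", "TR", "BL", "BR"]
edges = [
    (500, "TL", "TR"),
    (200, "TL", "BL"),
    (100, "TR", "BL"),
    (2000, "TR", "BR"),
    (5000, "BL", "BR"),
]
print(cheapest_connecting_weight(vertices, edges))   # 2300
```

The result, 2300, equals 100 + 200 + 2000, the total weight of the edges the example selects. Exhaustive search takes exponential time in the number of edges, so it is useful only as a check on tiny graphs.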
Problem class SetCover:
application: build as few fire hydrants as possible so every house is next to one.
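
    *---o---o---*---o
     \_  \_ | _/| _/
       \   \|/  |/
        o---o---o
         \_____/

    * represents the optimal hydrant placement

In set-cover terms, we are given a set S and a collection C of subsets of S, and we want the fewest subsets from C whose union is S. We take S to be the vertices (houses) of the graph; for each vertex v, we place into C the set of v and its adjacent vertices.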
fact: SetCover is NP-hard. All known exact algorithms take exponential time, and no exact algorithm can run in polynomial time unless P = NP.
Algorithm Greedy-Set-Cover
    C' <- empty set
    U <- S
    while U != empty set do
        A <- subset from C covering the most of U
        C' <- C' union { A }
        U <- U minus A
    od
    return C'
example:

    *---o---o---o---*
     \_  \_ | _/| _/
       \   \|/  |/
        o---*---o
         \_____/

    * represents the hydrant placement chosen by the algorithm

Note that this is not optimal! But it is close.
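The hydrant example can be run through a short Python version of Greedy-Set-Cover. This is a sketch under stated assumptions: the labels T1..T5 (top row, left to right) and B1..B3 (bottom row) are invented here, the adjacency lists are read off the figure, and ties are broken by taking the first candidate in dictionary order, which the pseudocode leaves unspecified.

```python
def greedy_set_cover(universe, subsets):
    """Repeatedly take the subset covering the most still-uncovered items.

    subsets maps a label to a set of items; returns the chosen labels in order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # max() returns the first label with the largest count, so ties go
        # to the earliest label in the dict (an arbitrary choice).
        best = max(subsets, key=lambda name: len(subsets[name] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Adjacency as read from the figure; the labels are hypothetical.
neighbors = {
    "T1": ["T2", "B1"],
    "T2": ["T1", "T3", "B2"],
    "T3": ["T2", "T4", "B2"],
    "T4": ["T3", "T5", "B2", "B3"],
    "T5": ["T4", "B3"],
    "B1": ["T1", "B2", "B3"],
    "B2": ["T2", "T3", "T4", "B1", "B3"],
    "B3": ["T4", "T5", "B1", "B2"],
}
houses = set(neighbors)
# C holds, for each vertex v, the set of v and its adjacent vertices.
hydrant_sets = {v: {v, *adjacent} for v, adjacent in neighbors.items()}

picked = greedy_set_cover(houses, hydrant_sets)
print(picked)   # ['B2', 'T1', 'T4']
```

The first pick is B2, which covers six houses and leaves only the two top corners, T1 and T5. No single subset contains both corners, so two more picks are needed, for three hydrants against the optimal two (T1 and T4). Which tied vertex serves each corner depends on the tie-breaking rule; the figure's choice of the corners themselves is equally valid.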
Theorem: Let n = |S| and k be the size of the optimal solution. Greedy-Set-Cover returns at most k ln n + 1 subsets.
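The optimal solution covers all n items with k subsets, so at least one subset in C covers >= n / k items. The first set added is at least that large, leaving <= n (1 - 1/k) items in U. U is still covered by the k subsets of the optimal solution, so the next set added to C' covers >= |U|/k items of U, leaving <= (1 - 1/k) |U| <= n (1 - 1/k)^2 items in U. In general, after the ith iteration, U has <= n (1 - 1/k)^i items.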
When does this reach 1? Note that

    (1 - 1/k)^k <= 1/e

which follows from the inequality 1 - x <= e^(-x) with x = 1/k. We use this to simplify our expression.

    n (1 - 1/k)^i = n ((1 - 1/k)^k)^(i/k) <= n (1/e)^(i/k)

Now we work backwards to see how large i must be for this to be at most 1.

    n (1/e)^(i/k) <= 1
    n <= e^(i/k)
    ln n <= i/k
    i >= k ln n

So after at most k ln n iterations there will be at most 1 item left. Picking up this item will take one more iteration. So C' has at most

    k ln n + 1

subsets.
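As a quick numeric check of the two ingredients of this argument, the sketch below evaluates the inequality (1 - 1/k)^k <= 1/e for a range of k and computes the bound for the hydrant example, where n = 8 houses and the optimal solution uses k = 2 hydrants. The function name greedy_bound is made up for illustration.

```python
import math


def greedy_bound(n, k):
    """Upper bound k ln n + 1 on the number of subsets Greedy-Set-Cover returns."""
    return k * math.log(n) + 1


# Spot-check (1 - 1/k)^k <= 1/e; this is evidence for small k, not a proof.
inequality_holds = all((1 - 1 / k) ** k <= 1 / math.e for k in range(1, 1001))
print(inequality_holds)   # True

# Hydrant example: n = 8 houses, optimal k = 2.
bound = greedy_bound(8, 2)
print(round(bound, 2))    # 5.16
```

The greedy placement in the example uses 3 hydrants, comfortably within the bound of about 5.16.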