Sometimes, the best way to approach a problem is to greedily take the best-looking item until the goal is met.
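We examine two problems: MinimumSpanningTree, where the greedy choice is optimal, and SetCover, where it is only close to optimal.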
Problem class MinimumSpanningTree
application: find the cheapest way to connect offices with a fiber optic network.
       500
     o --- o
     |    /|
 200 | 100 | 2000
     |  /  |
     | /   |
     o --- o
       5000
Algorithm Kruskal
  E' <- empty set
  while E' does not connect all vertices do
    e <- next lightest edge
    if e connects vertices E' does not connect then
      E' <- E' union { e }
    fi
  od
  return E'
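The pseudocode above can be sketched in Python as follows. This is one possible rendering, not the only one: it uses a union-find forest to answer "does e connect vertices E' does not connect?", and the office names (NW, NE, SW, SE) are made up here to label the four vertices of the example graph.

```python
def kruskal(vertices, edges):
    """Return a minimum spanning tree as a list of (weight, u, v) edges.

    Assumes the graph is connected; `edges` holds (weight, u, v) tuples.
    """
    parent = {v: v for v in vertices}  # union-find forest: every vertex starts alone

    def find(v):
        # Walk to the root of v's component, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):  # next lightest edge
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # e connects vertices E' does not connect
            parent[root_u] = root_v
            tree.append((weight, u, v))
            if len(tree) == len(parent) - 1:  # E' now connects all vertices
                break
    return tree


# The four-office graph; vertex names are hypothetical labels.
offices = ["NW", "NE", "SW", "SE"]
links = [
    (500, "NW", "NE"),
    (200, "NW", "SW"),
    (100, "SW", "NE"),
    (2000, "NE", "SE"),
    (5000, "SW", "SE"),
]
mst = kruskal(offices, links)
print(mst)  # [(100, 'SW', 'NE'), (200, 'NW', 'SW'), (2000, 'NE', 'SE')]
```

The 500-weight edge is skipped because its endpoints are already joined, and the loop stops before the 5000-weight edge is ever examined.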
example:
       500
     o --- o
     |    /|
 200 | 100 | 2000
     |  /  |
     | /   |
     o --- o
       5000
We look at the 100-weight edge first. It connects two unconnected
vertices, so we include it.
     o     o
          /
       100
        /
       /
     o     o
Look at the 200-weight edge. It connects two unconnected
vertices, so we include it.
     o     o
     |    /
 200 |   /
     |  /
     | /
     o     o
Now try the 500-weight edge. Its endpoints are already connected;
ignore it.
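We look at the 2000-weight edge. It connects two unconnected
vertices, so we include it. All vertices are now connected, so we
stop without examining the 5000-weight edge.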
     o     o
     |    /|
     |   / | 2000
     |  /  |
     | /   |
     o     o
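Why is the result optimal? Say Kruskal returns E'. Take any set F of edges connecting all vertices. We show that F can be changed to look more like E' without increasing its cost. Repeating this turns F into E', so E' costs no more than F. Since F was arbitrary, E' is an optimal solution.
Let e be an edge in F but not in E'. Remove it. If this disconnects nothing, we have improved the set and we are done. Otherwise the vertices split into two components, A and B, that only e joined: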
     /----\
     |A   |
     |  o |
     \--|-/
        |e
     /--|-\
     |  o |
     |B   |
     \----/
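Let e' be the cheapest edge between A and B. E' includes e': when
Kruskal reaches e', every edge already in E' is lighter than e', so
none of them runs between A and B and the endpoints of e' are not yet
connected. Since e is not in E', e is not e', so F is better with e
replaced by e'.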
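As a sanity check on the optimality claim (not a substitute for the proof), the sketch below brute-forces the four-vertex example: it tries every subset of the five edges, keeps those that connect all vertices, and reports the cheapest. The vertex names are hypothetical labels for the diagram's corners.

```python
from itertools import combinations

# Example graph; vertex names are illustrative labels only.
vertices = {"NW", "NE", "SW", "SE"}
edges = [
    (500, "NW", "NE"),
    (200, "NW", "SW"),
    (100, "SW", "NE"),
    (2000, "NE", "SE"),
    (5000, "SW", "SE"),
]


def connects_all(chosen):
    """True if the chosen edges link every vertex into one component."""
    reached = {"NW"}
    frontier = ["NW"]
    while frontier:
        x = frontier.pop()
        for _, u, v in chosen:
            for a, b in ((u, v), (v, u)):
                if a == x and b not in reached:
                    reached.add(b)
                    frontier.append(b)
    return reached == vertices


def total(chosen):
    return sum(weight for weight, _, _ in chosen)


# Every edge subset F that connects all vertices.
candidates = [
    subset
    for size in range(len(edges) + 1)
    for subset in combinations(edges, size)
    if connects_all(subset)
]
best = min(candidates, key=total)
best_weight = total(best)
print(best_weight, sorted(best))
# 2300 [(100, 'SW', 'NE'), (200, 'NW', 'SW'), (2000, 'NE', 'SE')]
```

The cheapest connecting set is exactly the three edges chosen in the walkthrough (100 + 200 + 2000 = 2300).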
Problem class SetCover:
application: build as few fire hydrants as possible so every house is next to one.
     *---o---o---*---o
      \_  \_ | _/| _/
        \   \|/  |/
         o---o---o
          \_____/
* represents the optimal hydrant placement
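In general, SetCover takes a set S and a collection C of subsets of S,
and asks for the fewest subsets from C whose union is S. Here we take S
to be the vertices (houses) of the graph; for each vertex v, we place
into C the set of v and its adjacent vertices.
fact: No exact algorithm runs in polynomial time unless P = NP; all known exact algorithms take exponential time.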
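The construction of S and C can be sketched as below. The edge list is read off the hydrant diagram, and the labels t1..t5 (top row, left to right) and b1..b3 (bottom row) are assumptions introduced only for this illustration.

```python
# Edges as read from the hydrant diagram; the vertex labels are hypothetical.
edges = [
    ("t1", "t2"), ("t2", "t3"), ("t3", "t4"), ("t4", "t5"),  # top row
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),                # bottom row and arc
    ("t1", "b1"), ("t2", "b2"), ("t3", "b2"),                # links between rows
    ("t4", "b2"), ("t4", "b3"), ("t5", "b3"),
]

# S: every house that must end up next to a hydrant.
S = {v for edge in edges for v in edge}

# C: for each candidate hydrant site v, the set of v and its adjacent vertices.
C = {v: {v} for v in S}
for u, v in edges:
    C[u].add(v)
    C[v].add(u)

print(len(S), sorted(C["b2"]))  # 8 ['b1', 'b2', 'b3', 't2', 't3', 't4']

# The two starred sites in the diagram (t1 and t4 under this labelling)
# together cover every house.
assert C["t1"] | C["t4"] == S
```

Keying C by hydrant site is a convenience of this sketch; the problem itself only needs the collection of subsets.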
Algorithm Greedy-Set-Cover
  C' <- empty set
  U <- S
  while U != empty set do
    A <- subset from C covering the most of U
    C' <- C' union { A }
    U <- U minus A
  od
  return C'
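A minimal Python sketch of Greedy-Set-Cover follows, run on the hydrant instance. The subsets are written out by hand from the diagram, with t1..t5 and b1..b3 as assumed labels for the top and bottom rows; ties between equally good subsets are broken by dictionary order here, which is an arbitrary choice.

```python
def greedy_set_cover(S, C):
    """Return the names of the chosen subsets.

    C maps a subset's name to a set of elements of S.
    """
    uncovered = set(S)  # U
    chosen = []         # C'
    while uncovered:
        # Subset from C covering the most of U (first one wins on ties).
        best = max(C, key=lambda name: len(C[name] & uncovered))
        if not C[best] & uncovered:
            raise ValueError("the subsets in C do not cover S")
        chosen.append(best)
        uncovered -= C[best]
    return chosen


# Each house's set: itself plus its neighbours (hypothetical labels).
C = {
    "t1": {"t1", "t2", "b1"},
    "t2": {"t1", "t2", "t3", "b2"},
    "t3": {"t2", "t3", "t4", "b2"},
    "t4": {"t3", "t4", "t5", "b2", "b3"},
    "t5": {"t4", "t5", "b3"},
    "b1": {"t1", "b1", "b2", "b3"},
    "b2": {"t2", "t3", "t4", "b1", "b2", "b3"},
    "b3": {"t4", "t5", "b1", "b2", "b3"},
}
S = set(C)
cover = greedy_set_cover(S, C)
print(len(cover), cover[0])  # 3 b2
```

The first pick is always b2, which covers six houses but strands the two end houses of the top row; each then needs its own hydrant, giving three in total.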
example:
     *---o---o---o---*
      \_  \_ | _/| _/
        \   \|/  |/
         o---*---o
          \_____/
* represents the hydrant placement chosen by the algorithm
Note that this is not optimal: it uses 3 hydrants where 2 suffice. But it is close.
Theorem: Let n = |S| and k be the size of the optimal solution. Greedy-Set-Cover returns at most k ln n + 1 subsets.
Proof: The optimal solution covers all n items with k sets, so some set in C has >= n/k items. The first set added therefore has >= n/k items, leaving <= n (1 - 1/k) items in U.
U is still covered by the optimal set, so the next set added to C' has >= |U|/k items, leaving <= (1 - 1/k) |U| <= n (1 - 1/k)^2 items in U.
Generally, after the ith iteration, U has <= n (1 - 1/k)^i items.
When does this reach 1? Note that
  (1 - 1/k)^k <= 1/e.
We use this to simplify our expression.
  n (1 - 1/k)^i = n ((1 - 1/k)^k)^(i/k) <= n (1/e)^(i/k)
Now we work backwards to see how large i must be for this to be at most 1.
  n (1/e)^(i/k) <= 1
  n <= e^(i/k)
  ln n <= i/k
  i >= k ln n
So after at most k ln n iterations there will be at most 1 item left.
Picking up this item will take one more iteration. So C' has at most
k ln n + 1 subsets.
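The arithmetic in the proof can be checked numerically. The sketch below is only an illustration: it tests the inequality (1 - 1/k)^k <= 1/e over a range of k, then plugs in the hydrant example (n = 8 houses, k = 2 hydrants in the optimal placement, both read off the diagrams). The function name is made up for this example.

```python
import math


def bound_on_uncovered(n, k, i):
    """Upper bound n (1 - 1/k)^i on the items left in U after i iterations."""
    return n * (1 - 1 / k) ** i


# The inequality used to simplify the expression, checked for k = 1..999.
assert all((1 - 1 / k) ** k <= 1 / math.e for k in range(1, 1000))

n, k = 8, 2                     # hydrant example
i = math.ceil(k * math.log(n))  # k ln n is about 4.16, so 5 whole iterations
print(i, bound_on_uncovered(n, k, i))  # 5 0.25

# The theorem's bound on the number of subsets, about 5.16 here.
limit = k * math.log(n) + 1
print(round(limit, 2))  # 5.16
```

For this instance the greedy placement uses 3 hydrants, comfortably within the bound of about 5.16.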