15-750 Graduate Algorithms 3/1/01
- network flow
- Edmonds-Karp #1
Reading: Chapter 16&17
===========================================================================
NETWORK FLOW
------------
- directed graph. Source node. Sink node. Each (directed) edge has a
capacity. (Let's say these are integers.) Goal is to flow as much as
possible from source to sink.
E.g.,
s-->a, cap = 4. a-->c, cap=3. c-->t, cap = 2.
s-->b, cap = 2. b-->d, cap=3. d-->t, cap = 4.
c-->b, cap = 1. b-->c, cap=2.
E.g., want to route message traffic from source to sink, and the
capacities tell me how much bandwidth we're allowed on each edge.
Formally, rules are:
Capacity constraint: on any edge, f(u,v) <= c(u,v)
Flow conservation: for any vertex except s and t, flow in = flow out.
What is the max flow here? Answer: 5.
How can you see that above max flow was really maximum? Notice, this
flow saturates the a-->c and s-->b edges. And, if you remove these,
you disconnect s from t. In other words, the graph has a "s-t cut" of
size 5. The point is that any unit of stuff going from s to t must
take up at least 1 unit of capacity in these pipes. So, we know we're
optimal.
We just argued that in general, max flow <= min cut.
Definition: an s-t cut is a set of edges whose removal disconnects s
from t. Or, formally, a cut is a partition of vertex set into A and B
where s is in A and t is in B. (The edges of the cut are then all
edges going from A to B).
Definition: capacity of cut is the sum of capacities of edges in the
cut. Or, in the formal viewpoint, it is the sum of capacities of all
edges going from A to B.
Easy fact we showed above: max s-t flow <= min s-t cut.
How to find a maximum flow and prove it's correct? Here's a very
natural strategy: find a path from s to t and push as much flow on it
as possible. Then look at the leftover graph and repeat until there
is no longer any path with capacity left to push any more flow on.
This is called the Ford-Fulkerson algorithm.
Basic Ford-Fulkerson algorithm
------------------------------
While there exists an s-->t path P of positive residual (leftover) capacity:
push maximum possible flow along P (saturating at least one edge on it)
[These are called "augmenting paths"]
Definition: "Residual capacity" c_f(u,v) = c(u,v) - f(u,v)
where we make the convention that f(v,u) = -f(u,v).
--> Example on above graph: start with path s-->b-->c-->t. Use
"[capacity/flow]" notation.
Note: residual capacity can be *larger* than original capacity if we
had flow going in opposite direction. E.g., residual capacity on the
c-->b edge is now 3.
To keep track of what's going on, it helps to write the "RESIDUAL
GRAPH". This graph shows the residual capacities: it has all edges of
positive residual capacity, each one labeled by its residual capacity.
Other good reason to write down residual graph? If we have a black
box path-finding routine, this is what we want to feed into it at each
iteration of the Ford-Fulkerson algorithm.
--> Note: running time is polynomial in the number of nodes and the
value of the flow. But if we allow large capacities written in
binary, this is not necessarily polynomial time in the description
length of the input. E.g., standard bad-choice graph. After we're
done proving the algorithm is correct, we'll look at ways for fixing
that problem.
Algorithm by design finds a legal flow. Why is it maximum?
Proof: Let's look at the final residual graph. This graph must have s
and t disconnected by definition of the algorithm. Let A be
component containing s and B = rest. Now, let C = capacity of the (A,B)
cut in the *original* graph --- so we know we can't do better than C.
The claim is that we in fact *did* find a flow of value C (which
therefore implies it is maximum). Here's why: let's look at what
happens to the residual capacity of the cut after each iteration of
the algorithm. Say in some iteration we found a path with k units of
flow. Then, even if the path zig-zagged between A and B, every time
we went A-->B we added k to flow from A to B (and subtracted k from
the residual capacity) and every time we went B-->A we took away k
from this flow (and added k to the residual capacity). And, we went
from A-->B exactly one more time than we went from B-->A. So, the
residual capacity went down by exactly k. So, the drop in capacity is
equal to the increase in flow. Since at the end the residual capacity
is zero (remember how we defined A and B) this means the total flow is
equal to C.
Note, we've actually proven the nonobvious MAXFLOW-MINCUT theorem.
In any graph, max flow from s to t = capacity of minimum (s,t)-cut.
We started with saying max flow <= min cut. But now we've argued that
this algorithm finds a flow of value equal to SOME cut, so it can't be
less than the MINIMUM.
(This is actually very similar to min/max theorem for 2-player zero
sum games.)
We have also proven the INTEGRAL-FLOW THEOREM: if all capacities are
integers, then there is a max flow in which all flows are integers.
Bipartite Maximum matching problem
----------------------------------
One nice use of network flow is to solve the bipartite matching
problem. E.g., assigning dorm rooms to freshmen (people on the left,
slots on the right, edge if dorm is acceptable to student). Or assigning
factories to stores (stores on the left, each factory is represented
by several nodes on the right depending on how many stores it can
service. Edge if factory is close to the store.)
A MATCHING is a set of edges with no endpoints in common. What we want
here is a PERFECT MATCHING - a matching that connects every point on
the left hand side with some point on the right hand side. More
generally (say there is no perfect matching) we want a MAXIMUM
MATCHING - a matching with maximum possible number of edges.
Algorithm to solve:
------------------
1. Set up fake "start" node S connected to all in L. Connect all in R to
a fake "end" node T. Orient all edges left-to-right and give each a
capacity of 1.
2. Find a max flow from S to T using Ford-Fulkerson.
3. Output matching corresponding to flow.
=====================================================================
EDMONDS-KARP #1
---------------
One problem with Ford-Fulkerson is that running time can be large if
capacities are large. Number of iterations is O(F), where F is the
max flow. This might not be polynomial in the description length of
the problem instance.
Edmonds-Karp give two natural heuristics that solve the problem.
Edmonds-Karp #1: Always pick the maximum bottleneck augmenting
path. (The path of maximum minimum capacity)
Claim: This causes FF to make at most O(m log(F)) iterations.
First: how can we *find* the maximum bottleneck path in a graph?
Answer: do a greedy algorithm like Prim. Starting from U = {s}, always
put in the widest (highest capacity) edge leaving the set U. How to
see that it works: at any point in time, say we have a set U, and the
widest edge leaving U has capacity c. Then, *any* path from s to a
node in V-U has bottleneck at least c. So, putting in the edge of
capacity c is safe.
Proof of claim: If the current residual graph has max flow f, we will
show that the maximum bottleneck path has capacity at least f/m. Why
is this? Let's try running the above algorithm: if we ever get to a
point where all edges leaving U have capacity < f/m, this means we
have a cut of < f, which is a contradiction.
So, after every iteration, the max flow in the residual graph is
multiplied by at most (1 - 1/m). So, after m iterations, the flow has
gone down by factor of 1/e, and after m*log(F) iterations, it is < 1.
=====================================================================