15-451 Algorithms 10/26/04
Network Flow II
- bipartite matching
- Edmonds-Karp #2
- Dinic & MPM
- min-cost max flow
======================================================================
NETWORK FLOW RECAP: We are given a directed graph G, a source s, and a
sink t. Each edge (u,v) has some capacity c(u,v). The goal is to find
the maximum flow possible from s to t.
Last time: looked at Ford-Fulkerson algorithm. Used to prove
maxflow-mincut theorem, also integral flow theorem.
Key idea: Given a flow f so far, we describe the capacities left over
in a "residual graph". Specifically, the residual graph has all edges
of positive "residual capacity", defined as:
c_f(u,v) = c(u,v) - f(u,v)
where we define
f(v,u) = -f(u,v)
FF alg is: find path of positive residual capacity; push flow to
saturate; repeat. For example: 4 nodes {s,a,b,t}. s-->a has cap 4,
a-->t has cap 2, s-->b of cap 2, b-->t of cap 3, and a-->b of cap 3.
Point of this is that any legal flow in the residual graph, when added
to f (edge by edge) is a legal flow in the original graph (satisfies
capacity constraints; obeys flow in = flow out from each node). And,
this is true vice-versa. This means that the maximum flow in the
residual graph, plus f, is a maximum flow in the original graph.
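To make the recap concrete, here is a small Python sketch of Ford-Fulkerson on the 4-node example above (illustrative code, not from the lecture; the dict-of-dicts representation is an assumption). The max flow works out to 5, since the two edges into t form a cut of capacity 2+3=5:

```python
from collections import defaultdict

def ford_fulkerson(cap, s, t):
    """Basic Ford-Fulkerson: repeatedly find any s->t path of positive
    residual capacity (here via DFS), saturate it, and repeat.

    cap: dict of dicts, cap[u][v] = capacity of edge (u, v).
    Residual capacity is c_f(u,v) = c(u,v) - f(u,v), with the
    convention f(v,u) = -f(u,v), so pushing flow on (u,v) opens
    residual capacity on (v,u)."""
    flow = defaultdict(lambda: defaultdict(int))
    nodes = set(cap) | {v for u in cap for v in cap[u]}

    def residual(u, v):
        return cap.get(u, {}).get(v, 0) - flow[u][v]

    def dfs_path(u, visited):
        if u == t:
            return [t]
        visited.add(u)
        for v in nodes:
            if v not in visited and residual(u, v) > 0:
                rest = dfs_path(v, visited)
                if rest:
                    return [u] + rest
        return None

    total = 0
    while True:
        path = dfs_path(s, set())
        if path is None:
            return total
        pairs = list(zip(path, path[1:]))
        bottleneck = min(residual(u, v) for u, v in pairs)
        for u, v in pairs:
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck   # skew symmetry: f(v,u) = -f(u,v)
        total += bottleneck

# the example: s-->a cap 4, a-->t cap 2, s-->b cap 2, b-->t cap 3, a-->b cap 3
cap = {'s': {'a': 4, 'b': 2}, 'a': {'t': 2, 'b': 3}, 'b': {'t': 3}}
print(ford_fulkerson(cap, 's', 't'))   # prints 5, the example's max flow
```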
APPLICATION: MAXIMUM MATCHING
=============================
Say we are in charge of assigning dorm rooms to incoming freshmen.
Let's say freshmen are allowed to scout out rooms and list each one as
acceptable or unacceptable. E.g., a small school with 3 freshmen and 3
available rooms:
student slot
------- ----
Dan A
Sue B
Pat C
Edge if acceptable: e.g., (Dan A), (Dan B), (Sue A), (Sue C), (Pat B), (Pat C)
This is a BIPARTITE GRAPH: a graph with two sides L and R, and all
edges go between L and R.
A MATCHING is a set of edges with no endpoints in common. What we want
here is a PERFECT MATCHING - a matching that connects every point on
the left hand side with some point on the right hand side. What's a
perfect matching here? More generally (say there is no perfect
matching) we want a MAXIMUM MATCHING - a matching with maximum
possible number of edges.
Algorithm to solve:
------------------
1. Set up fake "start" node S connected to all in L. Connect all in R to
a fake "end" node T. Orient all edges left-to-right and give each a
capacity of 1.
2. Find a max flow from S to T using Ford-Fulkerson.
3. Output matching corresponding to flow.
This finds a legal matching because edges from R to T have capacity 1,
so the matching can't have two edges into same node, and similarly the
edges from S to L have cap=1, so you can't have two edges leaving same
node in L. It's maximum because any matching gives you a flow of the
same value. (So if there were a better matching, we wouldn't be at a
maximum flow.)
What about the number of iterations of path-finding? This is at most
the #edges in the matching since each new augmenting path gives us 1
new edge.
[Run alg on above example]. Notice a neat fact: say you start
matching Dan->A, Pat->C. These are bad choices. But, the augmenting
path automatically undoes them as it improves the flow!
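A sketch of the matching algorithm in Python (illustrative only). Rather than building the fake S and T nodes explicitly, it runs the equivalent unit-capacity augmenting-path search directly on the bipartite graph: each DFS call below is one Ford-Fulkerson augmenting path, and the "undoing" behavior just described shows up as rerouting a previously matched node:

```python
def max_bipartite_matching(adj):
    """adj[u] = list of right-side nodes acceptable to left-side node u.

    Equivalent to max flow on the unit-capacity network S -> L -> R -> T:
    each call to augment() finds one augmenting path, and may undo an
    earlier bad choice by rerouting a right node's current partner."""
    match = {}               # right node -> left node currently matched

    def augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # take v if it's free, or if its partner can reroute elsewhere
            if v not in match or augment(match[v], seen):
                match[v] = u
                return True
        return False

    for u in adj:
        augment(u, set())
    return {u: v for v, u in match.items()}   # left -> right

# the Dan/Sue/Pat example from above
adj = {'Dan': ['A', 'B'], 'Sue': ['A', 'C'], 'Pat': ['B', 'C']}
m = max_bipartite_matching(adj)
print(m)   # a perfect matching: all 3 students get distinct rooms
```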
Matchings come up as subroutines in a lot of problems. Another
example: matching up suppliers to customers. Put customers on RHS.
For each supplier, put k nodes on LHS if it can supply k customers.
Then link up based on which customers are near to which suppliers
(some may be near to multiple suppliers, which is what makes it
interesting).
IMPROVING THE RUNNING TIME OF NETWORK FLOW
==========================================
Assuming capacities are integers, the basic Ford-Fulkerson algorithm
could make up to F iterations, where F is the value of the max flow.
Each iteration takes O(m) time to find a path (e.g., DFS or BFS)
and to compute the residual graph. [m = # edges]. So, total time is
O(mF).
This is fine if F is small, like in the matching case. Not good if
capacities are in binary and F could be very large.
Edmonds-Karp #1: push flow along widest path. We saw that this
ensures at most O(m*log(F)) iterations. Each iteration involves
running a Prim-like algorithm. Takes time O(m log n). Total time
is O(m^2 * log(F) * log(n)). [Can get rid of log(n) by being clever]
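A hedged sketch of the widest-path step (not from the lecture): a Dijkstra/Prim-style search that maximizes the bottleneck capacity of the path instead of minimizing its length, run on the 4-node example, where the widest s-to-t path is s->a->b->t with bottleneck 3:

```python
import heapq

def widest_path(cap, s, t):
    """Dijkstra/Prim-style search for the s->t path maximizing the
    bottleneck (the minimum capacity along the path).
    cap: dict of dicts, cap[u][v] = capacity of edge (u, v)."""
    width = {s: float('inf')}    # best bottleneck found so far
    parent = {}
    heap = [(-width[s], s)]      # negate widths to get a max-heap
    while heap:
        w, u = heapq.heappop(heap)
        w = -w
        if w < width.get(u, 0):
            continue             # stale heap entry
        if u == t:
            break
        for v, c in cap.get(u, {}).items():
            nw = min(w, c)       # bottleneck if we extend the path by (u,v)
            if nw > width.get(v, 0):
                width[v] = nw
                parent[v] = u
                heapq.heappush(heap, (-nw, v))
    return width.get(t, 0), parent

w, _ = widest_path({'s': {'a': 4, 'b': 2}, 'a': {'t': 2, 'b': 3},
                    'b': {'t': 3}}, 's', 't')
print(w)   # prints 3: the widest path is s->a->b->t
```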
Can we remove dependence on F completely?
EDMONDS-KARP #2
---------------
Edmonds-Karp #2: always pick the *shortest* path (rather than the
max-capacity path)
Claim: this makes at most mn iterations. So, run time is O(nm^2)
since we use BFS in each iteration. Proof is pretty neat.
Proof: Let d(t) be the distance from s to t in the current residual
graph. We'll prove the theorem by showing that every m iterations,
d(t) has to go up by at least 1.
Let's lay out the graph in levels according to a BFS from s. I.e.,
nodes at level i are distance i away from s, and t is at level d(t).
Now, notice that so long as d(t) doesn't change, the paths found WILL
ONLY USE FORWARD EDGES IN THIS LAYOUT. Each iteration saturates (and
removes) at least 1 forward edge, and adds only backward edges (so no
distance ever drops). This can happen at most m - d(t) < m times,
since after that many times, t would become disconnected from s. So,
after m iterations, either t is disconnected (and d(t) = infinity) or
else we must have used a non-forward edge, implying that d(t) has gone
up by 1.
Since d(t) can increase at most n times, there are at most nm iterations.
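A Python sketch of Edmonds-Karp #2 (illustrative; the adjacency-matrix representation is just for compactness). It also records d(t) at each iteration, so you can watch the monotonicity the proof relies on:

```python
from collections import deque

def edmonds_karp(n, cap, s, t):
    """Max flow by always augmenting along a *shortest* residual path.
    cap: n x n capacity matrix, updated in place so that cap[u][v] is
    always the residual capacity c_f(u,v). Also returns the sequence
    of d(t) values seen, which (per the proof above) never decreases."""
    total, dts = 0, []
    while True:
        dist, parent = [-1] * n, [-1] * n
        dist[s] = 0
        q = deque([s])
        while q:                          # BFS in the residual graph
            u = q.popleft()
            for v in range(n):
                if dist[v] == -1 and cap[u][v] > 0:
                    dist[v], parent[v] = dist[u] + 1, u
                    q.append(v)
        if dist[t] == -1:                 # t disconnected: we're done
            return total, dts
        dts.append(dist[t])
        b, v = float('inf'), t            # bottleneck on the path found
        while v != s:
            b = min(b, cap[parent[v]][v])
            v = parent[v]
        v = t                             # saturate: push b units of flow
        while v != s:
            u = parent[v]
            cap[u][v] -= b
            cap[v][u] += b                # open the reverse residual edge
            v = u
        total += b

# the 4-node example: s=0, a=1, b=2, t=3
C = [[0, 4, 2, 0],
     [0, 0, 3, 2],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]
flow, dts = edmonds_karp(4, C, 0, 3)
print(flow, dts)   # prints: 5 [2, 2, 3] -- d(t) never decreases
```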
DINIC & MPM
===========
Can we do this a little faster? (this is not in the book, and it's not
so crucial you get the details, but it's pretty nice I think.)
Previous alg: mn iterations at O(m) each for O(m^2 n) total. Let's
see if we can reduce to O(n^3). Idea: Given the current BFS layout
(also called the "level graph"), we'll try in O(n^2) time all at once
to find the max flow that only uses forward edges. This is sometimes
called a "blocking flow". Guarantees that when we take the residual
graph, d(t) has gone up by 1. So, at most n iterations.
Define c(v) = capacity of vertex v to be: min[c_in(v), c_out(v)], where
c_in(v) is sum of capacities of in-edges and c_out is sum of
capacities of out-edges.
The Algorithm:
- In O(m) time, create BFS layout from s, and intersect with backwards
BFS from t. Compute capacities of all nodes.
- Find vertex v of minimum capacity c. If it's zero, that's great: we
can remove v and incident edges, updating capacities of neighboring nodes.
- if c is not zero, then we greedily pull c units of flow from s to v,
and then push it along to t. This seems like just another version of
the original problem. But, there are two points here:
(1) because v had *minimum* capacity, we can do the pulling and
pushing in any greedy way and we won't get stuck. E.g., we can do
BFS forward from v and backward from v, pushing the flow
level-by-level, so we never examine any edge twice. [do example]
This right away gives us an O(mn) algorithm for saturating the level
graph (O(m) per node, n nodes), for an overall running time of
O(mn^2). So, we're half-way there.
(2) To get O(n^2), the way we will do the BFS is to examine the
in-edges (or out-edges) one at a time, fully saturating the edge
before going on to the next one. This means we can allocate our
time into two parts: (a) time spent pushing through edges that get
saturated, and (b) time spent on edges that we didn't quite saturate
[at most one of these per node]. We only take time O(n) on type-b
operations. Type-a operations result in deleting the edge, so the
edge won't be used again in the current iteration. So over *all*
vertices, the *total* time of these is just O(m).
So, our running time per iteration is O(n^2 + m) = O(n^2) [O(n^2) for
the type-b and O(m) for the type-a, and m <= n^2]. Once we've saturated the
level graph, we recompute it (in O(m) time, but this can be put into
the O(n^2)) and re-solve. Total time is O(n^3).
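For reference, here's a sketch of a blocking-flow algorithm in code (note: this is the common DFS-based version of Dinic's algorithm, not the MPM minimum-capacity-vertex scheme described above; both saturate the level graph, so d(t) goes up each phase and there are at most n phases):

```python
from collections import deque

class Dinic:
    """Max flow via blocking flows in the level graph."""
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]

    def add_edge(self, u, v, c):
        # each edge stores the index of its paired reverse edge,
        # so residual capacities can be updated in O(1)
        self.adj[u].append([v, c, len(self.adj[v])])
        self.adj[v].append([u, 0, len(self.adj[u]) - 1])

    def bfs(self, s, t):
        """Build the level graph; return True if t is still reachable."""
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c, _ in self.adj[u]:
                if c > 0 and self.level[v] == -1:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] != -1

    def dfs(self, u, t, f):
        """Push up to f units from u to t along forward (level+1) edges."""
        if u == t:
            return f
        while self.it[u] < len(self.adj[u]):
            e = self.adj[u][self.it[u]]
            v, c, rev = e
            if c > 0 and self.level[v] == self.level[u] + 1:
                pushed = self.dfs(v, t, min(f, c))
                if pushed:
                    e[1] -= pushed
                    self.adj[v][rev][1] += pushed
                    return pushed
            self.it[u] += 1   # edge is dead for this phase; skip it for good

        return 0

    def max_flow(self, s, t):
        flow = 0
        while self.bfs(s, t):            # one phase per value of d(t)
            self.it = [0] * self.n       # per-node edge pointers
            while True:
                pushed = self.dfs(s, t, float('inf'))
                if not pushed:
                    break
                flow += pushed
        return flow

g = Dinic(4)                 # s=0, a=1, b=2, t=3: the example graph
for u, v, c in [(0, 1, 4), (0, 2, 2), (1, 3, 2), (2, 3, 3), (1, 2, 3)]:
    g.add_edge(u, v, c)
print(g.max_flow(0, 3))      # prints 5
```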
MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------
We talked about the problem of assigning students to dorm rooms where
each student had a list of acceptable versus unacceptable rooms. A
natural generalization is to ask: what about preferences? E.g.,
maybe Dan prefers room A so it costs $0 to match him there, his second
choice is room B so it costs us $1 to match him there, and he hates
room C so it costs $infinity to match him there. And, so on with Sue
and Pat. Then we could ask for the MINIMUM-COST PERFECT MATCHING.
This is a perfect matching that out of all perfect matchings has the
least total cost.
Generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e). Cost of a
flow is the sum over all edges of the flow on that edge times the cost
of the edge. I.e.,
cost of flow f = SUM_e w(e)*f(e).
Goal is to find, out of all possible maximum flows, the one with the
least total cost. Can have negative costs (or benefits) on edges too.
- These are more general than plain max flow so can model more things.
There are a couple ways to solve this. One way is to add an edge
from t to s of infinite capacity and large negative cost, and to
convert this into the "MIN-COST CIRCULATION" problem. Basically, the
goal now is just to find a circulation (a flow that obeys flow-in =
flow-out at EVERY node; there's no s and t anymore) of smallest cost
(this will be a negative number). This will be the min-cost max-flow
in the original graph.
Algorithm: find a negative cost cycle and saturate it.
Can use Bellman-Ford to find negative cost cycles.
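The Bellman-Ford negative-cycle subroutine can be sketched as follows (illustrative code; the edge-list representation and the toy graph are assumptions, not from the lecture):

```python
def find_negative_cycle(n, edges):
    """Bellman-Ford from a virtual source: after n rounds of relaxation,
    any vertex that still relaxed in the last round is reachable from a
    negative-cost cycle; walking parent pointers back recovers the cycle.

    edges: list of (u, v, w) with u, v in range(n) and cost w.
    Returns a list of vertices forming a negative-cost cycle, or None."""
    dist = [0] * n            # virtual source: distance 0 to every vertex
    parent = [-1] * n
    x = -1
    for _ in range(n):
        x = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
                x = v
    if x == -1:
        return None           # no relaxation in round n: no negative cycle
    for _ in range(n):        # x may only lead into the cycle;
        x = parent[x]         # stepping back n times lands on the cycle
    cycle, v = [x], parent[x]
    while v != x:
        cycle.append(v)
        v = parent[v]
    return cycle[::-1]        # forward edge order around the cycle

# toy residual graph with a negative-cost cycle 1 -> 2 -> 3 -> 1
# (cost 2 - 3 - 1 = -2); graph and costs are made up for illustration
edges = [(0, 1, 1), (1, 2, 2), (2, 3, -3), (3, 1, -1), (3, 0, 4)]
print(find_negative_cycle(4, edges))   # some rotation of [1, 2, 3]
```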
Running time will be like Ford-Fulkerson. To speed up, ideally would
like the "most-negative-cost cycle" but that's NP-hard.
Instead there are various tricks you can use to still do well. E.g.,
in the Goldberg-Tarjan algorithm, you instead find the cycle that
minimizes (cost of cycle)/(# edges in cycle), which is something you
*can* solve. [add some value k to all edges and check if there's
still a negative-cost cycle using BF. Do binary search to find the
value of k s.t. the min-cost cycle has value 0. That's the one that
minimizes (cost of cycle)/(# edges in cycle) in the original graph.]
Then you prove this has a bound roughly like Edmonds-Karp #1.