15-451 Algorithms 3/18/04
- Edmonds-Karp 2
- Dinic & MPM
- min-cost max flow
Notes from Spring 2002 GLM
===========================================================================
Last time we discussed the max flow problem: the Ford-Fulkerson
algorithm, the proof of the maxflow-mincut theorem, and the integral
flow theorem.
Key idea: Given a flow f so far, we compute the residual graph that
describes the capacities left over. Specifically, the residual graph
has all edges of positive "residual capacity", defined as:
c_f(u,v) = c(u,v) - f(u,v)
where we define
f(v,u) = -f(u,v)
For example: 4 nodes {s,a,b,t}. s-->a has cap 4, a-->t has cap 3,
s-->b of cap 3, b-->t of cap 4, and a-->b of cap 2. Start with
initial flow of 2 along s,a,b,t.
Point of all this is that the maximum flow in the residual graph, plus
f, is a maximum flow in the original graph.
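As a sanity check, here is a small Python sketch that computes the
residual capacities for the 4-node example above (the dictionary
encoding and variable names are my own, not part of the notes):

```python
# Residual capacities c_f(u,v) = c(u,v) - f(u,v), with the skew-symmetry
# convention f(v,u) = -f(u,v), for the example: 2 units along s,a,b,t.
from collections import defaultdict

cap = {('s','a'): 4, ('a','t'): 3, ('s','b'): 3, ('b','t'): 4, ('a','b'): 2}

flow = defaultdict(int)
for u, v in [('s','a'), ('a','b'), ('b','t')]:  # initial flow of 2
    flow[(u, v)] += 2
    flow[(v, u)] -= 2                            # skew symmetry

# the residual graph keeps every edge of positive residual capacity
residual = {}
for (u, v) in cap:
    for e in [(u, v), (v, u)]:
        cf = cap.get(e, 0) - flow[e]
        if cf > 0:
            residual[e] = cf

print(residual)
```

Note that (a,b) disappears (it is saturated) while the backward edges
(a,s), (t,b), and (b,a) appear with capacity 2, letting us "undo" flow.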
We ended with Edmonds-Karp version 1 (always augment along path of
maximum capacity). # iterations at most m*log(F), where F is the max
flow value (assuming all capacities are integers), m = number of
edges. Each iteration involves running a Prim-like algorithm.
EDMONDS-KARP #2
---------------
Edmonds-Karp #2: always pick the *shortest* path.
[Do example on standard cut-diamond graph, but where add extra nodes
to bottom-right and top-left edges to lengthen them.]
Claim: this makes at most mn iterations. So, run time is O(nm^2)
since we use BFS in each iteration.
Proof: Let d(t) be the distance from s to t in the current residual
graph. We'll prove the claim by showing that every m iterations,
d(t) has to go up by at least 1.
Let's lay out the graph in levels according to a BFS from s. I.e.,
nodes at level i are distance i away from s, and t is at level d(t).
Now, notice that so long as d(t) doesn't change, the paths found will
only use forward edges in this layout. Each iteration saturates (and
removes) at least 1 forward edge, and adds only backward edges (so
none of the distances ever drops). This can happen at most m times
before we're forced to use paths that contain backward edges, implying
that d(t) has gone up by 1 (actually, 2).
So, there are at most nm iterations.
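A minimal Python sketch of Edmonds-Karp #2, using an adjacency matrix
of capacities (that encoding, and the function name, are my choices):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp #2 on an n x n matrix of capacities."""
    n = len(cap)
    res = [row[:] for row in cap]            # residual capacities
    total = 0
    while True:
        # BFS finds a *shortest* augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                  # no augmenting path: done
            return total
        # bottleneck capacity along the path, then push that much flow
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck          # backward edge gains capacity
            v = u
        total += bottleneck

# the 4-node example from last lecture: s=0, a=1, b=2, t=3
example = [[0, 4, 3, 0],
           [0, 0, 2, 3],
           [0, 0, 0, 4],
           [0, 0, 0, 0]]
```

On the example graph, max_flow(example, 0, 3) returns 7: the BFS
inside the loop is the O(m) work per iteration in the O(nm^2) bound.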
DINIC & MPM
===========
(this is not in the book, and it's not so crucial you get the details,
but it's kind of neat I think.)
Can we do this a little faster?
Previous alg: mn iterations at O(m) each for O(m^2 n) total. Let's
see if we can reduce to O(n^3). Idea: Given the current BFS layout
(also called the "level graph"), we'll try in O(n^2) time all at once
to find the max flow that only uses forward edges. This is sometimes
called a "blocking flow". This guarantees that when we take the
residual graph, d(t) has gone up by 1 (actually, 2).
Define c(v) = capacity of vertex v to be: min[c_in(v), c_out(v)], where
c_in(v) is sum of capacities of in-edges and c_out is sum of
capacities of out-edges.
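This definition is easy to state in code; here is a sketch in Python
(the function name and edge-dictionary encoding are mine). Note that s
has no in-edges and t no out-edges, so it only makes sense to count
their out- and in-capacity respectively:

```python
# c(v) = min(c_in(v), c_out(v)): the most flow that could pass through v.
def vertex_capacities(edges, source, sink):
    """edges is a dict {(u, v): capacity} for the level graph."""
    c_in, c_out = {}, {}
    for (u, v), c in edges.items():
        c_out[u] = c_out.get(u, 0) + c
        c_in[v] = c_in.get(v, 0) + c
    caps = {}
    for v in set(c_in) | set(c_out):
        if v == source:                  # source: out-capacity only
            caps[v] = c_out.get(v, 0)
        elif v == sink:                  # sink: in-capacity only
            caps[v] = c_in.get(v, 0)
        else:
            caps[v] = min(c_in.get(v, 0), c_out.get(v, 0))
    return caps
```

On the 4-node example, a has c_in = 4 and c_out = 3 + 2 = 5, so
c(a) = 4; symmetrically c(b) = 4.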
The Algorithm:
- Find vertex v of minimum capacity c. If it's zero, that's great -- we
can remove it and incident edges, updating capacities of neighboring
nodes.
- If c is not zero, then we greedily pull c units of flow from s to v,
and then push it along to t. This seems like just another version of
the original problem. But, there are two points here:
(1) because v had *minimum* capacity, we can do the pulling and
pushing in any greedy way and we won't get stuck. E.g., we can do
BFS, pushing the flow level-by-level, so we never examine any edge
twice. This right away gives us an O(mn) algorithm for saturating
the level graph (O(m) per node, n nodes), for an overall running
time of O(mn^2). So, we're half-way there.
(2) To get O(n^2), the way we will do the BFS is to examine the
in-edges (or out-edges) one at a time, fully saturating the edge
before going on to the next one. This means we can allocate our
time into two parts: (a) time spent pushing through edges that get
saturated, and (b) time spent on edges that we didn't quite saturate
[at most one of these per node]. We only take time O(n) on type-b
operations. Type-a operations result in deleting the edge, so over
*all* vertices, the *total* time of these is just O(m).
So, our running time is dominated by the type (b) operations, which is
O(n) per vertex saturated, or O(n^2) total. Once we've saturated the
level graph, we recompute it (in O(m) time, but this can be put into
the O(n^2)) and re-solve. Total time is O(n^3).
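The MPM bookkeeping above is what gets the blocking flow down to
O(n^2) per phase. As a hedged illustration of the same phase structure,
here is the simpler variant the notes call "half-way there": Dinic's
approach, which rebuilds the level graph by BFS and finds a blocking
flow by DFS over forward edges only (the class layout and names are my
own):

```python
from collections import deque

class Dinic:
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge: [to, cap, rev-index]

    def add_edge(self, u, v, c):
        self.adj[u].append([v, c, len(self.adj[v])])
        self.adj[v].append([u, 0, len(self.adj[u]) - 1])  # residual edge

    def _bfs(self, s, t):
        # rebuild the level graph: level[v] = BFS distance from s
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c, _ in self.adj[u]:
                if c > 0 and self.level[v] == -1:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] != -1

    def _dfs(self, u, t, pushed):
        # push flow along forward (level i -> level i+1) edges only;
        # self.it[u] remembers where we left off, so dead edges are
        # never re-examined within a phase
        if u == t:
            return pushed
        while self.it[u] < len(self.adj[u]):
            e = self.adj[u][self.it[u]]
            v, c, rev = e
            if c > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, c))
                if d > 0:
                    e[1] -= d
                    self.adj[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):             # one phase per value of d(t)
            self.it = [0] * self.n
            while True:                    # saturate the level graph
                f = self._dfs(s, t, float('inf'))
                if f == 0:
                    break
                flow += f
        return flow
```

Each phase here costs O(mn) in the worst case rather than MPM's
O(n^2), but the outer structure (at most n phases, since d(t) rises
each time) is the same.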
MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------
We talked about the problem of assigning students to dorm rooms where
each student had a list of acceptable versus unacceptable rooms. A
natural generalization is to add preferences. E.g., maybe Dan prefers
room A, so it costs $0 to match him there; his second choice is room B,
so it costs us $1 to match him there; and he hates room C, so it costs
$infinity to match him there. And, so on with Sue
and Pat. Then we could ask for the MINIMUM-COST PERFECT MATCHING.
This is a perfect matching that out of all perfect matchings has the
least total cost.
Generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e). Cost of a
flow is the sum over all edges of the flow on that edge times the cost
of the edge. I.e.,
cost of flow f = SUM_e w(e)*f(e).
Goal is to find, out of all possible maximum flows, the one with the
least total cost. Can have negative costs (or benefits) on edges too.
- These are more general than plain max flow so can model more things.
- Turns out can solve by similar method as we were using for max flow,
but where each time you find the least-cost path to the sink: the
shortest path where view costs as edge lengths. (Assuming there are
no negative-cost cycles. If there *are* negative cost cycles, then
the flow will look funny -- you'll have these disconnected loops of
flow spinning around. There are ways of solving even that case too,
though.)
-- Tricky thing is that even if originally all costs were positive, the
residual graph will have negative-cost edges: if it costs $1 to push a
unit of flow on edge e, then unpushing it gives you your $1 back.
Formally, if we have put f units of flow on an edge (u,v) of cost w,
then in the residual graph we have an edge (v,u) of capacity f and
cost -w. E.g., do the matching problem.
-- So, need to use the Bellman-Ford DP algorithm for finding shortest
paths, since Dijkstra's is not guaranteed to work with negative cost
edges.
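Putting the two bullets together, here is a sketch of the successive
shortest path method in Python: Bellman-Ford finds the least-cost s-to-t
path in the residual graph (Dijkstra would choke on the negative-cost
backward edges), and we push the bottleneck amount along it. The
encoding of edges and all names are my own:

```python
def min_cost_max_flow(n, edges, s, t):
    """edges: list of (u, v, capacity, cost). Returns (flow, cost)."""
    adj = [[] for _ in range(n)]
    for u, v, cap, w in edges:
        adj[u].append([v, cap, w, len(adj[v])])
        adj[v].append([u, 0, -w, len(adj[u]) - 1])  # residual: cost -w
    flow = cost = 0
    while True:
        # Bellman-Ford: least-cost path in the residual graph
        INF = float('inf')
        dist = [INF] * n
        dist[s] = 0
        parent = [None] * n              # (node, edge index) on best path
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, c, w, _r) in enumerate(adj[u]):
                    if c > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        parent[v] = (u, i)
                        changed = True
            if not changed:
                break
        if dist[t] == INF:               # no augmenting path: done
            return flow, cost
        # bottleneck along the cheapest path, then push
        push = INF
        v = t
        while v != s:
            u, i = parent[v]
            push = min(push, adj[u][i][1])
            v = u
        v = t
        while v != s:
            u, i = parent[v]
            adj[u][i][1] -= push
            adj[v][adj[u][i][3]][1] += push
            v = u
        flow += push
        cost += push * dist[t]
```

E.g., on the 4-node graph with costs added (s-a cost 1, s-b cost 2,
a-t cost 1, b-t cost 1, a-b cost 1), the max flow of 7 is forced to use
every edge fully, giving cost 18.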
Won't give formal proof of correctness that this approach finds the
min-cost max flow, but it all hinges on the following: Suppose the
graph doesn't have any negative cost cycles. Now, suppose we find the
least-cost path from s to t and push flow along it. Then there are
still no negative-cost cycles in the residual graph. Easiest to see
by picture: Say we find path s->a->b->c->t. If the new graph now has
a negative-cost cycle, it has to use at least one of the new backward
edges. E.g., c->b->a->x->y->c. But then that means that
s->a->x->y->c->t was a cheaper path in the original graph.
The reason this proves the result is that if you take the supposedly
better max flow f' and subtract our flow f, what you get is a bunch of
cycles in our final residual graph. And if f' was really better than
f, then these would be negative-cost cycles.