15-451 Algorithms                                                10/28/03

Network Flow, part II
 - Recap from last time
 - Edmonds-Karp #2
 - Dinic & MPM
 - min-cost max flow
===========================================================================

NETWORK FLOW RECAP: We are given a directed graph G, a source s, and a
sink t.  Each edge (u,v) has some capacity c(u,v).  The goal is to find
the maximum flow possible from s to t.

Last time we looked at the Ford-Fulkerson algorithm, and used it to
prove the maxflow-mincut theorem and the integral flow theorem.

Key idea: given a flow f so far, we describe the capacities left over
in a "residual graph".  Specifically, the residual graph has all edges
of positive "residual capacity", defined as:

    c_f(u,v) = c(u,v) - f(u,v),   where we define f(v,u) = -f(u,v).

For example: 4 nodes {s,a,b,t}.  s-->a has capacity 4, a-->t capacity
2, s-->b capacity 2, b-->t capacity 3, and a-->b capacity 3.  Start
with an initial flow of 3 along the path s,a,b,t.

The point of all this is that any legal flow in the residual graph,
when added to f (edge by edge), is a legal flow in the original graph
(it satisfies the capacity constraints and obeys flow-in = flow-out at
each node), and vice versa.  This means that the maximum flow in the
residual graph, plus f, is a maximum flow in the original graph.

We then saw how to encode bipartite matching as a maxflow problem:
create a fake source s and a fake sink t, with all edges pointing
forward with capacity 1.  Any integral flow is a matching of the same
value, and any matching can be converted to a flow of the same value.
So the maximum flow gives us a maximum matching in the original graph.

Near the end we discussed running time.  Each iteration of
Ford-Fulkerson takes O(m) time to find a path (e.g., by DFS or BFS)
and to compute the residual graph [m = # of edges].  Each iteration
pushes at least one more unit of flow (assuming all capacities are
integers).  So, the total time is at most O(mF), where F is the value
of the maximum flow.

Time O(mF) is fine if F is fairly small.  E.g., in the case of
matchings, we have F <= n.  But what if F is a large number?

Edmonds-Karp #1: push flow along the widest path.  We saw that this
ensures at most O(m*log(F)) iterations (assuming all capacities are
integers).  Each iteration involves running a Prim-like algorithm that
takes time O(m log n), so the total time is O(m^2 * log(F) * log(n)).
Can we remove the dependence on F?

EDMONDS-KARP #2
---------------

Edmonds-Karp #2: always pick the *shortest* augmenting path (rather
than the max-capacity path).

[Do example on the standard cut-diamond graph, but add extra nodes to
the bottom-right and top-left edges to lengthen them.]

Claim: this makes at most mn iterations.  So, the running time is
O(nm^2), since we use BFS in each iteration.

Proof: Let d(t) be the distance from s to t in the current residual
graph.  We'll prove the claim by showing that every m iterations, d(t)
has to go up by at least 1.

Let's lay out the graph in levels according to a BFS from s.  I.e.,
nodes at level i are at distance i from s, and t is at level d(t).
Now, notice that so long as d(t) doesn't change, the paths found WILL
ONLY USE FORWARD EDGES IN THIS LAYOUT.  Each iteration saturates (and
removes) at least one forward edge, and adds only backward edges (so
none of the distances ever drops).  This can happen at most
m - d(t) < m times, since any s-to-t path needs at least d(t) forward
edges, and after that many removals t would become disconnected from
s.  So, after m iterations, either t is disconnected (and
d(t) = infinity), or else we must have used a non-forward edge,
implying that d(t) has gone up by at least 1.  Since d(t) can increase
at most n times, there are at most nm iterations.
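
To make the algorithm concrete, here is a minimal Python sketch of
Edmonds-Karp #2 (this is not code from the course; the dict-of-dicts
graph representation and the names are illustrative assumptions):

    from collections import deque

    def edmonds_karp(cap, s, t):
        # cap[u][v] = capacity of edge (u,v); residual capacities are
        # maintained directly in this structure as we push flow.
        # First, make sure every edge has a reverse entry to hold
        # residual (undo) capacity.
        for u in list(cap):
            for v in list(cap[u]):
                cap.setdefault(v, {}).setdefault(u, 0)
        flow = 0
        while True:
            # BFS from s: finds a *shortest* augmenting path in the
            # residual graph (this is what makes it Edmonds-Karp #2).
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in cap[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow        # t unreachable: flow is maximum
            # Recover the path and its bottleneck residual capacity.
            path = []
            v = t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(cap[u][v] for (u, v) in path)
            # Augment: decrease forward residuals, increase back edges.
            for (u, v) in path:
                cap[u][v] -= b
                cap[v][u] += b
            flow += b

    # Example: the 4-node graph from the recap above; max flow is 5.
    cap = {'s': {'a': 4, 'b': 2}, 'a': {'t': 2, 'b': 3}, 'b': {'t': 3}}
    print(edmonds_karp(cap, 's', 't'))   # -> 5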

DINIC & MPM
===========

(This is not in the book, and it's not so crucial that you get the
details, but it's kind of neat, I think.)

Can we do this a little faster?  The previous algorithm took mn
iterations at O(m) each, for O(m^2 n) total.  Let's see if we can
reduce this to O(n^3).

Idea: Given the current BFS layout (also called the "level graph"),
we'll try in O(n^2) time, all at once, to find the maximum flow that
only uses forward edges.  This is sometimes called a "blocking flow".
It guarantees that when we take the residual graph, d(t) has gone up
by at least 1.  So, there are at most n iterations.

Define the capacity of a vertex v to be c(v) = min[c_in(v), c_out(v)],
where c_in(v) is the sum of the capacities of v's in-edges and
c_out(v) is the sum of the capacities of v's out-edges.  (A small code
sketch of this bookkeeping appears at the end of this section.)

The Algorithm:

 - Find the vertex v of minimum capacity c.  If c is zero, that's
   great -- we can remove v and its incident edges, updating the
   capacities of the neighboring nodes.  This step takes O(m) time.

 - If c is not zero, then we greedily pull c units of flow from s to
   v, and then push those units along to t.

This seems like just another version of the original problem.  But
there are two points here:

 (1) Because v had *minimum* capacity, we can do the pulling and
     pushing in any greedy way and we won't get stuck.  E.g., we can
     do a BFS forward from v and backward from v, pushing the flow
     level-by-level, so we never examine any edge twice.  [do example]
     This right away gives us an O(mn) algorithm for saturating the
     level graph (O(m) per node, n nodes), for an overall running time
     of O(mn^2).  So, we're half-way there.

 (2) To get O(n^2), the way we will do the BFS is to examine the
     in-edges (or out-edges) one at a time, fully saturating each edge
     before going on to the next one.  This means we can split our
     time into two parts: (a) time spent pushing through edges that
     get saturated, and (b) time spent on edges that we didn't quite
     saturate [at most one of these per node].  We spend only O(n)
     time on type-b operations per vertex we process, for O(n^2) total.
     Type-a operations result in deleting the edge, so the edge won't
     be used again in the current iteration; over *all* vertices, the
     *total* time for these is just O(m).  So, our running time per
     iteration is O(n^2 + m) = O(n^2)  [O(n^2) for type-b, O(m) for
     type-a, and m <= n^2].

Once we've saturated the level graph, we recompute it (in O(m) time,
which can be absorbed into the O(n^2)) and re-solve.  Total time is
O(n^3).
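
Here is one way the vertex-capacity bookkeeping might look in Python
(a sketch under assumed representations; the special-casing of s and t
is a detail the sketch adds, since c_in(s) = 0 would otherwise force
c(s) = 0):

    def vertex_capacities(nodes, level_edges, s, t):
        # level_edges: dict mapping (u, v) -> residual capacity of
        # that forward edge in the current level graph.
        c_in  = {v: 0 for v in nodes}
        c_out = {v: 0 for v in nodes}
        for (u, v), c in level_edges.items():
            c_out[u] += c
            c_in[v]  += c
        cap = {}
        for v in nodes:
            if v == s:              # s has no in-edges: use c_out only
                cap[v] = c_out[v]
            elif v == t:            # t has no out-edges: use c_in only
                cap[v] = c_in[v]
            else:                   # c(v) = min[c_in(v), c_out(v)]
                cap[v] = min(c_in[v], c_out[v])
        return cap

    def min_capacity_vertex(cap):
        # the vertex MPM deletes (if c = 0) or pulls/pushes through next
        return min(cap, key=cap.get)

MPM then loops: take the minimum-capacity vertex; if its capacity is
zero, delete it and update its neighbors' capacities; otherwise
pull/push that many units through it, as described above.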

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------

We talked about the problem of assigning students to dorm rooms, where
each student had a list of acceptable versus unacceptable rooms.  A
natural generalization is to allow preferences.  E.g., maybe Dan
prefers room A, so it costs $0 to match him there; his second choice
is room B, so it costs us $1 to match him there; and he hates room C,
so it costs $infinity to match him there.  And so on with Sue and Pat.
Then we could ask for the MINIMUM-COST PERFECT MATCHING: the perfect
matching that, out of all perfect matchings, has the least total cost.

The generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e).  The cost
of a flow is the sum over all edges of the flow on that edge times the
cost of the edge:

    cost of flow f = SUM_e w(e)*f(e).

The goal is to find, out of all possible maximum flows, the one with
the least total cost.  We can have negative costs (i.e., benefits) on
edges too.

 - These problems are more general than plain max flow, so they can
   model more things.

There are a couple of ways to solve this.  One way is to add an edge
from t to s of infinite capacity and large negative cost, and to
convert the problem into the "MIN-COST CIRCULATION" problem.  By doing
this, the goal becomes simply to find a circulation (a flow that obeys
flow-in = flow-out at EVERY node; there is no s or t anymore) of
smallest cost (this will be a negative number).  This circulation
gives the min-cost max-flow in the original graph.

Algorithm: find a negative-cost cycle in the residual graph and
saturate it; repeat until no negative-cost cycle remains.  We can use
Bellman-Ford to find negative-cost cycles.  The running time will be
like Ford-Fulkerson's.  To speed this up, ideally we would like the
"most-negative-cost cycle" each time, but finding it is NP-hard.
Instead, there are various tricks you can use to still do well.  E.g.,
in the Goldberg-Tarjan algorithm, you instead find the cycle that
minimizes (cost of cycle)/(# of edges in cycle), which is something
you *can* solve.  [Add some value k to all edge costs and check
whether there's still a negative-cost cycle using Bellman-Ford.  Do
binary search to find the value of k such that the min-cost cycle has
value 0.  That cycle is the one minimizing (cost of cycle)/(# of edges
in cycle) in the original graph.]  Then you prove this has a bound
roughly like Edmonds-Karp #1's.
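
As a sketch of the Bellman-Ford step in Python (just the cycle
finding; the saturation and residual updates are omitted, and the
edge-list representation and names are assumptions for illustration):

    def find_negative_cycle(nodes, edges):
        # edges: list of (u, v, cost) for the current residual graph.
        # Returns the vertices of some negative-cost cycle, in order,
        # or None if no such cycle exists.
        dist = {v: 0 for v in nodes}   # virtual source at distance 0
        pred = {v: None for v in nodes}
        x = None
        for _ in range(len(nodes)):    # n rounds of relaxation
            x = None
            for (u, v, w) in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    pred[v] = u
                    x = v
            if x is None:
                return None            # no relaxation => no neg cycle
        # A relaxation in round n means a negative cycle exists;
        # walking back n predecessor steps from x lands on the cycle.
        for _ in range(len(nodes)):
            x = pred[x]
        cycle = [x]
        v = pred[x]
        while v != x:
            cycle.append(v)
            v = pred[v]
        cycle.reverse()                # pred pointers run backwards
        return cycle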