15-451 Algorithms                                            10/31/06

Network Flow II
  - Edmonds-Karp #1
  - Edmonds-Karp #2
  - further improvements...
  - min-cost max flow
======================================================================

NETWORK FLOW RECAP:

Given a directed graph G, a source s, and a sink t.  Each edge (u,v)
has some capacity c(u,v).  Goal is to find the maximum flow possible
from s to t.

Example: 4 nodes {s,a,b,t}.  s-->a has cap 4, a-->t has cap 2, s-->b
has cap 2, b-->t has cap 3, and a-->b has cap 3.  (All non-edges can
be viewed as edges of capacity 0.)

Last time: looked at the Ford-Fulkerson algorithm.  Used it to prove
the maxflow-mincut theorem, and also the integral flow theorem.

Key idea: given a flow f so far, we describe the capacities left over
in a "residual graph".  Specifically, the residual graph has all edges
of positive "residual capacity", defined as:

    c_f(u,v) = c(u,v) - f(u,v),   where we define f(v,u) = -f(u,v).

FF alg is: find a path of positive residual capacity; push flow to
saturate it; repeat.  The point of this is that any legal flow in the
residual graph, when added to f (edge by edge), is a legal flow in the
original graph (it satisfies the capacity constraints and obeys
flow-in = flow-out at each node).  And this is true vice-versa.  This
means that the maximum flow in the residual graph, plus f, is a
maximum flow in the original graph.

[run algorithm on example.  Draw flow using flow/cap notation in the
original graph]

IMPROVING THE RUNNING TIME OF NETWORK FLOW
==========================================

Assuming capacities are integers, the basic Ford-Fulkerson algorithm
could make up to F iterations, where F is the value of the max flow.
Each iteration takes O(m) time to find a path (e.g., by DFS or BFS)
and to compute the residual graph.  So, the total time is O(mF).
This is fine if F is small, as in the matching case, but not good if
capacities are given in binary and F could be very large.

[Note: let's assume the graph has been pre-processed to delete any
disconnected parts.
So, m >= n-1, and we can say DFS runs in time O(m) instead of
O(m+n).]

Edmonds-Karp #1
---------------

Idea: don't just choose an arbitrary path.  Instead, pick a path of
largest (or approximately largest) residual capacity.  Two versions:

(1.1) Use the maximum bottleneck path (the path with largest residual
      capacity).  We saw how to find one using a Prim-like algorithm
      in time O(m log n).

(1.2) Define a "capacity threshold" c, and just try to find an s-t
      path of residual capacity >= c.  Advantage of this: it takes
      only O(m) time -- just do DFS over the edges of residual
      capacity >= c.  How to set c?  Start at the max capacity over
      all edges, and whenever the search fails, divide c by 2.  This
      guarantees you are within a factor of 2 of the max-capacity
      path.  You waste some time on the unsuccessful runs, but there
      are at most O(log C) of them in total (C = max edge capacity).
      This can be reduced to O(log F) using the doubling idea from
      the midterm.

Claim: either way, this causes FF to make at most O(m log F)
iterations.  So the total time using version 1.2 is O(m^2 log F).

Proof of claim: First, if the current residual graph has max flow f,
then if we drop all edges of residual capacity < f/m, there must
still exist a path from s to t.  (Why?  Let A be the component
containing s after dropping those edges.  If A doesn't contain t,
then since there are at most m edges out of A, each of residual
capacity < f/m, the edges leaving A would form a cut of capacity less
than f -- a contradiction.)  So, the max bottleneck path has capacity
at least f/m, and (if using version 1.2) this means c is at least
f/(2m).

Let's just focus on version 1.2 (version 1.1 then follows directly).
Whenever the threshold gets cut down to c, the failed search at
threshold 2c shows the remaining flow is less than 2cm (a cut of at
most m edges, each of residual capacity < 2c).  Each iteration at
threshold c pushes at least c units, so there can be at *most* 2m
iterations before c gets cut down by a factor of 2 again.  Since
c <= F at the start, c can be cut down at most log(F) times, so the
total number of iterations is at most 2m log(F).

Can we remove the dependence on F completely?

EDMONDS-KARP #2
---------------

Edmonds-Karp #2: always pick the *shortest* augmenting path (rather
than the max-capacity path).

Claim: this makes at most mn iterations.  So, the run time is
O(nm^2), since we use BFS in each iteration.
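Before getting to the proof, here is a minimal runnable sketch of the
shortest-augmenting-path rule in Python.  The function name and the
dict-of-dicts residual-capacity representation are my own choices for
illustration, not from the lecture:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow by repeatedly augmenting along a shortest (fewest-edge)
    s->t path.  cap is a dict-of-dicts of residual capacities; it is
    modified in place to become the final residual graph."""
    total = 0
    while True:
        # BFS in the residual graph to find a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total        # no augmenting path left: flow is maximum
        # Bottleneck = smallest residual capacity along the path found.
        bottleneck = min(cap[parent[v]][v] for v in _path(parent, t))
        # Push flow: decrease forward residuals, increase reverse ones.
        for v in _path(parent, t):
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
        total += bottleneck

def _path(parent, t):
    """Yield each non-source node on the augmenting path, from t back."""
    v = t
    while parent[v] is not None:
        yield v
        v = parent[v]
```

On the 4-node example from the recap (s-->a cap 4, a-->t cap 2,
s-->b cap 2, b-->t cap 3, a-->b cap 3), this returns a max flow of 5,
matching the min cut {a-->t, b-->t}.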
The proof is pretty neat.

Proof: Let d(t) be the distance from s to t in the current residual
graph.  We'll prove the claim by showing that every m iterations,
d(t) has to go up by at least 1.

Lay out the graph in levels according to a BFS from s.  I.e., nodes
at level i are at distance i from s, and t is at level d(t).  Now,
notice that so long as d(t) doesn't change, the paths found WILL ONLY
USE FORWARD EDGES IN THIS LAYOUT.  Each iteration saturates (and
removes) at least one forward edge, and adds only backward edges (so
no distance ever drops).  This can happen at most m - d(t) + 1 <= m
times, since after that many times t would become disconnected from
s.  So, after m iterations, either t is disconnected (and d(t) =
infinity) or else we must have used a non-forward edge, implying that
d(t) has gone up by at least 1.  Since d(t) can increase at most n
times, there are at most nm iterations.

DINIC & MPM
===========

Can we do this a little faster?  (This part won't be on any tests,
and it's not crucial that you get the details, but the idea is pretty
nice.)

The previous algorithm: mn iterations at O(m) each, for O(m^2 n)
total.  Let's see if we can reduce this to O(mn^2) and finally to
O(n^3).

Idea: given the current BFS layout (also called the "level graph"),
we'll try in O(n^2) time, all at once, to find the max flow that
uses only forward edges.  This is sometimes called a "blocking
flow".  It guarantees that when we take the residual graph, d(t) has
gone up by at least 1.  So, there are <= n iterations.

Define the capacity of vertex v to be c(v) = min[c_in(v), c_out(v)],
where c_in(v) is the sum of capacities of v's in-edges and c_out(v)
is the sum of capacities of v's out-edges.

The Algorithm:
 - In O(m) time, create the BFS layout from s, and intersect it with
   a backwards BFS from t.  Compute the capacities of all nodes.
 - Find a vertex v of minimum capacity c.  If c is zero, that's
   great: we can remove v and its incident edges, updating the
   capacities of neighboring nodes.
 - If c is not zero, then we greedily pull c units of flow from s to
   v, and then push those c units along to t.  This seems like just
   another version of the original problem, but there are two key
   points here:

   (1) Because v had *minimum* capacity, we can do the pulling and
       pushing in any greedy way and we won't get stuck.  E.g., we
       can do BFS forward from v and backward from v, pushing the
       flow level-by-level, so we never examine any edge twice.  [do
       example]  This right away gives us an O(mn) algorithm for
       saturating the level graph (O(m) per node, n nodes), for an
       overall running time of O(mn^2).  So, we're half-way there.

   (2) To get O(n^2), the way we will do the BFS is to examine the
       in-edges (or out-edges) one at a time, fully saturating each
       edge before going on to the next one.  This means we can
       allocate our time into two parts: (a) time spent pushing
       through edges that get saturated, and (b) time spent on edges
       that we didn't quite saturate [at most one of these per node].
       We only take time O(n) on type-b operations per vertex
       processed.  Type-a operations result in deleting the edge, so
       the edge won't be used again in the current level graph; over
       *all* vertices, the *total* time for these is just O(m).  So,
       our running time per iteration is O(n^2 + m) = O(n^2)
       [O(n^2) for the type-b operations and O(m) for the type-a].

Once we've saturated the level graph, we recompute it (in O(m) time,
which can be absorbed into the O(n^2)) and re-solve.  Total time is
O(n^3).

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------

We talked about the problem of assigning groups to time-slots where
each group had a list of acceptable versus unacceptable slots.  A
natural generalization is to ask: what about preferences?  E.g.,
maybe group A prefers slot 1, so it costs $0 to match it there; its
second choice is slot 2, so it costs us $1 to match the group there;
and it can't make slot 3, so it costs $infinity to match it there.
And so on with the other groups.
Then we could ask for the MINIMUM-COST PERFECT MATCHING: a perfect
matching that, out of all perfect matchings, has the least total
cost.

The generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e).  The cost
of a flow is the sum over all edges of the flow on that edge times
the cost of the edge.  I.e.,

    cost of flow f = SUM_e w(e)*f(e).

The goal is to find, out of all possible maximum flows, the one with
the least total cost.  Edges can have negative costs (i.e., benefits)
too.

 - These problems are more general than plain max flow, so they can
   model more things.

There are a couple of ways to solve this.  One way is to add an edge
from t to s of infinite capacity and large negative cost, and to
convert this into the "MIN-COST CIRCULATION" problem.  By doing this,
the goal becomes simply to find a circulation -- a flow that obeys
flow-in = flow-out at EVERY node (no s and t anymore) -- of smallest
cost (this will be a negative number).  This will be the min-cost
max-flow in the original graph.

Algorithm: find a negative-cost cycle and saturate it; repeat.  We
can use Bellman-Ford to find negative-cost cycles.  The running time
will be like Ford-Fulkerson's.  To speed this up, ideally we would
like the "most-negative-cost cycle", but finding that is NP-hard.
Instead, there are various tricks you can use to still do well.
E.g., in the Goldberg-Tarjan algorithm, you instead find the cycle
that minimizes (cost of cycle)/(# edges in cycle), which is something
you *can* solve.  [Add some value k to all edge costs and check
whether there's still a negative-cost cycle using BF.  Do binary
search to find the value of k such that the min-cost cycle has value
0; that cycle is the one minimizing (cost of cycle)/(# edges in
cycle) in the original graph.]  Then you prove this has a bound
roughly like Edmonds-Karp #1.
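To make the Bellman-Ford step concrete, here is a sketch of the
standard negative-cycle detector: initialize every distance to 0 (as
if from a virtual super-source), relax all edges for n rounds, and if
anything still relaxes in round n, walk predecessor pointers back to
recover a cycle.  The function name and edge-list representation are
illustrative assumptions, not from the lecture:

```python
def find_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection on nodes 0..n-1.
    edges: list of (u, v, cost).  Returns a list of nodes forming a
    negative-cost cycle, or None if no such cycle exists."""
    dist = [0] * n            # distance 0 from a virtual super-source
    pred = [None] * n
    x = None
    for _ in range(n):
        x = None
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                x = v         # remember the last node relaxed
    if x is None:
        return None           # no relaxation in round n: no negative cycle
    # x is reachable from a negative cycle; stepping back n times along
    # the predecessor pointers is guaranteed to land inside the cycle.
    for _ in range(n):
        x = pred[x]
    cycle = [x]
    v = pred[x]
    while v != x:             # collect nodes until we return to x
        cycle.append(v)
        v = pred[v]
    cycle.reverse()
    return cycle
```

In the circulation algorithm above, one would then saturate the
returned cycle in the residual graph and repeat until no negative-cost
cycle remains.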