15-451 Algorithms			10/31
- maximum flow
  - Edmonds-Karp alg
- min cost max flow and min cost perfect matching
========================================================================

Recap from last time: Network flow problem:  Directed graph, each edge
has capacity.  source s and sink t.   Goal is to flow as much as
possible from s to t.  

Ford-Fulkerson algorithm: find a path from source to sink; push as
much flow as possible along it, compute residual graph, and repeat
until no paths left.

We proved: FF alg finds a max flow (though number of iterations could
be large).  In the process we proved: MAX-FLOW/MIN-CUT theorem: max
S-T flow = min S-T cut. 

 - S-T cut is a partition of vertex set into A and B where S \in A, T \in B.
 - capacity of cut is the sum of capacities of all edges going from A to B.
 - obvious part is any flow <= capacity of any cut.  (So, max flow <=
	min cut).  Less obvious is that max flow = min cut.

Today:
 - A guaranteed efficient algorithm: Edmonds-Karp
 - But first: an application (There are lots of these)

Application: 3-D image processing.  Given a pixel image, where each
pixel is 0 or 1 (indicating foreground or background), want to clean
it up using the rule that in general neighboring pixels are usually
supposed to be the same value. [In actual application, usually have
many depth levels, but we will just consider 2 levels].  Say that
given image I, the "cost" or "energy" of some modification I' is
c*(#locations in which I' differs from I) + (number of neighboring
pixels in I' that have different values).  Think of as: costs $c for
flipping a bit, and costs $1 for each pair of neighbors that are
different.  Want the image of lowest energy (cost).  Usually solved by
simulated annealing but this is *slow*.  Faster way is to turn into a
min cut problem.  Here's how we do it: set up 0-source and 1-sink.
Connect source to all 0s, and sink to all 1's by edges of capacity c.
Connect neighboring pixels by edge of capacity 1.  Claim: any solution
corresponds to a cut, with energy = value of cut.  So, we just want
the min S-T cut.  The problem with multple depths corresponds to the
"multiway cut" problem: Given a collection of 'terminal' nodes S1, S2,
..., Sk, find the partition into regions U1, U2, ..., Uk with Si in
Ui, of least total cost.  Unfortunately, that problem is NP hard.
But, it turns out that even for this problem, you can use the min cut
problem to give you a heuristic that produces good solutions in
practice.

Now, let's go to algorithms:

Recall that running time of FF is not necessarily so great - it
depends on the choices we make.  E.g., (give the standard diamond
graph) How to fix the problem?

EDMONDS-KARP idea/analysis 
-------------------------- 
  One natural heuristic is each time to use the *shortest* augmenting
path.  This fixes the problem in the above example.  What about in
general?  Edmonds and Karp proved that this guarantees a total of at
most O(n*m) iterations.  [Note: not obvious: what if add extra nodes
to top-left and bottom-right edge to lengthen them].  BFS to find path
is O(m) so this guarantees us a maximum total time of O(n*m^2).

Proof is in two parts.  First part is: say d(v) is distance from start
to v in the current residual graph. As we find augmenting paths and
change the residual graph, d(v) may change, but the first part is
CLAIM 1: d(v) never decreases.

Let's use this to prove the theorem and then go back.  

Notice that each time we find an augmenting path, we saturate at least
one edge on it, so at least one edge that was in the residual graph
before is no longer in the new one.  It's possible that later this
edge will come back (ie., if we later push flow in the other direction
on it) but what we'll show is that any given edge can be removed at
most n/2 times.  This gives us what we want [why?] because there are m
edges in our original graph, and each iteration removes or re-adds at
least one. 

To show: each edge u->v can be removed at most n/2 times.  We'll use
fact that it's a shortest path.  Suppose we remove u->v.  That means
our path went on that edge, so d_old(v) = 1+d_old(u) since it was a
shortest path. Ifwe later put the edge back, we must have gone v->u,
so in the new graph d_new(u) = 1+d_new(v).  But, CLAIM 1 tells us that
d_new(v) >= d_old(v) so that means d_new(u) >= 2+d_old(u).  Since the
distance from s to u can't become larger than n-1 without becoming
disconnected (and then we never touch u again) this means we can do
this at most n/2 times.

So, all that's left is to show Claim 1.

We're going to prove it by contradiction.  BTW, our high level
strategy here is a kind of canonical proof by contradiction: we are
going to try to create as many things as possible to contradict as we go.
Suppose claim 1 is false.  We run one step of the algorithm, and for some
vertex v, d_new(v) < d_old(v).  Let's now look at the new shortest
path P to v.  Let's pick v to be the leftmost (closest to S) vertex on
this path whose distance has decreased. [This will help our proof
since it's one more thing we can try to contradict].  We know P
must have at least one edge that wasn't in the earlier residual
graph.  Let's pick the rightmost (closest to v) such edge and say it's
x->y. [Again, one more thing we can try to contradict].  In order to
add x->y as an edge to the residual graph, it must be that the
augmenting path chosen by the algorithm used y->x.  Since this was the
*shortest* path, it must have been that d_old(x) = d_old(y)+1.
But, now we have d_new(x) = d_new(y) - 1.  Since we know that 
d_new(x) >= d_old(x) [else this contradicts our choice of v], this
means d_new(y) >= d_old(y)+2.  But, that means the old graph had a
way to get to v that was shorter than P: namely first get to y, and
then follow the rest of P to v [remember, we picked x->y to be the
rightmost edge in P that was not in the old graph, so the y-to-v part
was in the graph].  That contradicts our assumption that d_new(v) <
d_old(v) since in the old graph, one way to get to v is to get to y at
cost d_old(y) and then follow P from y to v (remember our defn of y is
that this portion of P *did* exist in the old graph) and this total
cost is d_new(v) - 2.  

Another natural heuristic: choose the path that you can push the most
flow on.  (The maximum bottleneck capacity path).  You can show that
this takes at most O(m * log(f)) iterations, where f = the maximum flow.

Other faster algorithms too.  There's been a lot of work on algs for
this problem.

Mention a related topic but not give any proofs.

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------

We talked about the problem of assigning students to dorm rooms where
each student had a list of acceptable versus unacceptable rooms.  A
natural generalization is to consider what about more general
preferences.  E.g, Dan prefers room A so it costs $0 to match him
there, his second choice is room B so it costs us $1 to match him
here, and he hates room C so it costs $infinity to match him there.
And, so on with Sue and Pat.  Then we could ask for the MINIMUM-COST
PERFECT MATCHING.  This is a perfect matching that out of all perfect
matchings has the least total cost.

Generalization to flows is called the MIN-COST MAX FLOW problem.
-- here each edge has a *cost* as well as a capacity.  Cost of a flow
is the sum over all edges of the flow on that edge times the cost of
the edge.  Can have negative costs (or benefits) on edges too.  

-- These are more general than plain max flow so can model more
things.

-- Turns out can solve by similar method as we were using for max
flow, but where each time you find the least-cost path to the sink:
the shortest path where view costs as edge lengths.   (Assuming there
are no negative-cost cycles.  If there *are* negative cost cycles,
then the flow will look funny -- you'll have these disconnected loops
of flow spinning around.  There are ways of solving even that case
too, though.)

-- Tricky thing is that even if originally all costs were positive, in
the residual graph will get negative costs - if it costs $1 to push a
unit of flow on edge e, then unpushing it gives you your $1 back:
e.g., do the matching problem.

-- So, need to use the Bellman-Ford DP algorithm for finding shortest
paths, since Dijkstra's is not guaranteed to work with negative cost
edges.