15-750 Graduate Algorithms 3/6/01 - recap - applications - Edmonds-Karp 2 - speedups Reading: Chapter 17 & 18 =========================================================================== Last time: described max flow problem. Ford-Fulkerson algorithm, and proof of maxflow-mincut theorem, also integral flow theorem. Ended with Edmonds-Karp version 1 (always augment along path of maximum capacity). # iterations at most m*log(F), where F is the max flow value (assuming all capacities are integers). Each iteration involves running a Prim-like algorithm. Recap of Edmonds-Karp argument: First, we can *find* the desired path by building a tree. Starting from U = {s}, always put in the widest (highest capacity) edge leaving the set U. Produces a tree. The path from s to t is the one we want. Claim is: this guarantees to find a path of capacity at least a 1/m fraction of the max flow in the current graph. Proof: if we ever get to a point where all edges leaving U have capacity < f/m, then that means the (U,V-U) cut has capacity < (f/m)*m = f. This contradicts flow <= cut. So, each path reduces remaining flow by (1-1/m). AN INTERESTING APPLICATION: Application: 3-D image processing. Given a pixel image, where each pixel is 0 or 1 (indicating foreground or background), want to clean it up using the rule that in general neighboring pixels are usually supposed to be the same value. [In actual application, usually have many depth levels, but we will just consider 2 levels]. Say that given image I, the "cost" or "energy" of some modification I' is c*(#locations in which I' differs from I) + (number of neighboring pixels in I' that have different values). Think of as: costs $c for flipping a bit, and costs $1 for each pair of neighbors that are different. Want the image of lowest energy (cost). Usually solved by simulated annealing but this is *slow*. Faster way is to turn into a min cut problem. Here's how we do it: set up 0-source and 1-sink. Connect source to all 0s, and sink to all 1's by edges of capacity c. Connect neighboring pixels by edge of capacity 1. Claim: any solution corresponds to a cut, with energy = value of cut. So, we just want the min S-T cut. The problem with multple depths corresponds to the "multiway cut" problem: Given a collection of 'terminal' nodes S1, S2, ..., Sk, find the partition into regions U1, U2, ..., Uk with Si in Ui, of least total cost. Unfortunately, that problem is NP hard. But, it turns out that even for this problem, you can use the min cut problem to give you a heuristic that produces good solutions in practice. EDMONDS-KARP #2 --------------- Edmonds-Karp #2: always pick the *shortest* path. [Do example on standard cut-diamond graph, but where add extra nodes to bottom-right and top-left edges to lengthen them.] Claim: this makes at most O(mn) iterations. So, run time is O(nm^2) since we use BFS in each iteration. Will give two proofs (they both have the same 1st half). Proof #1: Let d(v) be the shortest path distance from s to v in the current residual graph. Also, for visualization, let's lay out the graph in levels according to a BFS from s: so we have levels for d=0, d=1, d=2, etc. Now, each time we perform an iteration, we may add new edges into the residual graph, but the d(v)'s NEVER GO DOWN. Why? [Because we only add edges that go backwards in the BFS levels]. Each time we find an augmenting path, we saturate at least one edge on it. Let's say we saturate edge u->v, where u is at level i and v is at level i+1. Claim: if we ever re-add edge u->v, then u must be at level i+2 or more. Why? [Because we can only add this edge if our shortest path used the edge v->u. Since the d()'s never go down, v is at level at least i+1, so u is at level at least i+2.] So, any given edge can be removed at most n/2 times. Since each iteration removes or re-adds at least one edge from the original graph, this means there can be at most mn iterations. QED Proof #2: Another way to argue the second half is to just look at distance from s to t. Start with BFS levels for original graph, and don't recompute until d(t) changes. Notice that so long as d(t) doesn't change, all paths will use forward edges. Each iteration removes at least one forward edge, so we can do this at most m times before we need to recompute. d(t) can increase at most n times, so again we get at most mn iterations. DINIC & MPM =========== Can we do this a little faster? [This is chapter 18 in the book] Previous alg: O(mn) iterations at O(m) each for O(m^2 n) total. Let's try to reduce to O(n^3). Idea: Let d = d(t). We will try to find lots of paths of length d all at once. To do this, let's look at "level graph" which is what you get by computing BFS levels and then deleting all edges that go backward or have both endpoints in the same level (since we know we won't need to use them in paths of length d). We now will try to push as much flow as possible in this graph. At that point, we then recompute residual and the new leveled graph and repeat. Terminology: a flow that saturates this level graph is called a "blocking flow". So, our goal is to find a blocking flow in time O(n^2). Define c(v) = capacity of vertex v to be: min[c_in(v), c_out(v)], where c_in(v) is sum of capacities of in-edges and c_out is sum of capacities of out-edges. The Algorithm: - Find vertex v of minimum capacity c. If it's zero, that's great -- we can remove it and incident edges, updating capacities of neighboring nodes. - if c is not zero, then we greedily pull c units of flow from s to v, and then push it along to t. This seems like just another version of the original problem. But, there are two points here: (1) because v had *minimum* capacity, we can do the pulling and pushing in any greedy way and we won't get stuck. This observation means we can saturate node v in O(m) time [we can do BFS, pushing the flow level-by-level, so we never examine any edge twice], which gives us an O(mn) algorithm for saturating the level graph (O(m) per node, n nodes). To get the time bound we want, we need one thing: the way we will do the pulling and pushing is we examine the edges one at a time, fully saturating the edge before going on to the next one. This means we can allocate our time into two parts: (a) time spent pushing through edges that get saturated, and (b) time spent on edges that we didn't quite saturate [at most one of these per node]. We only take time O(n) on type-b operations. Type-a operations result in deleting the edge, so over *all* vertices, the *total* time of these is just O(m). So, our running time is dominated by the type (b) operations, which is O(n) per vertex saturated, or O(n^2) total. Once we've saturated the level graph, we recompute it (in O(m) time, but this can be put into the O(n^2)) and re-solve. Total time is O(n^3). ?? Book talks about storing vertices in a fib-heap so we can do the remove-min and decrease-key operations quickly. Do we really need this? Why not just an array, since it's OK to spend O(n) time to do remove-min, so long as decrease-key is O(1)?