15-451 Algorithms                                                  10/31

- maximum flow
- Edmonds-Karp alg
- min cost max flow and min cost perfect matching
========================================================================

Recap from last time:

Network flow problem: Directed graph, each edge has a capacity.  Source
s and sink t.  Goal is to flow as much as possible from s to t.

Ford-Fulkerson algorithm: find a path from source to sink, push as much
flow as possible along it, compute the residual graph, and repeat until
no paths are left.

We proved: the FF alg finds a max flow (though the number of iterations
could be large).  In the process we proved:

MAX-FLOW/MIN-CUT theorem: max s-t flow = min s-t cut.
 - An s-t cut is a partition of the vertex set into A and B where
   s \in A, t \in B.
 - The capacity of a cut is the sum of capacities of all edges going
   from A to B.
 - The obvious part is that any flow <= capacity of any cut.  (So, max
   flow <= min cut.)  Less obvious is that max flow = min cut.

Today:
 - A guaranteed efficient algorithm: Edmonds-Karp
 - But first: an application  (There are lots of these)

Application: 3-D image processing.  Given a pixel image, where each
pixel is 0 or 1 (indicating foreground or background), we want to clean
it up using the rule that neighboring pixels are usually supposed to
have the same value.  [In the actual application, there are usually many
depth levels, but we will just consider 2 levels.]

Say that given image I, the "cost" or "energy" of some modification I' is
   c*(#locations in which I' differs from I)
     + (#pairs of neighboring pixels in I' that have different values).
Think of it as: it costs $c to flip a bit, and $1 for each pair of
neighbors that are different.  We want the image of lowest energy (cost).

This is usually solved by simulated annealing, but that is *slow*.  A
faster way is to turn it into a min cut problem.  Here's how we do it:
set up a 0-source and a 1-sink.  Connect the source to all pixels that
are 0 in I, and the sink to all pixels that are 1 in I, by edges of
capacity c.  Connect neighboring pixels by an edge of capacity 1.
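The construction above can be sanity-checked by brute force on a tiny image. This is only a sketch under some assumptions that are not in the notes: pixels are 4-neighbors, each neighbor pair gets one capacity-1 edge in each direction, pixels on the source side of a cut are read as 0 and the rest as 1, and all function names (`energy`, `build_edges`, `cut_capacity`, `check_claim`) are made up for illustration.

```python
from itertools import product

def energy(orig, new, c):
    """c * (#flipped pixels) + (#neighboring pairs that differ in new)."""
    rows, cols = len(orig), len(orig[0])
    flips = sum(orig[r][k] != new[r][k]
                for r in range(rows) for k in range(cols))
    diff = 0
    for r in range(rows):
        for k in range(cols):
            if r + 1 < rows and new[r][k] != new[r + 1][k]:
                diff += 1
            if k + 1 < cols and new[r][k] != new[r][k + 1]:
                diff += 1
    return c * flips + diff

def build_edges(orig, c):
    """Directed edges (u, v, capacity) of the min-cut network."""
    rows, cols = len(orig), len(orig[0])
    edges = []
    for r in range(rows):
        for k in range(cols):
            p = (r, k)
            if orig[r][k] == 0:
                edges.append(("s", p, c))   # 0-source to pixels that are 0
            else:
                edges.append((p, "t", c))   # pixels that are 1 to the 1-sink
            for q in ((r + 1, k), (r, k + 1)):
                if q[0] < rows and q[1] < cols:
                    # assumption: one capacity-1 edge in each direction
                    edges.append((p, q, 1))
                    edges.append((q, p, 1))
    return edges

def cut_capacity(edges, side_a):
    """Total capacity of edges going from side A to side B."""
    return sum(cap for u, v, cap in edges
               if u in side_a and v not in side_a)

def check_claim(orig, c):
    """Try every labeling; assert cut capacity == energy; return the best."""
    rows, cols = len(orig), len(orig[0])
    edges = build_edges(orig, c)
    pixels = [(r, k) for r in range(rows) for k in range(cols)]
    best = None
    for bits in product((0, 1), repeat=len(pixels)):
        new = [[bits[r * cols + k] for k in range(cols)]
               for r in range(rows)]
        # pixels labeled 0 sit on the source side of the cut
        side_a = {"s"} | {p for p, b in zip(pixels, bits) if b == 0}
        e = energy(orig, new, c)
        assert cut_capacity(edges, side_a) == e
        if best is None or e < best[0]:
            best = (e, new)
    return best

noisy = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(check_claim(noisy, 2))
```

With c = 2 the cheapest option is to flip the lone center pixel (energy 2); with c = 5 flipping costs more than the 4 mismatched neighbor pairs, so the image is left alone. The brute force is exponential in the number of pixels and is only meant to check the correspondence, not to replace a min-cut computation.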
Claim: any solution corresponds to a cut, with energy = value of the
cut.  So, we just want the min s-t cut.

The problem with multiple depths corresponds to the "multiway cut"
problem: given a collection of 'terminal' nodes S1, S2, ..., Sk, find
the partition into regions U1, U2, ..., Uk with Si in Ui, of least total
cost.  Unfortunately, that problem is NP-hard.  But it turns out that
even for this problem, you can use the min cut problem to give you a
heuristic that produces good solutions in practice.

Now, let's go to algorithms.  Recall that the running time of FF is not
necessarily so great - it depends on the choices we make.  E.g., (give
the standard diamond graph).  How to fix the problem?

EDMONDS-KARP idea/analysis
--------------------------
One natural heuristic is to use the *shortest* augmenting path each
time.  This fixes the problem in the above example.  What about in
general?  Edmonds and Karp proved that this guarantees a total of at
most O(n*m) iterations.  [Note: this is not obvious.  E.g., what if we
add extra nodes on the top-left and bottom-right edges of the diamond to
lengthen them?]  BFS to find the path is O(m), so this guarantees us a
maximum total time of O(n*m^2).

The proof is in two parts.  Say d(v) is the distance from s to v in the
current residual graph.  As we find augmenting paths and change the
residual graph, d(v) may change, but the first part is:

CLAIM 1: d(v) never decreases.

Let's use this to prove the theorem and then go back.  Notice that each
time we find an augmenting path, we saturate at least one edge on it, so
at least one edge that was in the residual graph before is no longer in
the new one.  It's possible that later this edge will come back (i.e.,
if we later push flow in the other direction on it), but what we'll show
is that any given edge can be removed at most n/2 times.  This gives us
what we want [why?]: the residual graph has at most 2m possible edges
(each original edge and its reverse), and each iteration removes at
least one of them, so there are at most 2m * n/2 = n*m iterations.

To show: each edge u->v can be removed at most n/2 times.
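Before the proof, here is a minimal sketch of the algorithm whose iterations are being counted: Ford-Fulkerson where each augmenting path is found by BFS in the residual graph. The function name `edmonds_karp`, the `(u, v, capacity)` edge format, and the concrete diamond graph (side edges of capacity 1000, cross edge of capacity 1) are assumptions made for illustration; the notes only say "the standard diamond graph".

```python
from collections import deque

def edmonds_karp(n, edges, s, t):
    """Return (max flow value, number of augmentations).

    n is the number of vertices (0..n-1); edges is a list of
    (u, v, capacity) triples.
    """
    # res[u][v] = residual capacity of u->v (reverse edges start at 0)
    res = [dict() for _ in range(n)]
    for u, v, cap in edges:
        res[u][v] = res[u].get(v, 0) + cap
        res[v].setdefault(u, 0)

    flow = 0
    iterations = 0
    while True:
        # BFS from s finds a shortest (fewest-edge) augmenting path
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, iterations  # no augmenting path is left

        # bottleneck capacity along the path found
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]

        # push flow: decrease forward residuals, increase reverse ones
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck
        iterations += 1

# Assumed diamond: s=0, a=1, b=2, t=3, with a capacity-1 cross edge a->b.
diamond = [(0, 1, 1000), (0, 2, 1000), (1, 3, 1000), (2, 3, 1000), (1, 2, 1)]
print(edmonds_karp(4, diamond, 0, 3))  # (2000, 2)
```

On this diamond, an unlucky Ford-Fulkerson that keeps routing through the cross edge can take 2000 augmentations, while BFS never uses the cross edge and finishes in 2.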
Here we use the fact that every augmenting path is a shortest path.
Suppose an augmentation removes u->v.  That means our path went along
that edge, so d_old(v) = 1+d_old(u), since it was a shortest path.  If we
later put the edge back, we must have gone v->u, so in that graph
d_new(u) = 1+d_new(v).  But CLAIM 1 tells us that d_new(v) >= d_old(v),
so d_new(u) = 1+d_new(v) >= 1+d_old(v) = 2+d_old(u).  Since the distance
from s to u can't become larger than n-1 without u becoming disconnected
(and then we never touch u again), this means we can do this at most n/2
times.

So, all that's left is to show Claim 1.  We're going to prove it by
contradiction.  BTW, our high-level strategy here is a kind of canonical
proof by contradiction: we are going to try to create as many things as
possible to contradict as we go.

Suppose Claim 1 is false.  We run one step of the algorithm, and for
some vertex v, d_new(v) < d_old(v).  Let's now look at the new shortest
path P to v.  Replacing v if necessary, let's pick v to be the leftmost
(closest to s) vertex on this path whose distance has decreased, and let
P end there.  [This will help our proof since it's one more thing we can
try to contradict.]

We know P must have at least one edge that wasn't in the earlier
residual graph (otherwise P existed before, and d_old(v) <= d_new(v)).
Let's pick the rightmost (closest to v) such edge and say it's x->y.
[Again, one more thing we can try to contradict.]

In order to add x->y as an edge to the residual graph, it must be that
the augmenting path chosen by the algorithm used y->x.  Since this was a
*shortest* path, it must have been that d_old(x) = d_old(y)+1.  But now
we have d_new(x) = d_new(y)-1, since x->y is on the shortest path P.
Since we know that d_new(x) >= d_old(x) [else this contradicts our
choice of v], this means
   d_new(y) = d_new(x)+1 >= d_old(x)+1 = d_old(y)+2.
But that means the old graph had a way to get to v that was shorter than
P: namely, first get to y, and then follow the rest of P to v [remember,
we picked x->y to be the rightmost edge in P that was not in the old
graph, so the y-to-v part was in the old graph].
That contradicts our assumption that d_new(v) < d_old(v): in the old
graph, one way to get to v is to get to y at cost d_old(y) and then
follow P from y to v (remember, by our choice of x->y, this portion of P
*did* exist in the old graph).  The y-to-v portion has length
d_new(v) - d_new(y), so the total cost is
   d_old(y) + d_new(v) - d_new(y) <= d_new(v) - 2.

Another natural heuristic: choose the path that you can push the most
flow on (the maximum bottleneck capacity path).  You can show that, for
integer capacities, this takes at most O(m * log(f)) iterations, where
f = the value of the maximum flow.

There are other, faster algorithms too.  There's been a lot of work on
algs for this problem.

Now we'll mention a related topic, but not give any proofs.

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------
We talked about the problem of assigning students to dorm rooms where
each student had a list of acceptable versus unacceptable rooms.  A
natural generalization is to consider more general preferences.  E.g.,
Dan prefers room A so it costs $0 to match him there, his second choice
is room B so it costs us $1 to match him there, and he hates room C so
it costs $infinity to match him there.  And so on with Sue and Pat.
Then we could ask for the MINIMUM-COST PERFECT MATCHING.  This is a
perfect matching that, out of all perfect matchings, has the least total
cost.

The generalization to flows is called the MIN-COST MAX FLOW problem.

-- Here each edge has a *cost* as well as a capacity.  The cost of a
   flow is the sum over all edges of the flow on that edge times the
   cost of the edge.  We can have negative costs (or benefits) on edges
   too.

-- These are more general than plain max flow, so they can model more
   things.

-- It turns out we can solve this by a similar method to the one we were
   using for max flow, but where each time you find the least-cost path
   to the sink: the shortest path where we view costs as edge lengths.
   (Assuming there are no negative-cost cycles.  If there *are*
   negative-cost cycles, then the flow will look funny -- you'll have
   these disconnected loops of flow spinning around.
   There are ways of solving even that case too, though.)

-- Tricky thing is that even if originally all costs were positive, in
   the residual graph we will get negative costs - if it costs $1 to
   push a unit of flow on edge e, then unpushing it gives you your $1
   back: e.g., do the matching problem.

-- So, we need to use the Bellman-Ford DP algorithm for finding shortest
   paths, since Dijkstra's is not guaranteed to work with negative-cost
   edges.
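The method described in the bullets above (repeatedly augment along a least-cost path, found with Bellman-Ford) can be sketched as follows. This is a minimal illustration, assuming integer capacities and no negative-cost cycles; the function name `min_cost_max_flow` and the `(u, v, capacity, cost)` edge format are made up here. In the example, only Dan's costs ($0 for A, $1 for B, never C) come from the notes; the costs for Sue and Pat are hypothetical, and an infinite cost is modeled by leaving the edge out.

```python
def min_cost_max_flow(n, edges, s, t):
    """Return (flow value, total cost); edges are (u, v, capacity, cost)."""
    INF = float("inf")
    # arcs[i] = [tail, head, residual capacity, cost];
    # arc i ^ 1 is the reverse of arc i and has the negated cost.
    arcs = []
    for u, v, capacity, cost in edges:
        arcs.append([u, v, capacity, cost])
        arcs.append([v, u, 0, -cost])

    total_flow = 0
    total_cost = 0
    while True:
        # Bellman-Ford from s over arcs with residual capacity left
        dist = [INF] * n
        pred = [None] * n  # index of the arc used to reach each vertex
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for i, (u, v, r, w) in enumerate(arcs):
                if r > 0 and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    pred[v] = i
                    changed = True
            if not changed:
                break
        if dist[t] == INF:
            return total_flow, total_cost  # no augmenting path is left

        # bottleneck along the least-cost path
        bottleneck = INF
        v = t
        while v != s:
            i = pred[v]
            bottleneck = min(bottleneck, arcs[i][2])
            v = arcs[i][0]

        # push flow; "unpushing" later refunds the cost via the reverse arc
        v = t
        while v != s:
            i = pred[v]
            arcs[i][2] -= bottleneck
            arcs[i ^ 1][2] += bottleneck
            v = arcs[i][0]
        total_flow += bottleneck
        total_cost += bottleneck * dist[t]

# Vertices: s=0, Dan=1, Sue=2, Pat=3, rooms A=4, B=5, C=6, t=7.
dorms = [
    (0, 1, 1, 0), (0, 2, 1, 0), (0, 3, 1, 0),   # source -> students
    (1, 4, 1, 0), (1, 5, 1, 1),                 # Dan: A=$0, B=$1 (no C)
    (2, 4, 1, 0), (2, 6, 1, 2),                 # Sue (hypothetical costs)
    (3, 4, 1, 0), (3, 5, 1, 3), (3, 6, 1, 1),   # Pat (hypothetical costs)
    (4, 7, 1, 0), (5, 7, 1, 0), (6, 7, 1, 0),   # rooms -> sink
]
print(min_cost_max_flow(8, dorms, 0, 7))  # (3, 2)
```

Here everyone wants room A, so the first augmentation takes it for free and later ones have to reroute through negative-cost reverse arcs; the cheapest perfect matching is Dan-B, Sue-A, Pat-C at a total cost of $2.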