15-451 Algorithms 3/18/04
- Edmonds-Karp 2
- Dinic & MPM
- min-cost max flow
Notes from Spring 2002 GLM
===========================================================================
Last time we discussed the max flow problem: the Ford-Fulkerson
algorithm, the proof of the maxflow-mincut theorem, and the integral
flow theorem.
Key idea: Given a flow f so far, we compute the residual graph that
describes the capacities left over. Specifically, the residual graph
has all edges of positive "residual capacity", defined as:
c_f(u,v) = c(u,v) - f(u,v)
where we define
f(v,u) = -f(u,v)
For example: 4 nodes {s,a,b,t}. s-->a has cap 4, a-->t has cap 3,
s-->b of cap 3, b-->t of cap 4, and a-->b of cap 2. Start with
initial flow of 2 along s,a,b,t.
Point of all this is that the maximum flow in the residual graph, plus
f, is a maximum flow in the original graph.
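As a sanity check, here is a small Python sketch that computes the
residual capacities for the 4-node example above (the dictionary
encoding and variable names are my own, not part of the notes):

```python
# Residual capacities c_f(u,v) = c(u,v) - f(u,v), with the skew-symmetry
# convention f(v,u) = -f(u,v), for the example: 2 units along s,a,b,t.
from collections import defaultdict

cap = {('s','a'): 4, ('a','t'): 3, ('s','b'): 3, ('b','t'): 4, ('a','b'): 2}

flow = defaultdict(int)
for u, v in [('s','a'), ('a','b'), ('b','t')]:  # initial flow of 2
    flow[(u, v)] += 2
    flow[(v, u)] -= 2                            # skew symmetry

# the residual graph keeps every edge of positive residual capacity
residual = {}
for (u, v) in cap:
    for e in [(u, v), (v, u)]:
        cf = cap.get(e, 0) - flow[e]
        if cf > 0:
            residual[e] = cf

print(residual)
```

Note that (a,b) disappears (it is saturated) while the backward edges
(a,s), (t,b), and (b,a) appear with capacity 2, letting us "undo" flow.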
We ended with Edmonds-Karp version 1 (always augment along path of
maximum capacity). # iterations at most m*log(F), where F is the max
flow value (assuming all capacities are integers), m = number of
edges. Each iteration involves running a Prim-like algorithm.
EDMONDS-KARP #2
---------------
Edmonds-Karp #2: always pick the *shortest* path.
[Do example on standard cut-diamond graph, but where add extra nodes
to bottom-right and top-left edges to lengthen them.]
Claim: this makes at most mn iterations. So, run time is O(nm^2)
since we use BFS in each iteration.
Proof: Let d(t) be the distance from s to t in the current residual
graph. We'll prove the claim by showing that every m iterations,
d(t) has to go up by at least 1.
Let's lay out the graph in levels according to a BFS from s. I.e.,
nodes at level i are distance i away from s, and t is at level d(t).
Now, notice that so long as d(t) doesn't change, the paths found will
only use forward edges in this layout. Each iteration saturates (and
removes) at least 1 forward edge, and adds only backward edges (so
none of the distances ever drops). This can happen at most m times
before we're forced to use paths that contain backward edges, implying
that d(t) has gone up by 1 (actually, 2).
So, there are at most nm iterations.
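A minimal Python sketch of Edmonds-Karp #2, using an adjacency matrix
of capacities (that encoding, and the function name, are my choices):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp #2 on an n x n matrix of capacities."""
    n = len(cap)
    res = [row[:] for row in cap]            # residual capacities
    total = 0
    while True:
        # BFS finds a *shortest* augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if res[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:                  # no augmenting path: done
            return total
        # bottleneck capacity along the path, then push that much flow
        bottleneck = float('inf')
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck          # backward edge gains capacity
            v = u
        total += bottleneck

# the 4-node example from last lecture: s=0, a=1, b=2, t=3
example = [[0, 4, 3, 0],
           [0, 0, 2, 3],
           [0, 0, 0, 4],
           [0, 0, 0, 0]]
```

On the example graph, max_flow(example, 0, 3) returns 7: the BFS
inside the loop is the O(m) work per iteration in the O(nm^2) bound.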
DINIC & MPM
===========
(this is not in the book, and it's not so crucial you get the details,
but it's kind of neat I think.)
Can we do this a little faster?
Previous alg: mn iterations at O(m) each for O(m^2 n) total. Let's
see if we can reduce to O(n^3). Idea: Given the current BFS layout
(also called the "level graph"), we'll try in O(n^2) time all at once
to find the max flow that only uses forward edges. This is sometimes
called a "blocking flow". This guarantees that when we take the
residual graph, d(t) has gone up by 1 (actually, 2).
Define c(v) = capacity of vertex v to be: min[c_in(v), c_out(v)], where
c_in(v) is sum of capacities of in-edges and c_out is sum of
capacities of out-edges.
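This definition is easy to state in code; here is a sketch in Python
(the function name and edge-dictionary encoding are mine). Note that s
has no in-edges and t no out-edges, so it only makes sense to count
their out- and in-capacity respectively:

```python
# c(v) = min(c_in(v), c_out(v)): the most flow that could pass through v.
def vertex_capacities(edges, source, sink):
    """edges is a dict {(u, v): capacity} for the level graph."""
    c_in, c_out = {}, {}
    for (u, v), c in edges.items():
        c_out[u] = c_out.get(u, 0) + c
        c_in[v] = c_in.get(v, 0) + c
    caps = {}
    for v in set(c_in) | set(c_out):
        if v == source:                  # source: out-capacity only
            caps[v] = c_out.get(v, 0)
        elif v == sink:                  # sink: in-capacity only
            caps[v] = c_in.get(v, 0)
        else:
            caps[v] = min(c_in.get(v, 0), c_out.get(v, 0))
    return caps
```

On the 4-node example, a has c_in = 4 and c_out = 3 + 2 = 5, so
c(a) = 4; symmetrically c(b) = 4.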
The Algorithm:
- Find vertex v of minimum capacity c. If it's zero, that's great -- we
can remove it and incident edges, updating capacities of neighboring
nodes.
- If c is not zero, then we greedily pull c units of flow from s to v,
and then push it along to t. This seems like just another version of
the original problem. But, there are two points here:
(1) because v had *minimum* capacity, we can do the pulling and
pushing in any greedy way and we won't get stuck. E.g., we can do
BFS, pushing the flow level-by-level, so we never examine any edge
twice. This right away gives us an O(mn) algorithm for saturating
the level graph (O(m) per node, n nodes), for an overall running
time of O(mn^2). So, we're half-way there.
(2) To get O(n^2), the way we will do the BFS is to examine the
in-edges (or out-edges) one at a time, fully saturating the edge
before going on to the next one. This means we can allocate our
time into two parts: (a) time spent pushing through edges that get
saturated, and (b) time spent on edges that we didn't quite saturate
[at most one of these per node]. We only take time O(n) on type-b
operations. Type-a operations result in deleting the edge, so over
*all* vertices, the *total* time of these is just O(m).
So, our running time is dominated by the type (b) operations, which is
O(n) per vertex saturated, or O(n^2) total. Once we've saturated the
level graph, we recompute it (in O(m) time, but this can be put into
the O(n^2)) and re-solve. Total time is O(n^3).
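The MPM bookkeeping above is what gets the blocking flow down to
O(n^2) per phase. As a hedged illustration of the same phase structure,
here is the simpler variant the notes call "half-way there": Dinic's
approach, which rebuilds the level graph by BFS and finds a blocking
flow by DFS over forward edges only (the class layout and names are my
own):

```python
from collections import deque

class Dinic:
    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # edge: [to, cap, rev-index]

    def add_edge(self, u, v, c):
        self.adj[u].append([v, c, len(self.adj[v])])
        self.adj[v].append([u, 0, len(self.adj[u]) - 1])  # residual edge

    def _bfs(self, s, t):
        # rebuild the level graph: level[v] = BFS distance from s
        self.level = [-1] * self.n
        self.level[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v, c, _ in self.adj[u]:
                if c > 0 and self.level[v] == -1:
                    self.level[v] = self.level[u] + 1
                    q.append(v)
        return self.level[t] != -1

    def _dfs(self, u, t, pushed):
        # push flow along forward (level i -> level i+1) edges only;
        # self.it[u] remembers where we left off, so dead edges are
        # never re-examined within a phase
        if u == t:
            return pushed
        while self.it[u] < len(self.adj[u]):
            e = self.adj[u][self.it[u]]
            v, c, rev = e
            if c > 0 and self.level[v] == self.level[u] + 1:
                d = self._dfs(v, t, min(pushed, c))
                if d > 0:
                    e[1] -= d
                    self.adj[v][rev][1] += d
                    return d
            self.it[u] += 1
        return 0

    def max_flow(self, s, t):
        flow = 0
        while self._bfs(s, t):             # one phase per value of d(t)
            self.it = [0] * self.n
            while True:                    # saturate the level graph
                f = self._dfs(s, t, float('inf'))
                if f == 0:
                    break
                flow += f
        return flow
```

Each phase here costs O(mn) in the worst case rather than MPM's
O(n^2), but the outer structure (at most n phases, since d(t) rises
each time) is the same.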
MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------
We talked about the problem of assigning students to dorm rooms where
each student had a list of acceptable versus unacceptable rooms. A
natural generalization is to add preferences. E.g., maybe Dan prefers
room A, so it costs $0 to match him there; his second choice is room B,
so it costs us $1 to match him there; and he hates room C, so it costs
$infinity to match him there. And, so on with Sue
and Pat. Then we could ask for the MINIMUM-COST PERFECT MATCHING.
This is a perfect matching that out of all perfect matchings has the
least total cost.
Generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e). Cost of a
flow is the sum over all edges of the flow on that edge times the cost
of the edge. I.e.,
cost of flow f = SUM_e w(e)*f(e).
Goal is to find, out of all possible maximum flows, the one with the
least total cost. Can have negative costs (or benefits) on edges too.
- These are more general than plain max flow so can model more things.
- Turns out can solve by similar method as we were using for max flow,
but where each time you find the least-cost path to the sink: the
shortest path where view costs as edge lengths. (Assuming there are
no negative-cost cycles. If there *are* negative cost cycles, then
the flow will look funny -- you'll have these disconnected loops of
flow spinning around. There are ways of solving even that case too,
though.)
-- Tricky thing is that even if originally all costs were positive, the
residual graph will have negative-cost edges: if it costs $1 to push a
unit of flow on edge e, then unpushing it gives you your $1 back.
Formally, if we have put f units of flow on an edge (u,v) of cost w,
then in the residual graph we have an edge (v,u) of capacity f and
cost -w. E.g., do the matching problem.
-- So, need to use the Bellman-Ford DP algorithm for finding shortest
paths, since Dijkstra's is not guaranteed to work with negative cost
edges.
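Putting the two bullets together, here is a sketch of the successive
shortest path method in Python: Bellman-Ford finds the least-cost s-to-t
path in the residual graph (Dijkstra would choke on the negative-cost
backward edges), and we push the bottleneck amount along it. The
encoding of edges and all names are my own:

```python
def min_cost_max_flow(n, edges, s, t):
    """edges: list of (u, v, capacity, cost). Returns (flow, cost)."""
    adj = [[] for _ in range(n)]
    for u, v, cap, w in edges:
        adj[u].append([v, cap, w, len(adj[v])])
        adj[v].append([u, 0, -w, len(adj[u]) - 1])  # residual: cost -w
    flow = cost = 0
    while True:
        # Bellman-Ford: least-cost path in the residual graph
        INF = float('inf')
        dist = [INF] * n
        dist[s] = 0
        parent = [None] * n              # (node, edge index) on best path
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for i, (v, c, w, _r) in enumerate(adj[u]):
                    if c > 0 and dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        parent[v] = (u, i)
                        changed = True
            if not changed:
                break
        if dist[t] == INF:               # no augmenting path: done
            return flow, cost
        # bottleneck along the cheapest path, then push
        push = INF
        v = t
        while v != s:
            u, i = parent[v]
            push = min(push, adj[u][i][1])
            v = u
        v = t
        while v != s:
            u, i = parent[v]
            adj[u][i][1] -= push
            adj[v][adj[u][i][3]][1] += push
            v = u
        flow += push
        cost += push * dist[t]
```

E.g., on the 4-node graph with costs added (s-a cost 1, s-b cost 2,
a-t cost 1, b-t cost 1, a-b cost 1), the max flow of 7 is forced to use
every edge fully, giving cost 18.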
Won't give formal proof of correctness that this approach finds the
min-cost max flow, but it all hinges on the following: Suppose the
graph doesn't have any negative cost cycles. Now, suppose we find the
least-cost path from s to t and push flow along it. Then there are
still no negative-cost cycles in the residual graph. Easiest to see
by picture: Say we find path s->a->b->c->t. If the new graph now has
a negative-cost cycle, it has to use at least one of the new backward
edges. E.g., c->b->a->x->y->c. But then that means that
s->a->x->y->c->t was a cheaper path in the original graph.
The reason this proves the result is that if you take the supposedly
better max flow f' and subtract our flow f, what you get is a bunch of
cycles in our final residual graph. And if f' was really better than
f, then these would be negative-cost cycles.