15-451 Algorithms                                                10/28/03

Network Flow, part II
 - Recap from last time
 - Edmonds-Karp #2
 - Dinic & MPM
 - min-cost max flow
===========================================================================

NETWORK FLOW RECAP: We are given a directed graph G, a source s, and a
sink t.  Each edge (u,v) has some capacity c(u,v).  The goal is to find
the maximum flow possible from s to t.

Last time we looked at the Ford-Fulkerson algorithm, and used it to
prove the maxflow-mincut theorem and the integral flow theorem.

Key idea: given a flow f so far, we describe the capacities left over
in a "residual graph".  Specifically, the residual graph has all edges
of positive "residual capacity", defined as:

    c_f(u,v) = c(u,v) - f(u,v),   where we define f(v,u) = -f(u,v).

For example: 4 nodes {s,a,b,t}.  s-->a has capacity 4, a-->t capacity
2, s-->b capacity 2, b-->t capacity 3, and a-->b capacity 3.  Start
with an initial flow of 3 along the path s,a,b,t.

The point of all this is that any legal flow in the residual graph,
when added to f (edge by edge), is a legal flow in the original graph
(it satisfies the capacity constraints and obeys flow-in = flow-out at
each node), and vice versa.  This means that the maximum flow in the
residual graph, plus f, is a maximum flow in the original graph.

We then saw how to encode bipartite matching as a maxflow problem:
create a fake source s and a fake sink t, with all edges pointing
forward with capacity 1.  Any integral flow is a matching of the same
value, and any matching can be converted to a flow of the same value.
So the maximum flow gives us a maximum matching in the original graph.

Near the end we discussed running time.  Each iteration of
Ford-Fulkerson takes O(m) time to find a path (e.g., by DFS or BFS)
and to compute the residual graph [m = # of edges].  Each iteration
pushes at least one more unit of flow (assuming all capacities are
integers).  So, the total time is at most O(mF), where F is the value
of the maximum flow.

Time O(mF) is fine if F is fairly small.  E.g., in the case of
matchings, we have F <= n.  But what if F is a large number?

Edmonds-Karp #1: push flow along the widest path.  We saw that this
ensures at most O(m*log(F)) iterations (assuming all capacities are
integers).  Each iteration involves running a Prim-like algorithm that
takes time O(m log n), so the total time is O(m^2 * log(F) * log(n)).
Can we remove the dependence on F?

EDMONDS-KARP #2
---------------

Edmonds-Karp #2: always pick the *shortest* augmenting path (rather
than the max-capacity path).

[Do example on the standard cut-diamond graph, but add extra nodes to
the bottom-right and top-left edges to lengthen them.]

Claim: this makes at most mn iterations.  So, the running time is
O(nm^2), since we use BFS in each iteration.

Proof: Let d(t) be the distance from s to t in the current residual
graph.  We'll prove the claim by showing that every m iterations, d(t)
has to go up by at least 1.

Let's lay out the graph in levels according to a BFS from s.  I.e.,
nodes at level i are at distance i from s, and t is at level d(t).
Now, notice that so long as d(t) doesn't change, the paths found WILL
ONLY USE FORWARD EDGES IN THIS LAYOUT.  Each iteration saturates (and
removes) at least one forward edge, and adds only backward edges (so
none of the distances ever drops).  This can happen at most
m - d(t) < m times, since any s-to-t path needs at least d(t) forward
edges, and after that many removals t would become disconnected from
s.  So, after m iterations, either t is disconnected (and
d(t) = infinity), or else we must have used a non-forward edge,
implying that d(t) has gone up by at least 1.  Since d(t) can increase
at most n times, there are at most nm iterations.
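
To make the algorithm concrete, here is a minimal Python sketch of
Edmonds-Karp #2 (this is not code from the course; the dict-of-dicts
graph representation and the names are illustrative assumptions):

    from collections import deque

    def edmonds_karp(cap, s, t):
        # cap[u][v] = capacity of edge (u,v); residual capacities are
        # maintained directly in this structure as we push flow.
        # First, make sure every edge has a reverse entry to hold
        # residual (undo) capacity.
        for u in list(cap):
            for v in list(cap[u]):
                cap.setdefault(v, {}).setdefault(u, 0)
        flow = 0
        while True:
            # BFS from s: finds a *shortest* augmenting path in the
            # residual graph (this is what makes it Edmonds-Karp #2).
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:
                u = queue.popleft()
                for v, c in cap[u].items():
                    if c > 0 and v not in parent:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow        # t unreachable: flow is maximum
            # Recover the path and its bottleneck residual capacity.
            path = []
            v = t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            b = min(cap[u][v] for (u, v) in path)
            # Augment: decrease forward residuals, increase back edges.
            for (u, v) in path:
                cap[u][v] -= b
                cap[v][u] += b
            flow += b

    # Example: the 4-node graph from the recap above; max flow is 5.
    cap = {'s': {'a': 4, 'b': 2}, 'a': {'t': 2, 'b': 3}, 'b': {'t': 3}}
    print(edmonds_karp(cap, 's', 't'))   # -> 5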

DINIC & MPM
===========

(This is not in the book, and it's not so crucial that you get the
details, but it's kind of neat, I think.)

Can we do this a little faster?  The previous algorithm took mn
iterations at O(m) each, for O(m^2 n) total.  Let's see if we can
reduce this to O(n^3).

Idea: Given the current BFS layout (also called the "level graph"),
we'll try in O(n^2) time, all at once, to find the maximum flow that
only uses forward edges.  This is sometimes called a "blocking flow".
It guarantees that when we take the residual graph, d(t) has gone up
by at least 1.  So, there are at most n iterations.

Define the capacity of a vertex v to be c(v) = min[c_in(v), c_out(v)],
where c_in(v) is the sum of the capacities of v's in-edges and
c_out(v) is the sum of the capacities of v's out-edges.  (A small code
sketch of this bookkeeping appears at the end of this section.)

The Algorithm:

 - Find the vertex v of minimum capacity c.  If c is zero, that's
   great -- we can remove v and its incident edges, updating the
   capacities of the neighboring nodes.  This step takes O(m) time.

 - If c is not zero, then we greedily pull c units of flow from s to
   v, and then push those units along to t.

This seems like just another version of the original problem.  But
there are two points here:

 (1) Because v had *minimum* capacity, we can do the pulling and
     pushing in any greedy way and we won't get stuck.  E.g., we can
     do a BFS forward from v and backward from v, pushing the flow
     level-by-level, so we never examine any edge twice.  [do example]
     This right away gives us an O(mn) algorithm for saturating the
     level graph (O(m) per node, n nodes), for an overall running time
     of O(mn^2).  So, we're half-way there.

 (2) To get O(n^2), the way we will do the BFS is to examine the
     in-edges (or out-edges) one at a time, fully saturating each edge
     before going on to the next one.  This means we can split our
     time into two parts: (a) time spent pushing through edges that
     get saturated, and (b) time spent on edges that we didn't quite
     saturate [at most one of these per node].  We spend only O(n)
     time on type-b operations per vertex we process, for O(n^2) total.
     Type-a operations result in deleting the edge, so the edge won't
     be used again in the current iteration; over *all* vertices, the
     *total* time for these is just O(m).  So, our running time per
     iteration is O(n^2 + m) = O(n^2)  [O(n^2) for type-b, O(m) for
     type-a, and m <= n^2].

Once we've saturated the level graph, we recompute it (in O(m) time,
which can be absorbed into the O(n^2)) and re-solve.  Total time is
O(n^3).
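
Here is one way the vertex-capacity bookkeeping might look in Python
(a sketch under assumed representations; the special-casing of s and t
is a detail the sketch adds, since c_in(s) = 0 would otherwise force
c(s) = 0):

    def vertex_capacities(nodes, level_edges, s, t):
        # level_edges: dict mapping (u, v) -> residual capacity of
        # that forward edge in the current level graph.
        c_in  = {v: 0 for v in nodes}
        c_out = {v: 0 for v in nodes}
        for (u, v), c in level_edges.items():
            c_out[u] += c
            c_in[v]  += c
        cap = {}
        for v in nodes:
            if v == s:              # s has no in-edges: use c_out only
                cap[v] = c_out[v]
            elif v == t:            # t has no out-edges: use c_in only
                cap[v] = c_in[v]
            else:                   # c(v) = min[c_in(v), c_out(v)]
                cap[v] = min(c_in[v], c_out[v])
        return cap

    def min_capacity_vertex(cap):
        # the vertex MPM deletes (if c = 0) or pulls/pushes through next
        return min(cap, key=cap.get)

MPM then loops: take the minimum-capacity vertex; if its capacity is
zero, delete it and update its neighbors' capacities; otherwise
pull/push that many units through it, as described above.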

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------

We talked about the problem of assigning students to dorm rooms, where
each student had a list of acceptable versus unacceptable rooms.  A
natural generalization is to allow preferences.  E.g., maybe Dan
prefers room A, so it costs $0 to match him there; his second choice
is room B, so it costs us $1 to match him there; and he hates room C,
so it costs $infinity to match him there.  And so on with Sue and Pat.
Then we could ask for the MINIMUM-COST PERFECT MATCHING: the perfect
matching that, out of all perfect matchings, has the least total cost.

The generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e).  The cost
of a flow is the sum over all edges of the flow on that edge times the
cost of the edge:

    cost of flow f = SUM_e w(e)*f(e).

The goal is to find, out of all possible maximum flows, the one with
the least total cost.  We can have negative costs (i.e., benefits) on
edges too.

 - These problems are more general than plain max flow, so they can
   model more things.

There are a couple of ways to solve this.  One way is to add an edge
from t to s of infinite capacity and large negative cost, and to
convert the problem into the "MIN-COST CIRCULATION" problem.  By doing
this, the goal becomes simply to find a circulation (a flow that obeys
flow-in = flow-out at EVERY node; there is no s or t anymore) of
smallest cost (this will be a negative number).  This circulation
gives the min-cost max-flow in the original graph.

Algorithm: find a negative-cost cycle in the residual graph and
saturate it; repeat until no negative-cost cycle remains.  We can use
Bellman-Ford to find negative-cost cycles.  The running time will be
like Ford-Fulkerson's.  To speed this up, ideally we would like the
"most-negative-cost cycle" each time, but finding it is NP-hard.
Instead, there are various tricks you can use to still do well.  E.g.,
in the Goldberg-Tarjan algorithm, you instead find the cycle that
minimizes (cost of cycle)/(# of edges in cycle), which is something
you *can* solve.  [Add some value k to all edge costs and check
whether there's still a negative-cost cycle using Bellman-Ford.  Do
binary search to find the value of k such that the min-cost cycle has
value 0.  That cycle is the one minimizing (cost of cycle)/(# of edges
in cycle) in the original graph.]  Then you prove this has a bound
roughly like Edmonds-Karp #1's.
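
As a sketch of the Bellman-Ford step in Python (just the cycle
finding; the saturation and residual updates are omitted, and the
edge-list representation and names are assumptions for illustration):

    def find_negative_cycle(nodes, edges):
        # edges: list of (u, v, cost) for the current residual graph.
        # Returns the vertices of some negative-cost cycle, in order,
        # or None if no such cycle exists.
        dist = {v: 0 for v in nodes}   # virtual source at distance 0
        pred = {v: None for v in nodes}
        x = None
        for _ in range(len(nodes)):    # n rounds of relaxation
            x = None
            for (u, v, w) in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    pred[v] = u
                    x = v
            if x is None:
                return None            # no relaxation => no neg cycle
        # A relaxation in round n means a negative cycle exists;
        # walking back n predecessor steps from x lands on the cycle.
        for _ in range(len(nodes)):
            x = pred[x]
        cycle = [x]
        v = pred[x]
        while v != x:
            cycle.append(v)
            v = pred[v]
        cycle.reverse()                # pred pointers run backwards
        return cycle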