15-451 Algorithms                                            10/31/06

Network Flow II
  - Edmonds-Karp #1
  - Edmonds-Karp #2
  - further improvements...
  - min-cost max flow
======================================================================

NETWORK FLOW RECAP:

Given a directed graph G, a source s, and a sink t.  Each edge (u,v)
has some capacity c(u,v).  Goal is to find the maximum flow possible
from s to t.

Example: 4 nodes {s,a,b,t}.  s-->a has cap 4, a-->t has cap 2, s-->b
has cap 2, b-->t has cap 3, and a-->b has cap 3.  (All non-edges can
be viewed as edges of capacity 0.)

Last time: looked at the Ford-Fulkerson algorithm.  Used it to prove
the maxflow-mincut theorem, and also the integral flow theorem.

Key idea: given a flow f so far, we describe the capacities left over
in a "residual graph".  Specifically, the residual graph has all edges
of positive "residual capacity", defined as:

    c_f(u,v) = c(u,v) - f(u,v),   where we define f(v,u) = -f(u,v).

FF alg is: find a path of positive residual capacity; push flow to
saturate it; repeat.  The point of this is that any legal flow in the
residual graph, when added to f (edge by edge), is a legal flow in the
original graph (it satisfies the capacity constraints and obeys
flow-in = flow-out at each node).  And this is true vice-versa.  This
means that the maximum flow in the residual graph, plus f, is a
maximum flow in the original graph.

[run algorithm on example.  Draw flow using flow/cap notation in the
original graph]

IMPROVING THE RUNNING TIME OF NETWORK FLOW
==========================================

Assuming capacities are integers, the basic Ford-Fulkerson algorithm
could make up to F iterations, where F is the value of the max flow.
Each iteration takes O(m) time to find a path (e.g., by DFS or BFS)
and to compute the residual graph.  So, the total time is O(mF).
This is fine if F is small, as in the matching case, but not good if
capacities are given in binary and F could be very large.

[Note: let's assume the graph has been pre-processed to delete any
disconnected parts.
So, m >= n-1, and we can say DFS runs in time O(m) instead of
O(m+n).]

Edmonds-Karp #1
---------------

Idea: don't just choose an arbitrary path.  Instead, pick a path of
largest (or approximately largest) residual capacity.  Two versions:

(1.1) Use the maximum bottleneck path (the path with largest residual
      capacity).  We saw how to find one using a Prim-like algorithm
      in time O(m log n).

(1.2) Define a "capacity threshold" c, and just try to find an s-t
      path of residual capacity >= c.  Advantage of this: it takes
      only O(m) time -- just do DFS over the edges of residual
      capacity >= c.  How to set c?  Start at the max capacity over
      all edges, and whenever the search fails, divide c by 2.  This
      guarantees you are within a factor of 2 of the max-capacity
      path.  You waste some time on the unsuccessful runs, but there
      are at most O(log C) of them in total (C = max edge capacity).
      This can be reduced to O(log F) using the doubling idea from
      the midterm.

Claim: either way, this causes FF to make at most O(m log F)
iterations.  So the total time using version 1.2 is O(m^2 log F).

Proof of claim: First, if the current residual graph has max flow f,
then if we drop all edges of residual capacity < f/m, there must
still exist a path from s to t.  (Why?  Let A be the component
containing s after dropping those edges.  If A doesn't contain t,
then since there are at most m edges out of A, each of residual
capacity < f/m, the edges leaving A would form a cut of capacity less
than f -- a contradiction.)  So, the max bottleneck path has capacity
at least f/m, and (if using version 1.2) this means c is at least
f/(2m).

Let's just focus on version 1.2 (version 1.1 then follows directly).
Whenever the threshold gets cut down to c, the failed search at
threshold 2c shows the remaining flow is less than 2cm (a cut of at
most m edges, each of residual capacity < 2c).  Each iteration at
threshold c pushes at least c units, so there can be at *most* 2m
iterations before c gets cut down by a factor of 2 again.  Since
c <= F at the start, c can be cut down at most log(F) times, so the
total number of iterations is at most 2m log(F).

Can we remove the dependence on F completely?

EDMONDS-KARP #2
---------------

Edmonds-Karp #2: always pick the *shortest* augmenting path (rather
than the max-capacity path).

Claim: this makes at most mn iterations.  So, the run time is
O(nm^2), since we use BFS in each iteration.
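Before getting to the proof, here is a minimal runnable sketch of the
shortest-augmenting-path rule in Python.  The function name and the
dict-of-dicts residual-capacity representation are my own choices for
illustration, not from the lecture:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Max flow by repeatedly augmenting along a shortest (fewest-edge)
    s->t path.  cap is a dict-of-dicts of residual capacities; it is
    modified in place to become the final residual graph."""
    total = 0
    while True:
        # BFS in the residual graph to find a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total        # no augmenting path left: flow is maximum
        # Bottleneck = smallest residual capacity along the path found.
        bottleneck = min(cap[parent[v]][v] for v in _path(parent, t))
        # Push flow: decrease forward residuals, increase reverse ones.
        for v in _path(parent, t):
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
        total += bottleneck

def _path(parent, t):
    """Yield each non-source node on the augmenting path, from t back."""
    v = t
    while parent[v] is not None:
        yield v
        v = parent[v]
```

On the 4-node example from the recap (s-->a cap 4, a-->t cap 2,
s-->b cap 2, b-->t cap 3, a-->b cap 3), this returns a max flow of 5,
matching the min cut {a-->t, b-->t}.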
The proof is pretty neat.

Proof: Let d(t) be the distance from s to t in the current residual
graph.  We'll prove the claim by showing that every m iterations,
d(t) has to go up by at least 1.

Lay out the graph in levels according to a BFS from s.  I.e., nodes
at level i are at distance i from s, and t is at level d(t).  Now,
notice that so long as d(t) doesn't change, the paths found WILL ONLY
USE FORWARD EDGES IN THIS LAYOUT.  Each iteration saturates (and
removes) at least one forward edge, and adds only backward edges (so
no distance ever drops).  This can happen at most m - d(t) + 1 <= m
times, since after that many times t would become disconnected from
s.  So, after m iterations, either t is disconnected (and d(t) =
infinity) or else we must have used a non-forward edge, implying that
d(t) has gone up by at least 1.  Since d(t) can increase at most n
times, there are at most nm iterations.

DINIC & MPM
===========

Can we do this a little faster?  (This part won't be on any tests,
and it's not crucial that you get the details, but the idea is pretty
nice.)

The previous algorithm: mn iterations at O(m) each, for O(m^2 n)
total.  Let's see if we can reduce this to O(mn^2) and finally to
O(n^3).

Idea: given the current BFS layout (also called the "level graph"),
we'll try in O(n^2) time, all at once, to find the max flow that
uses only forward edges.  This is sometimes called a "blocking
flow".  It guarantees that when we take the residual graph, d(t) has
gone up by at least 1.  So, there are <= n iterations.

Define the capacity of vertex v to be c(v) = min[c_in(v), c_out(v)],
where c_in(v) is the sum of capacities of v's in-edges and c_out(v)
is the sum of capacities of v's out-edges.

The Algorithm:
 - In O(m) time, create the BFS layout from s, and intersect it with
   a backwards BFS from t.  Compute the capacities of all nodes.
 - Find a vertex v of minimum capacity c.  If c is zero, that's
   great: we can remove v and its incident edges, updating the
   capacities of neighboring nodes.
 - If c is not zero, then we greedily pull c units of flow from s to
   v, and then push those c units along to t.  This seems like just
   another version of the original problem, but there are two key
   points here:

   (1) Because v had *minimum* capacity, we can do the pulling and
       pushing in any greedy way and we won't get stuck.  E.g., we
       can do BFS forward from v and backward from v, pushing the
       flow level-by-level, so we never examine any edge twice.  [do
       example]  This right away gives us an O(mn) algorithm for
       saturating the level graph (O(m) per node, n nodes), for an
       overall running time of O(mn^2).  So, we're half-way there.

   (2) To get O(n^2), the way we will do the BFS is to examine the
       in-edges (or out-edges) one at a time, fully saturating each
       edge before going on to the next one.  This means we can
       allocate our time into two parts: (a) time spent pushing
       through edges that get saturated, and (b) time spent on edges
       that we didn't quite saturate [at most one of these per node].
       We only take time O(n) on type-b operations per vertex
       processed.  Type-a operations result in deleting the edge, so
       the edge won't be used again in the current level graph; over
       *all* vertices, the *total* time for these is just O(m).  So,
       our running time per iteration is O(n^2 + m) = O(n^2)
       [O(n^2) for the type-b operations and O(m) for the type-a].

Once we've saturated the level graph, we recompute it (in O(m) time,
which can be absorbed into the O(n^2)) and re-solve.  Total time is
O(n^3).

MIN-COST MATCHINGS, MIN-COST MAX FLOWS
--------------------------------------

We talked about the problem of assigning groups to time-slots where
each group had a list of acceptable versus unacceptable slots.  A
natural generalization is to ask: what about preferences?  E.g.,
maybe group A prefers slot 1, so it costs $0 to match it there; its
second choice is slot 2, so it costs us $1 to match the group there;
and it can't make slot 3, so it costs $infinity to match it there.
And so on with the other groups.
Then we could ask for the MINIMUM-COST PERFECT MATCHING: a perfect
matching that, out of all perfect matchings, has the least total
cost.

The generalization to flows is called the MIN-COST MAX FLOW problem.
Here, each edge has a cost w(e) as well as a capacity c(e).  The cost
of a flow is the sum over all edges of the flow on that edge times
the cost of the edge.  I.e.,

    cost of flow f = SUM_e w(e)*f(e).

The goal is to find, out of all possible maximum flows, the one with
the least total cost.  Edges can have negative costs (i.e., benefits)
too.

 - These problems are more general than plain max flow, so they can
   model more things.

There are a couple of ways to solve this.  One way is to add an edge
from t to s of infinite capacity and large negative cost, and to
convert this into the "MIN-COST CIRCULATION" problem.  By doing this,
the goal becomes simply to find a circulation -- a flow that obeys
flow-in = flow-out at EVERY node (no s and t anymore) -- of smallest
cost (this will be a negative number).  This will be the min-cost
max-flow in the original graph.

Algorithm: find a negative-cost cycle and saturate it; repeat.  We
can use Bellman-Ford to find negative-cost cycles.  The running time
will be like Ford-Fulkerson's.  To speed this up, ideally we would
like the "most-negative-cost cycle", but finding that is NP-hard.
Instead, there are various tricks you can use to still do well.
E.g., in the Goldberg-Tarjan algorithm, you instead find the cycle
that minimizes (cost of cycle)/(# edges in cycle), which is something
you *can* solve.  [Add some value k to all edge costs and check
whether there's still a negative-cost cycle using BF.  Do binary
search to find the value of k such that the min-cost cycle has value
0; that cycle is the one minimizing (cost of cycle)/(# edges in
cycle) in the original graph.]  Then you prove this has a bound
roughly like Edmonds-Karp #1.
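To make the Bellman-Ford step concrete, here is a sketch of the
standard negative-cycle detector: initialize every distance to 0 (as
if from a virtual super-source), relax all edges for n rounds, and if
anything still relaxes in round n, walk predecessor pointers back to
recover a cycle.  The function name and edge-list representation are
illustrative assumptions, not from the lecture:

```python
def find_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection on nodes 0..n-1.
    edges: list of (u, v, cost).  Returns a list of nodes forming a
    negative-cost cycle, or None if no such cycle exists."""
    dist = [0] * n            # distance 0 from a virtual super-source
    pred = [None] * n
    x = None
    for _ in range(n):
        x = None
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                x = v         # remember the last node relaxed
    if x is None:
        return None           # no relaxation in round n: no negative cycle
    # x is reachable from a negative cycle; stepping back n times along
    # the predecessor pointers is guaranteed to land inside the cycle.
    for _ in range(n):
        x = pred[x]
    cycle = [x]
    v = pred[x]
    while v != x:             # collect nodes until we return to x
        cycle.append(v)
        v = pred[v]
    cycle.reverse()
    return cycle
```

In the circulation algorithm above, one would then saturate the
returned cycle in the residual graph and repeat until no negative-cost
cycle remains.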