15-451 Algorithms 10/26/04 Network Flow II - bipartite matching - Edmonds-Karp #2 - Dinic & MPM - min-cost max flow ====================================================================== NETWORK FLOW RECAP: Given a directed graph G, a source s, and a sink t. Each edge (u,v) has some capacity c(u,v). Goal is to find the maximum flow possible from s to t. Last time: looked at Ford-Fulkerson algorithm. Used to prove maxflow-mincut theorem, also integral flow theorem. Key idea: Given a flow f so far, we describe the capacities left over in a "residual graph". Specifically, the residual graph has all edges of positive "residual capacity", defined as: c_f(u,v) = c(u,v) - f(u,v) where we define f(v,u) = -f(u,v) FF alg is: find path of positive residual capacity; push flow to saturate; repeat. For example: 4 nodes {s,a,b,t}. s-->a has cap 4, a-->t has cap 2, s-->b of cap 2, b-->t of cap 3, and a-->b of cap 3. Point of this is that any legal flow in the residual graph, when added to f (edge by edge) is a legal flow in the original graph (satisfies capacity constraints; obeys flow in = flow out from each node). And, this is true vice-versa. This means that the maximum flow in the residual graph, plus f, is a maximum flow in the original graph. APPLICATION: MAXIMUM MATCHING ============================= Say we are in charge of assigning dorm rooms to incoming freshmen. Let's say freshmen are allowed to scout out rooms and list as acceptable or unacceptable. Eg., small school with 3 freshmen and 3 available rooms: student slot ------- ---- Dan A Sue B Pat C Edge if acceptable: e.g, (Dan A), (Dan B), (Sue A), (Sue C), (Pat B), (Pat C) This is a BIPARTITE GRAPH: a graph with two sides L and R, and all edges go between L and R. A MATCHING is a set of edges with no endpoints in common. What we want here is a PERFECT MATCHING - a matching that connects every point on the left hand side with some point on the right hand side. What's a perfect matching here? More generally (say there is no perfect matching) we want a MAXIMUM MATCHING - a matching with maximum possible number of edges. Algorithm to solve: ------------------ 1. Set up fake "start" node S connected to all in L. Connect all in R to a fake "end" node T. Orient all edges left-to-right and give each a capacity of 1. 2. Find a max flow from S to T using Ford-Fulkerson. 3. Output matching corresponding to flow. This finds a legal matching because edges from R to T have capacity 1, so the matching can't have two edges into same node, and similarly the edges from S to L have cap=1, so you can't have two edges leaving same node in L. It's maximum because any matching gives you a flow of the same value. (So if there was a better matching, we wouldn't be at a maximum flow). What about the number of iterations of path-finding? This is at most the #edges in the matching since each new augmenting path gives us 1 new edge. [Run alg on above example]. Notice a neat fact: say you start matching Dan->A, Pat->C. These are bad choices. But, the augmenting path automatically undoes them as it improves the flow! Matchings come up as subroutines in a lot of problems. Another example: matching up suppliers to customers. Put customers on RHS. For each supplier, put k nodes on LHS if it can supply k customers. Then link up based on which customers are near to which suppliers (some may be near to multiple suppliers, which is what makes it interesting). IMPROVING THE RUNNING TIME OF NETWORK FLOW ========================================== Assuming capacities are integers, the basic Ford-Fulkerson algorithm could make up to F iterations, where F is the value of the max flow. Each iteration takes O(m) time to find a path (e.g., DFS or BFS) and to compute the residual graph. [m = # edges]. So, total time is O(mF). This is fine if F is small, like in the matching case. Not good if capacities are in binary and F could be very large. Edmonds-Karp #1: push flow along widest path. We saw that this ensures at most O(m*log(F)) iterations. Each iteration involves running a Prim-like algorithm. Takes time O(m log n). Total time is O(m^2 * log(F) * log(n)). [Can get rid of log(n) by being clever] Can we remove dependence on F completely? EDMONDS-KARP #2 --------------- Edmonds-Karp #2: always pick the *shortest* path (rather than the max-capacity path) Claim: this makes at most mn iterations. So, run time is O(nm^2) since we use BFS in each iteration. Proof is pretty neat. Proof: Let d(t) be the distance from s to t in the current residual graph. We'll prove the theorem by showing that every m iterations, d(t) has to go up by at least 1. Let's lay out the graph in levels according to a BFS from s. I.e., nodes at level i are distance i away from s, and t is at level d(t). Now, notice that so long as d(t) doesn't change, the paths found WILL ONLY USE FORWARD EDGES IN THIS LAYOUT. Each iteration saturates (and removes) at least 1 forward edge, and adds only backward edges (so no distance ever drops). This can happen at most m - d(t) < m times since after that many time, t would become disconnected from s. So, after m iterations, either t is disconnected (and d(t) = infinity) or else we must have used a non-forward edge, implying that d(t) has gone up by 1. Since d(t) can increase at most n times, there are at most nm iterations. DINIC & MPM =========== Can we do this a little faster? (this is not in the book, and it's not so crucial you get the details, but it's pretty nice I think.) Previous alg: mn iterations at O(m) each for O(m^2 n) total. Let's see if we can reduce to O(n^3). Idea: Given the current BFS layout (also called the "level graph"), we'll try in O(n^2) time all at once to find the max flow that only uses forward edges. This is sometimes called a "blocking flow". Guarantees that when we take the residual graph, d(t) has gone up by 1. So, at most n iterations. Define c(v) = capacity of vertex v to be: min[c_in(v), c_out(v)], where c_in(v) is sum of capacities of in-edges and c_out is sum of capacities of out-edges. The Algorithm: - In O(m) time, create BFS layout from s, and intersect with backwards BFS from t. Compute capacities of all nodes. - Find vertex v of minimum capacity c. If it's zero, that's great: we can remove v and incident edges, updating capacities of neighboring nodes. - if c is not zero, then we greedily pull c units of flow from s to v, and then push it along to t. This seems like just another version of the original problem. But, there are two points here: (1) because v had *minimum* capacity, we can do the pulling and pushing in any greedy way and we won't get stuck. E.g., we can do BFS forward from v and backward from v, pushing the flow level-by-level, so we never examine any edge twice. [do example] This right away gives us an O(mn) algorithm for saturating the level graph (O(m) per node, n nodes), for an overall running time of O(mn^2). So, we're half-way there. (2) To get O(n^2), the way we will do the BFS is to examine the in-edges (or out-edges) one at a time, fully saturating the edge before going on to the next one. This means we can allocate our time into two parts: (a) time spent pushing through edges that get saturated, and (b) time spent on edges that we didn't quite saturate [at most one of these per node]. We only take time O(n) on type-b operations. Type-a operations result in deleting the edge, so the edge won't be used again in the current iteration. So over *all* vertices, the *total* time of these is just O(m). So, our running time per iteration is O(n^2 + m) = O(m). [O(n^2) for the type-b and O(m) for the type-a] Once we've saturated the level graph, we recompute it (in O(m) time, but this can be put into the O(n^2)) and re-solve. Total time is O(n^3). MIN-COST MATCHINGS, MIN-COST MAX FLOWS -------------------------------------- We talked about the problem of assigning students to dorm rooms where each student had a list of acceptable versus unacceptable rooms. A natural generalization is to ask: what about preferences? E.g, maybe Dan prefers room A so it costs $0 to match him there, his second choice is room B so it costs us $1 to match him here, and he hates room C so it costs $infinity to match him there. And, so on with Sue and Pat. Then we could ask for the MINIMUM-COST PERFECT MATCHING. This is a perfect matching that out of all perfect matchings has the least total cost. Generalization to flows is called the MIN-COST MAX FLOW problem. Here, each edge has a cost w(e) as well as a capacity c(e). Cost of a flow is the sum over all edges of the flow on that edge times the cost of the edge. I.e., cost of flow f = SUM_e w(e)*f(e). Goal is to find, out of all possible maximum flows, the one with the least total cost. Can have negative costs (or benefits) on edges too. - These are more general than plain max flow so can model more things. There are a couple ways to solve this. One way is to add an edge from t to s of infinite capacity and large negative cost, and to convert this into the "MIN-COST CIRCULATION" problem. Basically, by doing this, we now can just have the goal of finding a circulation (a flow that obeys flow-in = flow-out at EVERY node (no s and t anymore)) that has smallest cost (this will be a negative number). This will be the min-cost max-flow in the original graph. Algorithm: find a negative cost cycle and saturate it. Can use Bellman-Ford to find negative cost cycles. Running time will be like Ford-Fulkerson. To speed up, ideally would like the "most-negative-cost cycle" but that's NP-hard. Instead there are various tricks you can use to still do well. E.g., in the Goldberg-Tarjan algorithm, you instead find the cycle that minimizes (cost of cycle)/(# edges in cycle), which is something you *can* solve. [add some value k to all edges and check if there's still a negative-cost cycle using BF. Do binary search to find the value of k s.t. the min-cost cycle has value 0. That's the one that minimizes (cost of cycle)/(# edges in cycle) in the original graph.] Then you prove this has a bound roughly like Edmonds-Karp #1.