15-451 Algorithms 10/26/00 - max flow - bipartite matching ====================================================================== Today and next time we are going to talk about an important algorithmic problem: network flow and the related matching problem. These are important because they can be used to express and solve a lot of different things. In O.R. they have entire courses devoted to the topic. NETWORK FLOW ------------ - directed graph. Source node. Sink node. Each (directed) edge has a capacity. (Let's say these are integers.) Goal is to flow as much as possible from source to sink. E.g., s-->a, cap = 4. a-->c, cap=3. c-->t, cap = 2. s-->b, cap = 2. b-->d, cap=3. d-->t, cap = 4. c-->b, cap = 1. b-->c, cap=2. E.g., want to route message traffic from source to sink, and the capacities tell me how much bandwidth I'm allowed on each edge. Formally, rules are: Capacity constraint: on any edge, f(u,v) <= c(u,v) Flow conservation: for any vertex except s and t, flow in = flow out. What is the max flow here? Answer: 5. How can you see that above max flow was really maximum? Notice, this flow saturates the a-->c and s-->b edges. And, if you remove these, you disconnect s from t. In other words, the graph has a "s-t cut" of size 5. (a set of edges of total capacity 5 that if you remove disconnects the source from the sink). So, the point is that any unit of water going from s to t must take up at least 1 unit of capacity in these pipes. So, we know we're optimal. We just argued that in general, max flow <= min cut. Important definition: - an s-t cut is a set of edges whose removal disconnects s from t. Or, formally, a cut is a partition of vertex set into A and B where s is in A and t is in B. (The edges of the cut are then all edges going from A to B). - capacity of cut is the sum of capacities of edges in the cut. Or, in the formal viewpoint, it is the sum of capacities of all edges going from A to B. - Easy fact: max s-t flow <= min s-t cut. This is just because any unit of water flowing from s to t must travel along at least one of these edges. How to find a maximum flow and prove it's correct? Here's a very natural strategy: find a path from s to t and push as much flow on it as possible. Then look at the leftover graph and repeat until there is no longer any path with capacity left to push any more flow on. This is called the Ford-Fulkerson algorithm. Basic Ford-Fulkerson algorithm ------------------------------ While there exists an s-->t path P of positive residual (leftover) capacity: push maximum possible flow along P (saturating at least one edge on it) [These are called "augmenting paths"] Definition: "Residual capacity" c_f(u,v) = c(u,v) - f(u,v) where we make the convention that f(v,u) = -f(u,v). --> Example on above graph: start with path s-->b-->c-->t. Use "[capacity/flow]" notation. Note: residual capacity can be *larger* than original capacity if we had flow going in opposite direction. E.g., residual capacity on the c-->b edge is now 3. To keep track of what's going on, it helps to write the "RESIDUAL GRAPH". This graph shows the residual capacities: it has all edges of positive residual capacity, each one labeled by its residual capacity. Other good reason to write down residual graph? If we have a black box path-finding routine, this is what we want to feed into it at each iteration of the Ford-Fulkerson algorithm. --> Note: running time is not necessarily so great - it depends on the choices we make. E.g., standard bad-choice graph. In the next class we'll talk about an algorithm that fixes this problem. Algorithm by design finds a legal flow. Why is it maximum? Proof: Let's look at the final residual graph. This graph must have s and t disconnected by definition of the algorithm. Let A be component containing s and B = rest. Now, let C = capacity of the (A,B) cut in the *original* graph --- so we know we can't do better than C. The claim is that we in fact *did* find a flow of value C (which therefore implies it is maximum). Here's why: let's look at what happens to the residual capacity of the cut after each iteration of the algorithm. Say in some iteration we found a path with k units of flow. Then, even if the path zig-zagged between A and B, every time we went A-->B we added k to flow from A to B (and subtracted k from the residual capacity) and every time we went B-->A we took away k from this flow (and added k to the residual capacity). And, we went from A-->B exactly one more time than we went from B-->A. So, the residual capacity went down by exactly k. So, the drop in capacity is equal to the increase in flow. Since at the end the residual capacity is zero (remember how we defined A and B) this means the total flow is equal to C. Note, we've actually proven a nonobvious theorem here called the "MAX FLOW / MIN CUT THEOREM" In any graph, max flow from s to t = capacity of minimum (s,t)-cut. We started with saying max flow <= min cut. But now we've argued that this algorithm finds a flow of value equal to SOME cut, so it can't be less than the MINIMUM. (This is actually very similar to min/max theorem for 2-player zero sum games.) We have also proven the INTEGRAL-FLOW THEOREM: if all capacities are integers, then there is a max flow in which all flows are integers. Bipartite Maximum matching problem ---------------------------------- Say we are in charge of assigning dorm rooms to incoming freshmen. Let's say freshmen are allowed to scout out rooms and list as acceptable or unacceptable. Eg., small school with 3 freshmen and 3 available slots: student slot ------- ---- Dan A Sue B Pat C Edges: (Dan A), (Dan B), (Sue A), (Sue C), (Pat B), (Pat C) Edge if acceptable. This is a BIPARTITE GRAPH: a graph with two sides L and R, and all edges go between L and R. A MATCHING is a set of edges with no endpoints in common. What we want here is a PERFECT MATCHING - a matching that connects every point on the left hand side with some point on the right hand side. What's a perfect matching here? More generally (say there is no perfect matching) we want a MAXIMUM MATCHING - a matching with maximum possible number of edges. Algorithm to solve: ------------------ 1. Set up fake "start" node S connected to all in L. Connect all in R to a fake "end" node T. Orient all edges left-to-right and give each a capacity of 1. 2. Find a max flow from S to T using Ford-Fulkerson. 3. Output matching corresponding to flow. [Do example of above graph. Notice that each new augmenting path increases the size of the matching by 1 (even if it ends up "undoing" some of our previous choices).] Claim 1: integer flow gives a matching of same value. That's because the edges out of S and into T all have capacity 1. Claim 2: If there *exists* a matching of size k then there exists a flow of k units too. Reason: just hook up edges in matching. So, we get the maximum matching. What about the number of iterations of path-finding? This is at most the #edges in the matching since each new augmenting path gives us 1 new edge. Matchings come up as subroutines in a lot of problems. Another example: matching up factories to stores. Put stores on LHS. For each factory, put k nodes on RHS if it can service k stores. Then link up based on which stores are near to which factories (some stores may be near to multiple factories, which is what makes it interesting). Goal is to match up stores to factories so no factory goes over capacity.