- minimum spanning trees - Bellman-Ford (DP) path finding Handout: old quiz - minimum cost matchings Last time we talked about using Priority-first search to find shortest paths (often called Djikstra's algorithm). It turns out there are a lot of other problems that can be solved using this type of algorithm. Today: discuss one called the Minimum-Spanning-Tree problem Say you have a bunch of houses and want to connect water lines or ethernet to all of them. Want to create a network that connects all of them together using smallest amount of pipe/wire possible. Want: tree spanning these points with smallest total edge length. (Can anyone see why a tree?) This is called a MINIMUM SPANNING TREE. Eg: 5 3 2 A------D----E---H 3| \ 2/ | | 4\ / |4 C---B---G---F 2 4 3 Several possible trees. What's min total cost??? (19) DEFN: Spanning tree: tree that connects all the vertices. Min Spanning TREE (MST): spanning tree such that sum of lengths of edges in tree is smallest over all spanning trees. Here's an algorithm to solve this problem. Start with one node: (Say B) put in shortest edge to B. Now have tree of 2 nodes. Keep growing tree, always adding in the smallest edge out of those with exactly one endpoint in the tree. --> Called "Prim's Algorithm." Why does this work? It's a bit tricky. Another example +------+ 4 7 | TREE |---A---D | | 7 |3 /5 | |---B-' | | 9 |2 +------+---C Let's look at shortest edge between TREE region and FRINGE. Suppose there exists a MST that's consistent with all our decisions so far. Want to show still true after add in edge -> induction. Let's look at this MST. (Draw, include tree so far). If this edge is in the tree, then decision is consistent with tree which is what we wanted. Otherwise, add in this edge. That will make a cycle, a "loop". (Whenever you take a spanning tree, and add in a new edge, get a cycle.) Cycle starts in TREE region, goes along edge, has to end in done region. Look at other edge. It's at least as long. (Assumed this was shortest). So, if add in edge here, and take out edge here, tree only gets smaller or stays same (has to stay same since this was min). -> have another MST consistent with choice. ==> there's always a MST consistent with what we've done so far ==> We find the MST. ---------------------------------------------------------------------- How can we code this up? Let's first write down our shortest path code, and then we can see how to modify it to work for the MST problem For variety, let's do a version (which you might have seen in section yesterday) where we store *edges* on the priority queue. init: insert [(start, start), 0] into Pqueue While queue not empty: [(U, V), priority] = pqueue.removeMin(); // think of as edge U--V if V already visited, continue; mark V as visited. set parent[V] = U. // putting U--V into the tree for each unvisted neighbor X of V: // X is now in fringe insert [(V, X), priority + length(VX edge)] into pqueue. How to change? Change last line to: "insert[(V, X), length(VX edge)] into pqueue" -> always putting in shortest edge to fringe. Run this on the above example. ---------------------------------------------------------------------- What if we wanted to find the shortest path in a graph with negative weight edges? This only makes sense as a problem if the negative weight edges are directed (otherwise you could go back and forth racking up points forever) and more generally if there are no negative-weight cycles. Turns out, the PFS method breaks down here (see picture below). We've actually seen a method before that will work here -- dynamic programming. 30 30 -- picture -- 5-------------------3-----------------4 | ^ | 10 | -20| 15 | | ^ | | | | 2-------------------1-----------------0-- 30 50 Start Let's recap. For simplicity: just find LENGTH OF shortest path. To find actual path, could either reconstruct from lengths or keep "parent" array of where you came from. Idea. Calculate length of shortest path Start -> everywhere using i or fewer edges (or, infinity if can't get there in i edges). Use to calculate for i+1 or fewer, etc. Shortest path uses at most |V|-1 edges. (NO NEG WT CYCLES) Start with 0. later calculate for i+1 based on i. Final result: i=n-1. i=0: 0 inf inf inf inf inf i=1: i=2: General idea: start is always 0. For others, to get to v using i or fewer edges, need to get to an in-neighbor of v using i-1 or fewer edges, right? So, min dist way to get to v using i edges or less: dist(v,i) = MIN ( dist(x,i-1) + length(x->v edge) ) edges x->v i is the number of edges x is the vertex we're looking at. initialize: d[v][0] = inf for v not start, 0 for start. for i=1 to |V|-1: for each v not equal to start: d[v][i] = d[v][i-1] for each edge x->v: d[v][i] = MIN( d[v][i], d[x][i-1] + len(v,x) ) total time is O(|E|*|V|). --> a lot slower than PFS method for big graphs, but has advantage that works when edges have negative weights. --------------------------------------------------------------------------- Might ask: when would you care about neg cost edges? Here's one problem you can solve this way. When I was a freshman, at the start of school there was lottery to see who gets to live in which dorm. Got a week or so to look at dorms and figure out preferences and then would submit preference lists. Then you would get assigned. Say small school. 3 freshmen Bob, Sue, Pat. 3 dorms. A, B, C. Say each dorm has exactly 1 slot (can generalize, but just to illustrate idea) Bob: A=1, B=2. Sue: A=1, C=2., Pat: C=1, B=2. ((all edges directed)) Say we want to match people up to make them the happiest. One criterion: match EVERYBODY, minimize sum of weights in matching. Everyone gets 1st choice --> sum=3, everyone gets second --> sum=6. "Min Cost Matching Problem". Fake start, end. costs 0. Find shortest path. Delete 0-wt edges taken. Reverse and change sign of other edge(s) taken. Bob-->A. Reverse and change sign: think of this way. We've "paid" 1 to do this matching.Think of going backwards as a decision to "Undo" this matching and try something else, so get paid back your 1. Now, what's shortest path? Pat->C. Now update... Now what's shortest path? start->Sue->A->Bob->B. match is: Sue->A, Bob->B, Pat->C. These are called "augmenting paths" Why does this work?