15-451 Algorithms                                             10/10/06

Graph Algorithms II
  * Dijkstra's alg for shortest paths
  * Maximum bottleneck path
  * Minimum Spanning Trees: Prim and Kruskal

Midterm in 1 week. 1 sheet of notes, no book.
=========================================================================
Shortest paths
==============
Given a graph G, a start node S, and a destination T, we want to find the
shortest path from S to T. In the last class, we saw a couple of
dynamic-programming algorithms for shortest-path problems. In particular, the
Bellman-Ford algorithm computed the shortest path tree from a start node S in
time O(mn).

Def: the SHORTEST PATH TREE from a start node S is a tree giving the shortest
path from S to every other node in G.

Q: assuming all other nodes are reachable from S, why must such a tree exist?
That is, why must we be able to describe the shortest paths from S to every
other node using just a single tree?

A: because if the shortest path from S to u goes through v, then it uses a
shortest path from S to v. So, for each node u, we just need to indicate the
very last edge on a shortest path to u. [This is also an "optimal subproblem"
property that allows dynamic programming to work.]

The first algorithm for today (which you've seen in 15-211) is an algorithm
for building a shortest path tree that's faster than Bellman-Ford, called
Dijkstra's algorithm. However, it requires that all edge lengths be
non-negative. See if you can figure out where the proof of correctness of the
algorithm requires non-negativity.

Example for illustration:

           5     8
        A-----B-----C
        |     |     |
       1|    1|     |3        start = A
        |  2  |  4  |
        D-----E-----F

DIJKSTRA's ALG
--------------
initialize:
  - tree = {start}. No edges. Label start as having distance 0.

invariant: nodes in tree have correct distance to start.
repeat:
  - for each neighbor x of the tree, compute an (over)-estimate of its
    distance to start:

        distance(x) =      min       [ distance(v) + length(e) ]
                      edges e=(v,x)
                        v in tree

    In other words, by our invariant, this is the length of the shortest path
    to x whose only edge not in the tree is the very last edge.

  - put the x of minimum distance into the tree, connecting it via the argmin
    (the edge e used to get distance(x)).

[do above example]

Claim: even if some of the distances are wrong, the *minimum* one is correct.

Proof: say x is the neighbor of the tree with smallest distance(x). What does
the *true* shortest path to x look like? The key point is that the last edge
this path takes must come directly from the tree. Let's argue this by
contradiction. Suppose instead the first non-tree vertex on the shortest path
to x is some node y != x, and the path then continues from y to x. The length
of this path must be at least distance(y), since the portion from y onward
has non-negative length, and by our choice of x, distance(x) <= distance(y).
So the true shortest path to x has length at least distance(x), which means
distance(x) is correct. [This is where non-negativity comes in. Technically,
the wording of our proof was a little sloppy if we allow zero-length edges:
think about how you would re-word it for that case.]

Running time: if you use a heap and code this up right, you can get a running
time of O(m log n). Start with all nodes on the heap, everyone having
distance infinity except the start, which has distance 0. Repeatedly pull off
the minimum, then update its neighbors, tentatively assigning parents any
time the distance of some node is lowered. It takes linear time to
initialize, but then we do m updates, at O(log n) cost each.

If you use something called a "Fibonacci heap" (which we're not going to talk
about) you can get the time down to O(m + n log n).

Maximum-bottleneck path
=======================
Here is another problem you can solve with this type of algorithm, called the
"maximum bottleneck path" problem.
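Before modifying Dijkstra, here is a minimal Python sketch of the heap-based
version described above. The adjacency-list representation and the lazy
"stale entry" trick (re-inserting a node instead of doing a decrease-key) are
my own choices, not from the lecture:

```python
import heapq

def dijkstra(graph, start):
    """Shortest-path tree from start, for non-negative edge lengths.
    graph: dict mapping node -> list of (neighbor, length) pairs."""
    INF = float('inf')
    dist = {start: 0}
    parent = {start: None}     # last edge on a shortest path to each node
    heap = [(0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, INF):
            continue           # stale entry left over from a later update
        for x, length in graph[v]:
            nd = d + length
            if nd < dist.get(x, INF):
                dist[x] = nd   # tentatively lower x's distance...
                parent[x] = v  # ...and tentatively assign its parent
                heapq.heappush(heap, (nd, x))
    return dist, parent
```

On the example graph with start = A, this gives distance 4 to B (going
through D and E, beating the direct length-5 edge) and distance 7 to F.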
Imagine the edge weights represent capacities of the edges ("widths" rather
than "lengths") and you want the path between two nodes whose minimum width
is largest. How could you modify Dijkstra to solve this?

Def: width of a path = minimum width of the edges on the path.
     bottleneck(v) = width of the widest path to v.

Now, we'd update:

        bottleneck(x) =      max       [ min(bottleneck(v), width(e)) ]
                        edges e=(v,x)
                          v in tree

  - put the x of maximum bottleneck into the tree, connecting it via the
    argmax.

We'll actually use this later in the course...

===========================================================================
MINIMUM SPANNING TREE
---------------------
A SPANNING TREE is a tree that touches all the nodes of a graph. (So, it only
makes sense in a connected graph.) A MINIMUM SPANNING TREE is the shortest
spanning tree, where the size of a tree is the sum of its edge lengths.

For instance, you can imagine you're setting up a communication network.
For example:

           5     8
        A-----B-----C
        |     |     |
       1|    1|     |6
        |  2  |  4  |
        D-----E-----F

What is the MST here? (Note: our defn is only for undirected graphs.)

Prim's algorithm
================
Prim's algorithm is the following simple MST algorithm:
  - pick some arbitrary start node.
  - add the shortest edge with exactly one endpoint in the current tree.
  - continue until the tree spans all nodes.

--> what does this do on the above graph?

[So, it's a lot like Dijkstra, except we just put in the shortest edge rather
than adding its length to the endpoint's distance to the start node. But even
though the algorithm is simpler than Dijkstra's, the analysis is a bit
trickier...]

Now we need to prove that this works.

First, a useful fact about spanning trees: if you take any spanning tree and
add an edge to it, you get a cycle. That's because there was already one path
between the endpoints, and now there are two. If you then remove any edge on
that cycle, you get back a spanning tree.

Proof of correctness of Prim's alg: Proof by contradiction. Say the algorithm
fails.
Consider the first edge e chosen by the algorithm that is not consistent with
any MST. Let T be the tree we have built just before adding e, and let M be
an MST consistent with T (one exists, since e was the first inconsistent
choice). Now, suppose we add e to M. This creates a cycle. Since e has one
endpoint in T and one outside T, if we trace around this cycle we must
eventually come to an edge e' that goes back into T. We know
length(e') >= length(e) by definition of the algorithm: both e and e' were
available when e was chosen. So, if we add e to M and remove e', we get a new
spanning tree that is no larger than M was, contradicting our definition
of e.

Running time: to make this efficient, we need a good way of storing the edges
into our current tree so we can quickly pull off the shortest one. If we use
a heap, each operation takes O(log n) time. So, the total time is
O(m log n), where m is the number of edges and n is the number of vertices.

Kruskal's algorithm
===================
Here's another nice algorithm, called Kruskal's algorithm.

Algorithm: sort the edges by length and examine them from shortest to
longest. Put an edge into the current forest (a forest is a set of trees) if
it doesn't form a cycle. E.g., look at this graph:

           5     8
        A-----B-----C
        |     |     |
       1|    1|     |6
        |  2  |  4  |
        D-----E-----F

Proof of correctness: we can basically use the same argument as before. Let e
be the first edge we add that is inconsistent with any MST, and let F be the
forest just before adding e. Let M be an MST consistent with F. Put e into M.
This makes a cycle. Go around the cycle until you reach some edge e' jumping
between trees of F. By definition of our algorithm, len(e) <= len(e'). So,
putting in e and removing e' can only make M better, which contradicts our
definition of e.

What about the running time? It takes O(m log m) to sort. Then, for each
edge, we need to test whether it connects two different components. This
seems like it should be a real pain: how can we tell if an edge has both
endpoints in the same component? It turns out there's a nice data structure,
called the "union-find" data structure, for doing this.
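Here is a compact Python sketch of Kruskal's algorithm using a bare-bones
union-find (path compression only, no union by rank; the edge-list
representation and all names are my own choices):

```python
def kruskal(nodes, edges):
    """Minimum spanning tree via Kruskal's algorithm.
    edges: list of (length, u, v) tuples.
    Returns (total length, list of tree edges)."""
    parent = {v: v for v in nodes}         # each node starts as its own tree

    def find(v):
        """Return the representative of v's component."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving: shortcut as we go
            v = parent[v]
        return v

    total, tree = 0, []
    for length, u, v in sorted(edges):     # examine shortest to longest
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components: no cycle
            parent[ru] = rv                # union the two trees
            tree.append((u, v))
            total += length
    return total, tree
```

On the MST example above, this picks the edges of lengths 1, 1, 2, 4, and 6,
for a total of 14. The find and union steps here are the operations the
union-find data structure supports.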
It is so fast that it will actually be a low-order cost compared to the
sorting.

Simple version of union-find: total time O(m + n log n) for our series of
operations.

More sophisticated versions of union-find: total time O(m lg*(n)) -- or even
O(m alpha(n)), where alpha(n) is the inverse-Ackermann function, which grows
even more slowly than lg*.

  -> lg*(n) is the number of times you need to take the lg until you get down
     to 1. So,
        lg*(2) = 1
        lg*(2^2 = 4) = 2
        lg*(2^2^2 = 16) = 3
        lg*(2^2^2^2 = 65536) = 4
        lg*(2^65536) = 5

[I won't define the inverse-Ackermann function, but to get alpha(n) up to 5,
you need n to be at least a stack of 256 2's.]
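To make the tower examples concrete, here is a tiny sketch of lg* using only
integer arithmetic: n.bit_length() - 1 is floor(lg n), which is exact on the
powers of two above and avoids floating-point overflow for huge n.

```python
def lg_star(n):
    """Iterated logarithm: how many times lg must be applied to n
    before the result drops to 1."""
    count = 0
    while n > 1:
        n = n.bit_length() - 1  # floor(lg n); exact when n is a power of 2
        count += 1
    return count
```

So lg_star(65536) is 4, and even lg_star(2**65536), applied to a number with
65537 bits, is only 5.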