15-451 Algorithms                                              10/19/00

* Graph Algorithms II
   - Dijkstra's alg for shortest paths
   - Prim's alg for MST
   - Kruskal's alg for MST
=======================================================================
Dynamic Programming is a good approach to try when you're trying to get
an exponential-time algorithm down to polynomial time. (Other good
techniques are network flow and linear programming, which we'll talk
about later.) But DP will usually get you just down to O(n^2) or O(n^3)
time. If you need something faster, that's when you start thinking
about data structures and algorithms that can take advantage of them.
In recitation, we saw this with the longest-increasing-subsequence
problem.

Today - start by talking again about the shortest path problem. We have
a "goal" node and want to find the shortest path from everywhere to the
goal. We saw that Bellman-Ford takes time O(E*V). Today we'll look at a
faster alg (Dijkstra's alg), which requires that there are no
negative-weight edges. Try to see if you can figure out where the
argument breaks down if negative-weight edges are present.
===============================================================================
Dijkstra's algorithm.

Idea is to build up the shortest-path tree in a greedy way. E.g.,

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6        goal = A.
      |  2  |  4  |
      D-----E-----F

DIJKSTRA's ALG
--------------
initialize:
 - tree = {goal}. No edges. goal.dist = 0.

invariant: nodes in tree have correct distance to goal.

repeat:
 - for each nbr x of the tree, compute an (over)-estimate of its
   distance to the goal:

      x.dist =      min      [ v.dist + length(e) ]
               edges e=(x,v),
                 v in tree

 - put the x of minimum x.dist into the tree, connecting it via the
   argmin (the edge that caused x.dist to be what it is)

[do above example]

Claim: even if some of the dists are wrong, the *minimum* one is
correct.

Proof: say x is the node of smallest x.dist. What does the *true*
shortest path from x look like?
The point is that the first edge this path takes must go directly into
the tree: the path has to enter the tree sometime, and if it instead
went first through some other non-tree node y != x, it would have
length at least y.dist, and by definition of x, x.dist is no larger.
Notice that this argument would break down if there were a
negative-cost way of getting from x to y. That's why Dijkstra's
algorithm doesn't allow negative-cost edges.

RUNNING TIME & IMPLEMENTATION:

First of all, we should only recompute x.dist when it could possibly
have changed. So, a better implementation would look more like this:

Init:  goal.dist = 0, v.dist = infinity for v != goal.
       Put all nodes into a data structure H.
Repeat:
  - pull out the v of minimum v.dist from H. Mark v as being in the
    tree.
  - for each unmarked neighbor x of v:
      set x.dist = min(x.dist, v.dist + length(xv edge))
      if the new value is smaller, set x.parent = v.

Operations done on H:
  insert:       |V| times
  remove-min:   |V| times
  decrease-key: |E| times

Linked list: insert is O(1), decrease-key is O(1), remove-min is O(V).
Regular heap: all operations are O(log V), for a total of O(E log V).
Fibonacci heap: insert and decrease-key are O(1), remove-min is
O(log V) (amortized, but that's fine for us).

So, the total time using a Fibonacci heap is O(V log V + E). This is
linear time for dense graphs (average degree >= log V).
============================================================================
Now we turn to the related problem of the MINIMUM SPANNING TREE.

A SPANNING TREE is a tree that touches all the nodes of a graph. (So,
it only makes sense in a connected graph.) An MST is the shortest
spanning tree (the size of a tree is the sum of its edge lengths). For
instance, you can imagine you're setting up a communication network
between these nodes or something like that.

Let's go back to the above example graph:

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6
      |  2  |  4  |
      D-----E-----F

What is the MST here?  (total length = 14)

There's a nice algorithm that's a lot like Dijkstra's shortest path
alg, called Prim's algorithm.
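Before moving on to Prim's algorithm, here is a minimal Python sketch
of the heap-based Dijkstra implementation above. The graph
representation (a dict mapping each node to a list of (neighbor,
length) pairs) is an assumption made for the sketch, and since
Python's heapq has no decrease-key operation, this version pushes a
duplicate entry instead and skips stale entries when they come off the
heap:

```python
import heapq

def dijkstra(graph, goal):
    """Shortest distances from every node to goal.
    graph: dict mapping node -> [(neighbor, length), ...],
    with nonnegative lengths (assumed representation)."""
    dist = {v: float('inf') for v in graph}
    parent = {goal: None}
    dist[goal] = 0
    heap = [(0, goal)]            # (v.dist, v); may contain stale entries
    in_tree = set()               # nodes whose distance is final
    while heap:
        d, v = heapq.heappop(heap)        # remove-min
        if v in in_tree:
            continue                      # stale entry: skip it
        in_tree.add(v)                    # v's distance is now correct
        for x, length in graph[v]:
            if x not in in_tree and d + length < dist[x]:
                dist[x] = d + length      # "decrease-key" via re-push
                parent[x] = v
                heapq.heappush(heap, (dist[x], x))
    return dist, parent

# The lecture's example graph, goal = A:
G = {'A': [('B', 5), ('D', 1)],
     'B': [('A', 5), ('C', 8), ('E', 1)],
     'C': [('B', 8), ('F', 6)],
     'D': [('A', 1), ('E', 2)],
     'E': [('B', 1), ('D', 2), ('F', 4)],
     'F': [('C', 6), ('E', 4)]}
dist, parent = dijkstra(G, 'A')
# dist == {'A': 0, 'B': 4, 'C': 12, 'D': 1, 'E': 3, 'F': 7}
```

The duplicate-push trick does O(E) inserts instead of O(E)
decrease-keys, giving O(E log E) = O(E log V) total, matching the
regular-heap bound above.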
Prim's algorithm also grows a tree one edge at a time, beginning at
some arbitrary starting node. It's simpler than Dijkstra's, though,
because the algorithm is simply: at each step, add on the shortest
edge into the tree that doesn't create a cycle. I.e., to grow the
tree, put in the node of smallest distance to the *tree* (not to the
*root*).

Prim:
  initialize: tree = {root}. No edges.
  repeat: add in the shortest edge between a tree node and a non-tree
          node.

--> run Prim on above graph

The proof of correctness is less obvious, though. We'll prove it by
induction.

First, a neat fact about spanning trees: if you take a spanning tree
and add an edge to it, you get a cycle. If you then remove any edge
in that cycle, you get back a spanning tree.

Proof of correctness of Prim's alg:

* Inductively, assume the tree built so far is consistent with an MST
  (there might be more than one). This is certainly true at the start.

* Need to show this is maintained each time we add an edge. Say T is
  the tree so far, and M is an MST consistent with it. If the new edge
  e is in M, then fine. If e is not in M, then e forms a cycle with M.
  The edge e goes from T to outside T, so if we follow the cycle we
  must eventually get to an edge e' that goes back into T. We know
  length(e') >= length(e) by definition of the algorithm (e' was also
  a candidate edge when the algorithm chose e). So, if we add e to M
  and remove e', we get a new spanning tree that is no longer than M
  was. This maintains the inductive hypothesis.

Running time is like Dijkstra's: O(E log V) if we use a regular heap,
or O(E + V log V) if we use a Fibonacci heap.

----------------------
Here's another nice algorithm, called Kruskal's algorithm.

Algorithm: Sort the edges by length and examine them from shortest to
longest. Put an edge into the current forest (a forest is a set of
trees) if it doesn't form a cycle.

E.g., look at it in this graph:

         5     8
      A-----B-----C
      |     |     |
     1|    1|     |6
      |  2  |  4  |
      D-----E-----F

Proof of correctness: Can basically use the same argument as before:
assume inductively that there is some MST M consistent with the forest
F found so far.
Now we add in a new edge e between components C and C' of our forest.
We know that M connects C and C' somehow. In particular, either M has
e in it, or else e forms a cycle with M, and this cycle has some other
edge e' in it that connects C to some other component (maybe C', or
maybe an intermediate one) of our forest. By definition of our
algorithm, len(e) <= len(e'): e' connects two different components of
F, so it couldn't have been rejected earlier and must come no earlier
than e in sorted order. So either M has e already, or substituting e
for e' produces a spanning tree M' that is just as good, which
maintains our inductive hypothesis.

What about running time? It takes O(E log E) to sort. Then, for each
edge, we need to test whether it connects two different components.
It turns out there's a nice data structure for doing this, called the
"union-find" data structure. The general picture of this ADT is that
we maintain a collection {S_1, S_2, ..., S_k} of disjoint sets over
some universe. The operations are:

  MakeSet(x) - creates a set containing just x.
  Union(x,y) - replaces the set S_x containing x and the set S_y
               containing y with the single set S_x U S_y.
  Find(x)    - returns the unique ID of the set containing x (this is
               just some representative element of that set).

Given this, we begin with MakeSet(v) for all vertices v. When we
consider some edge (v,w), we just check whether Find(v) == Find(w).
To put in the edge (v,w), we do a Union.

It turns out there's a nice implementation of this data structure with
"almost constant" amortized time per operation (we'll get into the
specifics next time). So, the total time is
O(E log E + E * alpha(E,V)), where alpha(E,V) is "almost constant".
This is dominated by the sorting time and is O(E log E). But, if for
some reason we have the edges already sorted by length, then we get a
running time of O(E * alpha(E,V)), where basically alpha(E,V) <= 4.
(To make alpha > 4, you would need E > 2^2^2^65536 (actually, a couple
more 2's in the tower).)
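Here is a sketch of how Kruskal's algorithm fits together with
union-find, in Python. The union-find implementation below uses the
standard union-by-rank and path-compression tricks; those are the
details deferred to next lecture, so treat this as a preview rather
than the official version, and the edge representation (a list of
(length, v, w) triples) is an assumption made for the sketch:

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression."""
    def __init__(self, universe):
        self.parent = {x: x for x in universe}   # MakeSet(x) for all x
        self.rank = {x: 0 for x in universe}

    def find(self, x):
        # Walk up to the root, halving the path as we go (compression).
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx        # attach smaller-rank root under larger
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

def kruskal(vertices, edges):
    """edges: list of (length, v, w). Returns (total_length, mst_edges)."""
    uf = UnionFind(vertices)
    mst, total = [], 0
    for length, v, w in sorted(edges):     # O(E log E) sort
        if uf.find(v) != uf.find(w):       # different components?
            uf.union(v, w)                 # merge them; no cycle formed
            mst.append((v, w))
            total += length
    return total, mst

# The lecture's example graph:
V = 'ABCDEF'
E = [(5, 'A', 'B'), (8, 'B', 'C'), (1, 'A', 'D'), (1, 'B', 'E'),
     (6, 'C', 'F'), (2, 'D', 'E'), (4, 'E', 'F')]
total, mst = kruskal(V, E)   # total == 14, matching the MST above
```

Note that the sketch rejects AB (it would close a cycle with AD, DE,
BE) and BC (a cycle with CF, EF, ...), which is exactly the "doesn't
form a cycle" test stated in the algorithm.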