15-451 Algorithms                                              10/12/06

* Maximum bottleneck path
* Minimum Spanning Tree recap: Prim and Kruskal
* Midterm review

[Midterm on Tues.  1 sheet of notes, no book.]
=========================================================================

One nice thing about Dijkstra's algorithm is that you can modify it to
solve other related problems.  (This kind of thing often makes a good
test question.)

DIJKSTRA'S ALG recap (finds shortest path tree)
-----------------------------------------------
E.g.,
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |3         start = A.
         |  2  |  4  |
         D-----E-----F

Initialize:
 - tree = {start}.  No edges.  Label start as having distance 0.

Invariant: nodes in tree have correct distance to start.

Repeat:
 - for each nbr x of tree, compute an (over)-estimate:

       dist(x)  =     min      [ dist(v) + length(e) ]
                  edges e=(v,x)
                    v in tree

   In other words, by our invariant, this is the length of the
   shortest path to x whose only edge not in the tree is the very
   last edge.

 - put the x of minimum distance into the tree, connecting it via the
   argmin (the edge e used to get dist(x)).

Actual implementation: just update the dists of the neighbors of the
node put into the tree, since they are the only ones that can change.

Maximum-bottleneck path
=======================
Now consider the following problem, called the "maximum bottleneck
path" problem.  Imagine the edge weights represent capacities of the
edges (like "widths" rather than "lengths") and you want the path
between two nodes whose minimum capacity is largest.  How can we
solve this?

Def: capacity of a path = minimum capacity of the edges on the path.
     cap(v) = capacity of the maximum-capacity path to v.

Initialize cap(start) = infinity.  Now, instead, we'd update:

       cap(x)  =     max      [ min( cap(v), w(e) ) ]
                  edges e=(v,x)
                    v in tree

 - put the x of maximum cap(x) into the tree, connecting it via the
   argmax.

We'll actually use this later in the course...

===========================================================================
MINIMUM SPANNING TREE
---------------------

A SPANNING TREE is a tree that touches all the nodes of a graph.
(So, it only makes sense in a connected graph.)  A Minimum Spanning
Tree is the shortest spanning tree (the size of a tree is the sum of
its edge lengths).

Prim's algorithm
================
Prim's algorithm is the following simple MST algorithm:
 - pick some arbitrary start node.
 - add the shortest edge to the current tree.
 - continue until the tree spans all nodes.

For example:
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

[So, a lot like Dijkstra, except we just put in the shortest edge
rather than adding its length to the distance of its endpoint to the
start node.  But even though the algorithm is simpler than Dijkstra,
the analysis is a bit trickier...]

Now we need to prove that this works.  We will prove it by induction.
Assume inductively that there exists an MST M that contains our tree
T built so far.  We want to show there exists an MST that contains T
*and* the new edge e we are about to add.  If M contains e then we're
done.  If not, then add e to M.  This creates a cycle.  Since e has
one endpoint in T and one outside T, if we trace around this cycle we
must eventually get to an edge e' that goes back into T.  We know
length(e') >= length(e) by definition of the algorithm.  So, if we
add e to M and remove e', we get a new tree M' that is no larger than
M was, and contains both T and e.

Running time: To make this efficient, we need a good way of storing
the edges into our current tree so we can quickly pull off the
shortest one.  If we use a heap, we can do this in O(log n) time per
operation.  So, the total time is O(m log n), where m is the number
of edges and n is the number of vertices.

Kruskal's algorithm
===================
Here's another nice algorithm, called Kruskal's algorithm.

Algorithm: Sort the edges by length and examine them from shortest to
longest.  Put an edge into the current forest (a forest is a set of
trees) if it doesn't form a cycle.

E.g., look at this graph:
            5     8
         A-----B-----C
         |     |     |
        1|    1|     |6
         |  2  |  4  |
         D-----E-----F

Proof of correctness: We can basically use the same argument as
before.
Assume by induction that there exists an MST M containing all the
edges in our forest F built so far.  We want to show there exists an
MST M' containing F and the next edge e we are about to include.  If
M contains e then we are done.  Else, put e into M.  This makes a
cycle.  Go around it until you take some edge e' jumping between
trees of F.  By definition of our algorithm, len(e) <= len(e').  So,
putting in e and removing e' to get M' can only make M better.

What about running time?  It takes O(m log m) to sort.  Then, for
each edge, we need to test if it connects two different components.
This seems like it should be a real pain: how can we tell if an edge
has both endpoints in the same component?  It turns out there's a
nice data structure called the "union-find" data structure for doing
this, so fast that it will actually be low-order compared to the
sorting.

Simple version of Union-Find: total time O(m + n log n) for our
series of operations.

More sophisticated version of Union-Find: total time O(m lg^*(n)) --
or even O(m alpha(n)), where alpha(n) is the inverse-Ackermann
function, which grows even more slowly than lg^*.

 -> lg^*(n) is the number of times you need to take the lg until you
    get down to 1.  So,
        lg^*(2) = 1
        lg^*(2^2 = 4) = 2
        lg^*(2^2^2 = 16) = 3
        lg^*(2^2^2^2 = 65536) = 4
        lg^*(2^65536) = 5

    [I won't define inverse-Ackermann here, but to get alpha(n) up to
    5, you need n to be at least a stack of 256 2's.]

Simple UNION-FIND, O(m + n log n) time
======================================
The general picture is that we maintain a collection
{S_1, S_2, ..., S_k} of components, initially each with one element.
Operations:

  Union(x,y) - replace the component S_x containing x and the
               component S_y containing y with the single component
               S_x U S_y.

  Find(x)    - return the unique ID of the component containing x
               (this is just some representative element in it).

Given this, when we consider some edge (v,w) we just check whether
Find(v) == Find(w).  To put in the edge (v,w), we do a Union.
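Given these two operations, the Kruskal loop can be sketched in
Python.  This is a minimal illustrative sketch, not code from the
lecture: the edge format (length, v, w) and the dictionary-based
bookkeeping are my own choices, and the Union step already uses the
"append the shorter list onto the longer one" idea from the next
section.

```python
def kruskal(vertices, edges):
    """edges: list of (length, v, w) tuples.  Returns a list of MST edges."""
    head = {v: v for v in vertices}       # representative of each vertex
    members = {v: [v] for v in vertices}  # elements of each component

    def find(x):                          # constant time: follow head pointer
        return head[x]

    def union(x, y):
        a, b = find(x), find(y)
        if len(members[a]) < len(members[b]):
            a, b = b, a                   # append shorter list onto longer
        for z in members[b]:
            head[z] = a                   # update head pointers of B's elements
        members[a] += members.pop(b)

    mst = []
    for length, v, w in sorted(edges):    # shortest to longest
        if find(v) != find(w):            # edge doesn't form a cycle
            mst.append((v, w))
            union(v, w)
    return mst

# Example: the six-node graph from the Kruskal section above.
edges = [(5, 'A', 'B'), (8, 'B', 'C'), (1, 'A', 'D'), (1, 'B', 'E'),
         (6, 'C', 'F'), (2, 'D', 'E'), (4, 'E', 'F')]
mst = kruskal(list("ABCDEF"), edges)
```

On this graph the loop accepts the edges of lengths 1, 1, 2, 4, and 6,
and rejects the length-5 and length-8 edges since each would close a
cycle.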
Algorithm
---------
Each set will be just a linked list: each element has a pointer to
the next element in the list.  But we augment the list so that each
element also has a pointer directly to the head of the list.  The
head of the list is the representative element.

Initialization: for each x, let x->head = x.  Total time O(n).

Find(x) - constant time: return(x->head).  We do this O(m) times, so
the total time is O(m).

Union(x,y) - to union the two lists (let's call them A and B, with
lengths L_A and L_B), we append B onto the end of A.  What is the
time needed?  Answer: we first have to walk down A to get to its
bottom element.  Then we hook the head of B up to this.  Then we walk
down B, updating the pointers to the head.  So the total time is
O(L_A + L_B).

Q: Can we get just O(L_B)?
   Method 1: splice B into the middle of A, at x.
   Method 2: if we store a pointer to the tail in the head, this also
             avoids walking down A.

Q: Can we reduce this to O(min(L_A, L_B))?
A: Yes.  Just store the length of the list in the head.  Then
   compare, and append the shorter one onto the end of the longer
   one.  Then update the length count.

Claim: this gives us the O(m + n log n) total running time.

PROOF: We already analyzed the initialization and find operations.
Each union costs O(length of the list we walk down).  Key idea: let's
pay for this by charging O(1) to each element that we walked over.
Think of it as a charge for updating that element's head pointer.
Now, let's look at this from the point of view of some lowly element
x sitting in a list somewhere.  Over time, how many times does x get
stepped on and have its head pointer updated?  Answer: at most
O(log n) times.  Reason: every time x gets stepped on, the size of
the list it is in at least doubles.  So, we were able to pay for
unions by charging the elements stepped on, and no element gets
charged more than O(log n), so the total cost is O(n log n) for
unions, or O(m + n log n) overall.

======================================================================
MIDTERM REVIEW
==============
- Last year's midterm now up on webpage.
- Main topics:
    - recurrences
    - probabilistic reasoning
    - reasoning about upper and lower bounds
    - binary search trees
    - hashing
    - dynamic programming
    - MST and shortest paths