15-750 Grad Algorithms 02/01/01 * Union-find - list-based method - forest-based method ============================================================================= Today's topic is a data structure for the union-find problem. Here's the problem: We want to maintain a collection of disjoint sets with operations: MakeSet(x) - creates set containing just x. Returns ptr p_x Union(p_x,p_y) - union together the set x is in with the set y is in (replacing those two sets with their union). Find(p_x) - return the unique ID for the set containing x (this is just some representative element of this set). [From now on, will just say "x" instead of "p_x", etc] E.g., Kruskal's algorithm: - Do MakeSet(v) for each node v. - for each edge (x,y) in order from shortest to longest: if find(x) != find(y), put (x,y) into tree and do Union(x,y). Notation: n = # of make-set operations, (so at most n-1 unions) m = # of total operations. This matches n= # vertices, m = # of edges pretty well, since we do 2 finds for each edge. Algorithm 1 (list-based): total cost will be O(m + n log n) ----------------------------------------------------------- Each set will be just a linked list: each element has pointer to next element in list. But, augment so that each element also has a pointer directly to head of list. Head of list is the representative element. Makeset(x) - constant time: just create list of 1 element and return ptr. Find(x) - also constant time: just follow pointer to head of list. Union(x,y) - to union the two lists (let's call them A and B with lengths L_A and L_B). We append B onto end of A. What is is the time needed? Answer: we first have to walk down A to get to bottom element. Then hook up head of B to this. Then walk down B updating the pointers to the head. So total time is O(L_A + L_B) Q: Can we get just O(L_B)? Method 1: splice B into the middle of A, at x. Method 2: if store pointer to tail in the head, this also avoids walking down A. Q: Can we reduce this to O(min(L_A,L_B))? A: Yes. Just store length of list in the head. Then compare and append shorter one onto end of longer one. Then update the length count. Claim: this gives us the O(m + n log n) total running time. Proof: The only part we need to worry about is the "walking down the shorter list" part of the union operation. All the rest is constant time per op, so goes into the O(m). So, let's look at this from the point of view of some lowly element x sitting in a list somewhere. Over time, how many times does x get stepped on as we walk down the list and update pointers? Answer: at most O(log n) times. Reason: because every time x gets stepped on, the size of the list he is in doubles (at least). So, if we look at the total time for m operations, we spend O(m) in the constant-time portion, and then we can charge off the walking-down portion to the elements walked over. There are n elements, and for each one the total cost is at most log(n). So, overall, the total is O(m + n log n) ...This is a neat proof I think. A good time for questions... Notice that in Kruskal's algorithm, it's the sorting time that dominates since that was O(m log m). Algorithm 2: ------------ Say we did the above in a lazy way. Just point head of B to head of A. Now, a find might become more expensive because we have to take multiple hops. So, as we do the find, let's point every element we traverse directly to the root (basically doing Algorithm 1 lazily). We'll prove that this has a very nice amortized performance. Actually, we'll need to make one technical change: instead of deciding which of two roots should be the new one based on the SIZE of the set, we'll do it based on which has larger RANK, which we'll define in a minute. Also, notice that we no longer need the downward pointers: what we have in general is a collection of trees, with all links pointing UP. Here's the algorithm more formally: - MakeSet(x) takes constant time: just set x->parent = x. - To do a find(x), we walk up tree to the root, updating x and all the nodes we pass over to point to the root. This is called PATH COMPRESSION. E.g., can imagine implementing like this: If x is root, return x. Else, x->parent = find(x->parent). Return x->parent. Time is proportional to (original) distance of x to root. - union(x,y) = link(find(x), find(y)). Link(root1,root2) makes the root of smaller RANK point to the other (this is called UNION-BY-RANK). RANK defined by: - When you do a makeset(x), x gets rank = 0. - When you link a root of rank r1 to a root of rank r2, then if r1 == r2, you add 1 to the rank of the new root. Otherwise you don't. [E.g., rank = height if we never did path compression] Properties of ranks: 1. Rank(x) can only change if x is a root. Once x becomes a non-root, then its rank never changes again. 2. If x isn't root, then rank(x) is strictly less than rank(x->parent). Over time, rank(x->parent) will never go down (but can increase either because parent changes with path compression or rank changes with link). 3. rank(x) <= n-1, and in fact rank(x) <= log(n) [prove by induction] though we won't need that. We will prove a running time of O(m alpha(n)), where alpha(n) is the inverse-Ackermann function. The proof below follows some notes of Tarjan. It is also similar to what is in the book, so you can look at that too. Note: some of the definitions are slightly different... Ackermann's function is defined like this: A_0(x) = x+1 A_i(x) = A_{i-1}(A_{i-1}(A_{i-1}...(x))), where there are x+1 applications of A_{i-1}. So, for instance, A_1(x) = 2x+1. A_2(x) = A_1(A_1(...(x))) > 2^{x+1}. A_3(x) = A_2(A_2(...(x))) > "a stack of x+1 2's, with x at the top" alpha(n) = smallest value of k such that A_k(1) >= n. E.g., if alpha(n)=5, that means that n > A_4(1) = A_3(A_3(1)) = A_3(A_2(A_1(A_0(A_0(1))))) = A_3(A_2(7)) > A_3(256) > "stack of 257 2's with 256 at the top". [So, basically alpha(n) is never bigger than 4] ANALYSIS: * If x is not a root AND its rank is > 0, then define l(x) = level(x) = largest L such that rank(parent(x)) >= A_L(rank(x)) i.e., if you are in a high level, then your parent has a *much* higher rank than you. Can now verify that 0 <= level(x) < alpha(n). * If x is not a root AND its rank is > 0, then define i(x) = index(x) = largest i s.t. rank(parent(x)) >= A_{l(x)}(A_{l(x)}(...(rank(x)))), (where we apply A_{l(x)} i times). in other words, the index tells you at a slightly finer grain where you live in your level. Can verify that 1 <= i(x) <= rank(x). Now we define a potential function on nodes like this: phi(x) = 0 if rank(x) = 0. phi(x) = alpha(n)*rank(x) if x is a root. phi(x) = (alpha(n) - level(x))*rank(x) - index(x) otherwise. Easy to check that phi(x) >= 0. Total potential is sum of node potentials. Amortized cost = actual cost + change in potential. How to think about potentials: Initially, a node x will be a root with potential 0. Its potential will then go up as its rank increases. But, once x becomes a non-root, then its rank is fixed and its potential will then start going down. In particular, notice that because index is between 1 and rank(x), any time x's level or index changes, its potential will drop by at least 1. Claim 1: amortized cost of link(r1,r2) is <= alpha(n) + 1. Claim 2: amortized cost of find(z) is <= alpha(n) + 2. Proof of claim 1: This is the easier of the two. The actual cost of a link is 1. But we need to look at the change in potential. Luckily, the only node whose potential can possibly go up is the new root, and its potential goes up by at most alpha(n) (since its rank goes up by at most 1). So, the amortized cost of the link is at most alpha(n)+1. Proof of claim 2: Say there are t nodes on the path from z to the root. So, the actual cost is t. We will show that at least t-alpha(n)-2 of these nodes go down in potential. This will imply that the amortized cost is at most alpha(n)+2. First of all, we won't worry about z or the root (that's the -2). Second, we won't worry about nodes x such that all of x's ancestors have a lower level than x does. There are at most alpha(n) of these since there are at most alpha(n) different levels. So, let x be one of the remaining nodes on the path. We know rank(x)>=1 and we know that some ancestor y of x has level(y) >= level(x). This means that before the path compression, we had: rank(parent(y)) >= A_{l(y)}(rank(y)) >= A_{l(x)}(rank(y)). and rank(parent(x)) >= A_{l(x)}(A_{l(x)}(...(rank(x)))), where there are i(x) applications of A_{l(x)}. But, rank(y) >= rank(parent(x)), and after path compression, x's parent has rank at least that of y's parent's rank *before* path compression. Putting this all together, this means that x's index has gone up, or else x's level has gone up, and in any case x's potential has gone down. This proves the claim. That's it. The book's proof is slightly different. You are encouraged to take a look.