15-750 Grad Algorithms			02/01/01
* Union-find
- list-based method
- forest-based method
=============================================================================
Today's topic is a data structure for the union-find problem.  Here's
the problem:

We want to maintain a collection of disjoint sets with operations:
	MakeSet(x) - creates set containing just x.  Returns ptr p_x
	Union(p_x,p_y) - union together the set x is in with the set
			y is in (replacing those two sets with their union).
	Find(p_x) - return the unique ID for the set containing x (this
			is just some representative element of this set).

[From now on, will just say "x" instead of "p_x", etc]

E.g., Kruskal's algorithm: 
  - Do MakeSet(v) for each node v.
  - for each edge (x,y) in order from shortest to longest:
     if find(x) != find(y), put (x,y) into tree and do Union(x,y).

Notation: n = # of make-set operations, (so at most n-1 unions)
	  m = # of total operations.

	This matches  n= # vertices, m = # of edges pretty well, since we do
	2 finds for each edge. 

Algorithm 1 (list-based): total cost will be O(m + n log n)
-----------------------------------------------------------
Each set will be just a linked list: each element has pointer to
next element in list.  But, augment so that each element also has
a pointer directly to head of list.   Head of list is the
representative element. 

Makeset(x) - constant time: just create list of 1 element and return ptr.
Find(x) - also constant time: just follow pointer to head of list.

Union(x,y) - to union the two lists (let's call them A and B with
	lengths L_A and L_B).  We append B onto end of A.  What is
	is the time needed?  Answer: we first have to walk down A to get to
	bottom element.  Then hook up head of B to this.  Then walk down B
	updating the pointers to the head.  So total time is O(L_A + L_B)

Q: Can we get just O(L_B)?
	
	Method 1: splice B into the middle of A, at x.
	Method 2: if store pointer to tail in the head, this also avoids
	walking down A.

Q: Can we reduce this to O(min(L_A,L_B))?

A: Yes.  Just store length of list in the head. Then compare and
	append shorter one onto end of longer one.  Then update the
	length count. 

Claim: this gives us the O(m + n log n) total running time.

Proof: The only part we need to worry about is the "walking down the
shorter list" part of the union operation.  All the rest is constant
time per op, so goes into the O(m).

So, let's look at this from the point of view of some lowly element x
sitting in a list somewhere.   Over time, how many times does x get stepped
on as we walk down the list and update pointers?  Answer: at most
O(log n) times.  Reason: because every time x gets stepped on, the
size of the list he is in doubles (at least).

So, if we look at the total time for m operations, we spend O(m) in
the constant-time portion, and then we can charge off the walking-down
portion to the elements walked over.  There are n elements, and for
each one the total cost is at most log(n).  So, overall, the total is 
	O(m + n log n)

...This is a neat proof I think.  A good time for questions...

Notice that in Kruskal's algorithm, it's the sorting time that
dominates since that was O(m log m).

Algorithm 2: 
------------
 Say we did the above in a lazy way.  Just point head of B to head of
A.  Now, a find might become more expensive because we have to take
multiple hops.  So, as we do the find, let's point every element we
traverse directly to the root (basically doing Algorithm 1 lazily).
We'll prove that this has a very nice amortized performance.
Actually, we'll need to make one technical change: instead of deciding
which of two roots should be the new one based on the SIZE of the set,
we'll do it based on which has larger RANK, which we'll define in a
minute.  Also, notice that we no longer need the downward pointers:
what we have in general is a collection of trees, with all links
pointing UP.

Here's the algorithm more formally:

 - MakeSet(x) takes constant time: just set x->parent = x.

 - To do a find(x), we walk up tree to the root, updating x and all
	the nodes we pass over to point to the root.  This is called PATH
	COMPRESSION.  E.g., can imagine implementing like this:

		If x is root, return x.
		Else, x->parent = find(x->parent).  Return x->parent.

	Time is proportional to (original) distance of x to root.

 - union(x,y) = link(find(x), find(y)).  Link(root1,root2) makes the root
	of smaller RANK point to the other (this is called UNION-BY-RANK).

   RANK defined by:
	- When you do a makeset(x), x gets rank = 0.

	- When you link a root of rank r1 to a root of rank r2,  then
	if r1 == r2, you add 1 to the rank of the new root.  Otherwise
	you don't. 

   [E.g., rank = height if we never did path compression]

Properties of ranks:
1. Rank(x) can only change if x is a root.  Once x becomes a non-root,
   then its rank never changes again. 

2. If x isn't root, then rank(x) is strictly less than rank(x->parent).
   Over time, rank(x->parent) will never go down (but can increase
   either because parent changes with path compression or rank changes
   with link).

3. rank(x) <= n-1, and in fact rank(x) <= log(n) [prove by induction]
	though we won't need that.


We will prove a running time of O(m alpha(n)), where alpha(n) is the
inverse-Ackermann function.  The proof below follows some notes of
Tarjan.  It is also similar to what is in the book, so you can look at
that too.  Note: some of the definitions are slightly different...

 Ackermann's function is defined like this:
   A_0(x) = x+1
   A_i(x) = A_{i-1}(A_{i-1}(A_{i-1}...(x))), where there are x+1
		applications of A_{i-1}.

 So, for instance, 
	A_1(x) = 2x+1.
        A_2(x) = A_1(A_1(...(x))) > 2^{x+1}.
	A_3(x) = A_2(A_2(...(x))) > "a stack of x+1 2's, with x at the top"


 alpha(n) = smallest value of k such that A_k(1) >= n.

 E.g., if alpha(n)=5, that means that 
	n > A_4(1) = A_3(A_3(1)) = A_3(A_2(A_1(A_0(A_0(1))))) = A_3(A_2(7))
                   > A_3(256) > "stack of 257 2's with 256 at the top".

 [So, basically alpha(n) is never bigger than 4]


ANALYSIS:

* If x is not a root AND its rank is > 0, then define 
	l(x) = level(x) = largest L such that rank(parent(x)) >= A_L(rank(x)) 

  i.e., if you are in a high level, then your parent has a *much*
  higher rank than you.  Can now verify that 0 <= level(x) < alpha(n).

* If x is not a root AND its rank is > 0, then define i(x) = index(x) =
	largest i s.t. rank(parent(x)) >= A_{l(x)}(A_{l(x)}(...(rank(x)))),
        (where we apply A_{l(x)} i times).

  in other words, the index tells you at a slightly finer grain where
  you live in your level.  Can verify that 1 <= i(x) <= rank(x).

Now we define a potential function on nodes like this:

  phi(x) = 0 if rank(x) = 0.
  phi(x) = alpha(n)*rank(x) if x is a root.
  phi(x) = (alpha(n) - level(x))*rank(x) - index(x) otherwise.

Easy to check that phi(x) >= 0.

Total potential is sum of node potentials.  Amortized cost = actual
cost + change in potential.

How to think about potentials: Initially, a node x will be a root with
potential 0.  Its potential will then go up as its rank increases.
But, once x becomes a non-root, then its rank is fixed and its
potential will then start going down.  In particular, notice that
because index is between 1 and rank(x), any time x's level or index
changes, its potential will drop by at least 1.

Claim 1: amortized cost of link(r1,r2) is <= alpha(n) + 1.
Claim 2: amortized cost of find(z) is <= alpha(n) + 2.

Proof of claim 1:
  This is the easier of the two.  The actual cost of a link is 1.  But
we need to look at the change in potential.   Luckily, the only node
whose potential can possibly go up is the new root, and its potential
goes up by at most alpha(n) (since its rank goes up by at most 1).
So, the amortized cost of the link is at most alpha(n)+1.


Proof of claim 2: Say there are t nodes on the path from z to the
root.  So, the actual cost is t.  We will show that at least t-alpha(n)-2
of these nodes go down in potential.  This will imply that the
amortized cost is at most alpha(n)+2.

First of all, we won't worry about z or the root (that's the -2).
Second, we won't worry about nodes x such that all of x's ancestors
have a lower level than x does.  There are at most alpha(n) of these
since there are at most alpha(n) different levels.  

So, let x be one of the remaining nodes on the path.  We know
rank(x)>=1 and we know that some ancestor y of x has level(y) >=
level(x).  This means that before the path compression, we had:
	rank(parent(y)) >= A_{l(y)}(rank(y)) >= A_{l(x)}(rank(y)).
and
	rank(parent(x)) >= A_{l(x)}(A_{l(x)}(...(rank(x)))), where
		there are i(x) applications of A_{l(x)}.

But, rank(y) >= rank(parent(x)), and after path compression, x's
parent has rank at least that of y's parent's rank *before* path
compression.  Putting this all together, this means that x's index has
gone up, or else x's level has gone up, and in any case x's potential
has gone down.  This proves the claim.

That's it.     The book's proof is slightly different.  You are
encouraged to take a look.