Random Search Trees
                             (Kozen 13)

Various schemes have been proposed for using randomization to "balance"
search trees.  It's based on the fact that a "random" tree is balanced.

Two approaches:
                  Skip Lists (Pugh)
                  Treaps (Aragon and Seidel)

We'll present treaps.

Say we're storing a set of keys 1...n in a binary tree.  The keys are
stored in symmetric order by their value.  So the left subtree of a key
k contains only keys less than k, and the right subtree contains only
keys greater than k.

Say that each key also has a unique priority p(k).  (Drawn from a
totally ordered set.)  Supose we assert that the tree is not only
symmetric ordered by key, but also heap ordered by priority, with the
highest priority at the root.

Lemma: for any assignment of the priorities, there's a unique tree
satisfiying the symmetric and heap order constraints.

Proof: You can build the required tree simply by inserting the elements
in decreasing order of priority into an initially empty tree.  We can
show it's unique by induction.  Say you have two different trees
supposedly satisfying these constraints.  If they have different nodes
at the roots, then the tree with the root with the lower priority must
have a violation.  If the roots are the same, then the two left subtrees
must have the same sets of nodes in them, and similarly for the right
subtrees, and they therefore must be identical by induction.  QED.

We will assume that we can generate a random priority without
collisions.

Here are the algorithms for the standard operations using Treaps:

   Search:    Normal binary search

   Insertion: Search down from the root till we come to the null pointer
              where the new node ought to go.  Insert the new node there
              with a random priority.  Now rotate that node up the tree
              until heap order is restored.

   Deletion:  Say x is the node we want to delete.  Rotate x down the
              tree till it has no children.  Then simply remove x.  At
              each step rotate x with its child of highest priority.


Given a tree, we'll refer to the _ancestor relation_ among the nodes in
the tree.  A node a is an ancestor of a node b if the path from b to the
root includes a.  A tree is heap ordered if and only if for every pair
(a,b) where a is an ancestor of b, we have that p(a) >= p(b).

Insertion works because as we rotate x (the inserted node) up the tree
heap order (among nodes other than x) is preserved.  To see this,
consider moving x up the tree by doing a left rotation in the following
picture:


                           b
                          / \
                         A   x
                            / \
                           C   D

becomes:

                               x
                              / \
                             b   D
                            / \
                           A   C

The ancestor relation (ignoring node x) is preserved with the exception
that node b was an ancestor of all nodes in tree D before, but not
after.  Therefore if the nodes (excluding x) were heap ordered before
the rotation then they are heap ordered after the rotation.

Insertion terminates when x has a priority less than its parent.  At
this point heap order has been restored to all nodes including x.

The above picture also helps in understanding deletion.  In this case x
moves down the tree (the bottom picture is converted by a right rotation
into the top picture.)  Again the ancestor relation (ignoring x) is
preserved, except that b is now an ancestor of all the nodes in subtree
D.  We chose to rotate b up because its priority is greater than the
root of D.  Thus heap order is preserved.

(Do some examples)

Running-time Analysis.

Theorem 1: The expected length of a path to a node m in a treap of n nodes
is:

                       H[m] + H[n-m+1] - 1

(Where H[k] = 1/1 + 1/2 + 1/3 + ... + 1/k (harmonic number))

To analyze insertions and deletions we'll prove this:

Theorem 2: The expected number of rotations in a deletion (or insertion)
is less than 2.

Note that H[m] <= 1+ln(m)

Before we prove these theorems we'll need the following lemma:

Ancestor Lemma: Let m<n.  Consider a n-node treap A with nodes 1...n.
Using the same priorities, build an m-node treap B using nodes 1...m.
The ancestor relation of among the nodes 1...m is the same in both
trees.

Proof: Build treap B.  Now convert B into A by inserting the elements
m+1, m+2, ... n into it.  (The tree that results is the same regardless
of what order the elements are inserted.)  As you're doing these
insertions, the subtree D in the above picture is always empty, cause
the node being inserted is always the rightmost node.  Therefore the
ancestor relation among 1...m is unaffected by the insertion.  QED.

Proof of Theorem 1:

The actual priority values are not important, only the order.  So we can
replace the notion of priority with a permutation of all of the keys.
The permutation is simply the keys sorted by decreasing order of
priority. 

So if we have such a permutation, the tree generated is simply the one
obtained by inserting the keys in the order they appear in the
permutation. 

(Give an example here, like the one on page 68 of kozen)

For a given key m, we want to determine the expected depth of m in the
tree, averaged over all permutations sigma of the keys.

The ancestors of key m consists of keys <m and keys >m and =m.  The
ancestor lemma shows that the set of ancestors of m that are <m depends
only on the priorities of the keys <m, and not on the priorities of the
other keys.  (In terms of the permutation, this means that the
ancestors of m that are <m depends only on the permutation of 1...m
induced by the given permutation of 1...n and not on the other of the
other elements, the ones from m+1...n.)

So to compute the expected number of ancestors of m, we simply have to
compute the expected number of ancestors of m that are <m and add it to
the expected number of ancestors >m.  (Then add 1 for m itself.)

So consider a treap built with keys 1...m.  Consider the process of
building the treap by inserting the nodes in the order they appear in
the permutation.  Which nodes are ancestors of m?  These are the nodes
on the right path from the root down to m.  A node k appears on that
path if and only if:

  (1) k is inserted before m
  (2) must be the greatest node inserted so far.

So now we can state a property in terms of the permutation inducing the
treap.  We're interested in each key k in the permutation that has the
following properties:

  (1) k occurs before m in the permutation
  (2) k is a left-right maximum (greater than all those to its left)

So our problem is reduced to computing the expected number of keys in a
random permutation that have these properties.  Given a random
permutation, we can write a check mark next to each element that has
these properties.  Let X[m] be the expected number of check marks.  We
know that X[1] = 0.

We can write a recurrence for X[] as follows.  Generate a random
permutation of 1...m by first generating a random permutation of 2...m
then inserting 1 randomly in it.  After generating the permutation of
2...m we label the keys with check marks. Then we insert 1.  This
operation does not change any of the already existing check marks
because 1 is less than all the other elements.  The number of check
marks before inserting 1 is just X[m-1].  By linearity of expectation,
X[m] is just X[m-1] plus the expected number of check marks on 1.  This
is just the probability that 1 is marked.  This probability is just 1/m
because 1 is only marked if it's the first element of the permutation.
This gives:

   X[1] = 0.
   X[m] = X[m-1] + 1/m.

The solution of this recurrence is just X[m] = H[m]-1.

So the expected number of ancestors of m, counting m itself
is just

       X[m] + X[n-m+1] + 1

Which is

       H[m] + H[n-m+1] - 1.
   
QED.


Now we can prove Theorem 2.  Say we're deleting m.  Consider the sum of
the number of nodes on the right path from the left child of m and the
number of nodes on the left path from the right child of m.  This
quantity decreases by exactly one on each rotation.  (Draw pictures).

So our goal is to evaluate the expected value of this sum, which is the
sum of the expected values of each part.

Let the two subtrees of m be A and B.  Consider the treap that would be
built by using only elements 1...m.  Let A' be the left subtree of m in
this treap.  By the ancestor lemma, the set of nodes in A is the same as
the set of nodes in A'.  In fact A and A' are equal, because the same
nodes are being inserted in the same order.

Therefore we can do just what we did in the proof of theorem 1.  We
simply restrict ourselves to what happens to the stuff in 1...m,
compute the expected length of the path, and add it to the expectation
for the other side.

So the question is, which nodes end up being on the right path of A?

For a node k to end up there, it has to be inserted after m.  And it has
to be greater than any node inserted so far (except m).  Here's the
picture.


               x
                \
                 x
                  \
                   x
                    \
                     m
                    /
                   x
                    \
                     x
                      \
                       k


If k was less than any of the xs, it would not end up there.  If k is
greater than any x, it does end up there. And once on the path of
interest, it stays there.

So suppose we take a random permutation of 1...m, and we check any item k
that occurs after m but is greater than any element to its left (except
m).  What is the expected number of checks?

Let Y[m] be this quantity.  Y[1] = 0.  One way of generating these marks
is to first generate a random permutation of 2...m, install the marks,
then insert 1 in a random position.  The insertion of 1 does not change
any of the marks installed before.  The expected number of marks on 1 is
the probably that 1 is marked.  This only happens if the permutation
starts out [m1...].  The probability of this happening is 1/m(m-1).
Thus we get the following recurrence.

   Y[1] = 0
   Y[m] = Y[m-1] + 1/m(m-1)

The solution of this is just Y[m] = (m-1)/m.

QED.