Random Search Trees
(Kozen 13)
Various schemes have been proposed for using randomization to "balance"
search trees. These schemes are based on the fact that a "random"
binary search tree is balanced in expectation.
Two approaches:
Skip Lists (Pugh)
Treaps (Aragon and Seidel)
We'll present treaps.
Say we're storing a set of keys 1...n in a binary tree. The keys are
stored in symmetric order by their value. So the left subtree of a key
k contains only keys less than k, and the right subtree contains only
keys greater than k.
Say that each key also has a unique priority p(k). (Drawn from a
totally ordered set.) Suppose we assert that the tree is not only
symmetric ordered by key, but also heap ordered by priority, with the
highest priority at the root.
Lemma: for any assignment of the priorities, there's a unique tree
satisfying the symmetric and heap order constraints.
Proof: You can build the required tree simply by inserting the elements
in decreasing order of priority into an initially empty tree. We can
show it's unique by induction. Say you have two different trees
supposedly satisfying these constraints. If they have different nodes
at the root, then the tree whose root has the lower priority contains
a heap-order violation: the other tree's root, a node of higher
priority, appears somewhere below it. If the roots are the same, then
the two left subtrees must contain the same set of nodes (the keys
less than the root), and similarly for the right subtrees, so by
induction the two trees are identical. QED.
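The construction used in the proof can be sketched directly: insert the keys into an ordinary BST in decreasing order of priority. (A minimal illustration; `Node`, `bst_insert`, and `build_treap` are our names, not from the notes.)

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    prio: float
    left: "Optional[Node]" = None
    right: "Optional[Node]" = None

def bst_insert(root: Optional[Node], node: Node) -> Node:
    # Ordinary (unbalanced) BST insertion by key.
    if root is None:
        return node
    if node.key < root.key:
        root.left = bst_insert(root.left, node)
    else:
        root.right = bst_insert(root.right, node)
    return root

def build_treap(prio: dict) -> Optional[Node]:
    # Inserting in decreasing priority order yields the unique tree
    # in symmetric order by key and heap order by priority.
    root = None
    for k in sorted(prio, key=lambda k: -prio[k]):
        root = bst_insert(root, Node(k, prio[k]))
    return root
```

For example, with priorities {2: 9, 1: 5, 3: 7}, key 2 (highest priority) ends up at the root with 1 and 3 as its children.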
We will assume that we can generate a random priority without
collisions.
Here are the algorithms for the standard operations using Treaps:
Search: Normal binary search
Insertion: Search down from the root till we come to the null pointer
where the new node ought to go. Insert the new node there
with a random priority. Now rotate that node up the tree
until heap order is restored.
Deletion: Say x is the node we want to delete. Rotate x down the
           tree till it is a leaf, at each step rotating x with its
           child of higher priority. Then simply remove x.
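In code, the three operations might look like the following sketch (our own illustration, not from the notes; floating-point priorities stand in for the collision-free random priorities assumed below):

```python
import random

class TNode:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()    # random priority; float collisions
        self.left = self.right = None  # are vanishingly unlikely

def rotate_right(y):
    # Lift y's left child to be the new subtree root.
    x = y.left
    y.left, x.right = x.right, y
    return x

def rotate_left(x):
    # Lift x's right child to be the new subtree root.
    y = x.right
    x.right, y.left = y.left, x
    return y

def search(t, key):
    while t is not None and t.key != key:
        t = t.left if key < t.key else t.right
    return t

def insert(t, key):
    # BST-insert at a null pointer, then rotate the new node up
    # until heap order is restored.
    if t is None:
        return TNode(key)
    if key < t.key:
        t.left = insert(t.left, key)
        if t.left.prio > t.prio:
            t = rotate_right(t)
    else:
        t.right = insert(t.right, key)
        if t.right.prio > t.prio:
            t = rotate_left(t)
    return t

def delete(t, key):
    # Rotate the target down, always lifting the higher-priority child,
    # and snip it off once it is a leaf.
    if t is None:
        return None
    if key < t.key:
        t.left = delete(t.left, key)
    elif key > t.key:
        t.right = delete(t.right, key)
    else:
        if t.left is None:
            return t.right     # one-child shortcut: equivalent to
        if t.right is None:    # rotating the only child up until
            return t.left      # the target is a leaf
        if t.left.prio > t.right.prio:
            t = rotate_right(t)
            t.right = delete(t.right, key)
        else:
            t = rotate_left(t)
            t.left = delete(t.left, key)
    return t
```

When the node slated for deletion has at most one child, lifting that child directly is the same as rotating down to a leaf, so the sketch takes that shortcut.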
Given a tree, we'll refer to the _ancestor relation_ among the nodes in
the tree. A node a is an ancestor of a node b if the path from b to the
root includes a. A tree is heap ordered if and only if for every pair
(a,b) where a is an ancestor of b, we have that p(a) >= p(b).
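As a quick check of this characterization (the tuple encoding and names are ours): verifying p(parent) >= p(child) on every edge is enough, since >= is transitive along any path to the root.

```python
def heap_ordered(tree, prio):
    # tree is None or a tuple (key, left, right).
    # Checking each parent-child edge implies p(a) >= p(b) for every
    # ancestor pair (a, b), by transitivity of >=.
    if tree is None:
        return True
    key, left, right = tree
    for child in (left, right):
        if child is not None and prio[child[0]] > prio[key]:
            return False
    return heap_ordered(left, prio) and heap_ordered(right, prio)
```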
Insertion works because as we rotate x (the inserted node) up the tree
heap order (among nodes other than x) is preserved. To see this,
consider moving x up the tree by doing a left rotation in the following
picture:
      b
     / \
    A   x
       / \
      C   D

becomes:

      x
     / \
    b   D
   / \
  A   C
The ancestor relation (ignoring node x) is preserved with the exception
that node b was an ancestor of all nodes in tree D before, but not
after. Therefore if the nodes (excluding x) were heap ordered before
the rotation then they are heap ordered after the rotation.
Insertion terminates when x has a priority less than its parent. At
this point heap order has been restored to all nodes including x.
The above picture also helps in understanding deletion. In this case x
moves down the tree (the bottom picture is converted by a right rotation
into the top picture.) Again the ancestor relation (ignoring x) is
preserved, except that b is now an ancestor of all the nodes in subtree
D. We chose to rotate b up because its priority is greater than that
of the root of D. Thus heap order is preserved.
(Do some examples)
Running-time Analysis.
Theorem 1: The expected length of a path to a node m in a treap of n nodes
is:
H[m] + H[n-m+1] - 1
(where H[k] = 1/1 + 1/2 + 1/3 + ... + 1/k is the kth harmonic number)
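The theorem is easy to sanity-check numerically. The sketch below (all names ours) computes the formula and estimates the same quantity by simulation, using the equivalence, implicit in the proof below, between choosing random priorities and inserting the keys into a plain BST in uniformly random order:

```python
import random

def H(k):
    # k-th harmonic number: 1/1 + 1/2 + ... + 1/k
    return sum(1.0 / i for i in range(1, k + 1))

def expected_path_length(m, n):
    # Theorem 1: expected number of nodes on the path to key m
    # in a treap on keys 1..n.
    return H(m) + H(n - m + 1) - 1

def simulated_path_length(m, n, trials=2000, seed=0):
    # Insert a random permutation of 1..n into a plain BST
    # (equivalent to a treap with random priorities), and
    # measure the number of nodes on the path to m.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        keys = list(range(1, n + 1))
        rng.shuffle(keys)
        children = {keys[0]: [None, None]}  # key -> [left, right]
        for k in keys[1:]:
            cur = keys[0]
            while True:
                d = 0 if k < cur else 1
                if children[cur][d] is None:
                    children[cur][d] = k
                    children[k] = [None, None]
                    break
                cur = children[cur][d]
        depth, cur = 1, keys[0]
        while cur != m:
            cur = children[cur][0 if m < cur else 1]
            depth += 1
        total += depth
    return total / trials
```

With enough trials the simulated average lands close to the formula's value.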
To analyze insertions and deletions we'll prove this:
Theorem 2: The expected number of rotations in a deletion (or insertion)
is less than 2.
Note that H[m] <= 1+ln(m)
Before we prove these theorems we'll need the following lemma:
Ancestor Lemma: Let k < m. Then k is an ancestor of m in the treap
built on keys 1...n if and only if k is an ancestor of m in the treap
built on keys 1...m alone (with the same priorities). A symmetric
statement holds for keys greater than m.
Proof of Theorem 1: The length of the path to m is the number of
ancestors of m, counting m itself. We count separately the ancestors
of m that are less than m and those that are greater than m; by
symmetry it suffices to handle the first case (relabeling the keys
m...n gives the second). The ancestor lemma shows that the set of
ancestors of m that are less than m is determined by the treap built
from keys 1...m alone. (Then add 1 for m itself.)
Note that choosing random priorities and inserting in decreasing
priority order is the same as inserting the keys in the order given by
a uniformly random permutation.
So consider a treap built with keys 1...m. Consider the process of
building the treap by inserting the nodes in the order they appear in
the permutation. Which nodes are ancestors of m? These are the nodes
on the right path from the root down to m. A node k appears on that
path if and only if:
(1) k is inserted before m
(2) at the time it is inserted, k is the greatest node inserted so far.
So now we can state a property in terms of the permutation inducing the
treap. We're interested in each key k in the permutation that has the
following properties:
(1) k occurs before m in the permutation
(2) k is a left-right maximum (greater than all those to its left)
So our problem is reduced to computing the expected number of keys in a
random permutation that have these properties. Given a random
permutation, we can write a check mark next to each element that has
these properties. Let X[m] be the expected number of check marks. We
know that X[1] = 0.
We can write a recurrence for X[] as follows. Generate a random
permutation of 1...m by first generating a random permutation of 2...m
then inserting 1 randomly in it. After generating the permutation of
2...m we label the keys with check marks. Then we insert 1. This
operation does not change any of the already existing check marks
because 1 is less than all the other elements. The number of check
marks before inserting 1 is just X[m-1]. By linearity of expectation,
X[m] is just X[m-1] plus the expected number of check marks on 1. This
is just the probability that 1 is marked. This probability is just 1/m
because 1 is only marked if it's the first element of the permutation.
This gives:
X[1] = 0.
X[m] = X[m-1] + 1/m.
The solution of this recurrence is just X[m] = H[m]-1.
So the expected number of ancestors of m, counting m itself
is just
X[m] + X[n-m+1] + 1
Which is
H[m] + H[n-m+1] - 1.
QED.
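Both the recurrence and the original check-mark definition can be verified exactly for small m by enumerating all permutations (helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def X_recurrence(m):
    # X[1] = 0, X[m] = X[m-1] + 1/m; the solution is H[m] - 1.
    x = Fraction(0)
    for j in range(2, m + 1):
        x += Fraction(1, j)
    return x

def X_bruteforce(m):
    # Average, over all m! permutations of 1..m, of the number of
    # check marks: keys occurring before m that are greater than
    # everything to their left.
    total = 0
    for perm in permutations(range(1, m + 1)):
        best = 0
        for k in perm:
            if k == m:
                break
            if k > best:
                total += 1
                best = k
    return Fraction(total, factorial(m))
```

For m = 5 both give 1/2 + 1/3 + 1/4 + 1/5 = 77/60.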
Now we can prove Theorem 2. Say we're deleting m. Consider the sum of
the number of nodes on the right path from the left child of m and the
number of nodes on the left path from the right child of m. This
quantity decreases by exactly one on each rotation. (Draw pictures).
So our goal is to evaluate the expected value of this sum, which is the
sum of the expected values of each part.
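For small n this sum can be computed exactly by enumerating all insertion orders, which is the same as ranging over all priority assignments (helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def build(order):
    # Plain BST insertion in the given order; inserting in decreasing
    # priority order yields the treap for that priority assignment.
    children = {order[0]: [None, None]}   # key -> [left, right]
    for k in order[1:]:
        cur = order[0]
        while True:
            d = 0 if k < cur else 1
            if children[cur][d] is None:
                children[cur][d] = k
                children[k] = [None, None]
                break
            cur = children[cur][d]
    return children

def spine(children, start, d):
    # Nodes reachable from `start` by repeatedly going in direction d
    # (0 = left, 1 = right), counting `start` itself.
    n = 0
    while start is not None:
        n += 1
        start = children[start][d]
    return n

def expected_rotations(n, m):
    # Exact expectation of (right path of m's left subtree) +
    # (left path of m's right subtree): the rotation count when
    # deleting m.
    total = 0
    for order in permutations(range(1, n + 1)):
        children = build(order)
        left, right = children[m]
        total += spine(children, left, 1) + spine(children, right, 0)
    return Fraction(total, factorial(n))
```

For n = 4, m = 2 the exact value is 1/2 + 2/3 = 7/6, always below 2 as Theorem 2 asserts.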
Let the two subtrees of m be A and B. Consider the treap that would be
built by using only elements 1...m. Let A' be the left subtree of m in
this treap. By the ancestor lemma, the set of nodes in A is the same as
the set of nodes in A'. In fact A and A' are equal, because the same
nodes are being inserted in the same order.
Therefore we can do just what we did in the proof of theorem 1. We
simply restrict ourselves to what happens to the stuff in 1...m,
compute the expected length of the path, and add it to the expectation
for the other side.
So the question is, which nodes end up being on the right path of A?
For a node k to end up there, it has to be inserted after m. And it has
to be greater than any node inserted so far (except m). Here's the
picture.
    x
     \
      x
       \
        x
         \
          m
         /
        x
         \
          x
           \
            k
If k is less than any of the xs, it will not end up there. If k is
greater than all of the xs, it does end up there. And once on the
path of interest, it stays there.
So suppose we take a random permutation of 1...m, and we check any item k
that occurs after m but is greater than any element to its left (except
m). What is the expected number of checks?
Let Y[m] be this quantity. Y[1] = 0. One way of generating these marks
is to first generate a random permutation of 2...m, install the marks,
then insert 1 in a random position. The insertion of 1 does not change
any of the marks installed before. The expected number of marks on 1 is
the probability that 1 is marked. This only happens if the
permutation starts out [m, 1, ...]. The probability of this happening
is 1/(m(m-1)).
Thus we get the following recurrence.
Y[1] = 0
Y[m] = Y[m-1] + 1/(m(m-1))
The solution of this is just Y[m] = 1 - 1/m = (m-1)/m. By symmetry,
the expected number of nodes on the left path from the right child of
m is (n-m)/(n-m+1). The expected number of rotations in a deletion is
the sum of these two quantities, which is less than 2. (An insertion
performs the rotations of the corresponding deletion in reverse, so
the same bound applies.)
QED.
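Like X, the quantity Y can be checked exactly for small m by enumeration (helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def Y_bruteforce(m):
    # Average, over all m! permutations of 1..m, of the number of keys
    # that occur after m and exceed every element to their left other
    # than m itself.
    total = 0
    for perm in permutations(range(1, m + 1)):
        best = 0           # largest non-m element seen so far
        seen_m = False
        for k in perm:
            if k == m:
                seen_m = True
                continue
            if seen_m and k > best:
                total += 1
            best = max(best, k)
    return Fraction(total, factorial(m))
```

For m = 4 this gives 3/4, in agreement with Y[m] = (m-1)/m.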