Random Search Trees (Kozen 13)

Various schemes have been proposed for using randomization to "balance" search trees. They rest on the fact that a "random" tree is balanced. Two approaches:

    Skip Lists (Pugh)
    Treaps (Aragon and Seidel)

We'll present treaps.

Say we're storing a set of keys 1...n in a binary tree. The keys are stored in symmetric order by their value: the left subtree of a key k contains only keys less than k, and the right subtree contains only keys greater than k.

Say that each key also has a unique priority p(k), drawn from a totally ordered set. Suppose we require that the tree is not only symmetric ordered by key, but also heap ordered by priority, with the highest priority at the root.

Lemma: For any assignment of priorities, there is a unique tree satisfying both the symmetric order and heap order constraints.

Proof: You can build the required tree simply by inserting the elements into an initially empty tree in decreasing order of priority. Uniqueness follows by induction. Suppose two different trees both satisfy the constraints. If they have different roots, then the tree whose root has the lower priority contains a heap-order violation. If the roots are the same, then the two left subtrees contain the same set of nodes, and similarly for the right subtrees, so by induction the trees are identical. QED.

We will assume that we can generate random priorities without collisions.

Here are the algorithms for the standard operations on treaps:

Search: Normal binary search.

Insertion: Search down from the root until we reach the null pointer where the new node ought to go. Insert the new node there with a random priority. Then rotate that node up the tree until heap order is restored.

Deletion: Say x is the node we want to delete. Rotate x down the tree until it has no children, then simply remove x. At each step, rotate x with its child of higher priority.
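The three operations above can be sketched in Python. This is a minimal illustration, not code from the notes: the names (Node, insert, delete, search, rotate_left, rotate_right) are my own, and priorities are random floats, relying on the collision-free assumption stated above.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()  # random priority; collisions assumed away
        self.left = None
        self.right = None

def rotate_right(y):
    # y's left child x moves up; returns the new subtree root x.
    x = y.left
    y.left = x.right
    x.right = y
    return x

def rotate_left(x):
    # x's right child y moves up; returns the new subtree root y.
    y = x.right
    x.right = y.left
    y.left = x
    return y

def insert(root, key):
    # Normal BST insertion at a null pointer, then rotate the new node
    # up while it violates heap order with its parent.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    elif key > root.key:
        root.right = insert(root.right, key)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def delete(root, key):
    # Find key, then rotate it down (always with its child of higher
    # priority) until it has at most one child, and splice it out.
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        if root.left.priority > root.right.priority:
            root = rotate_right(root)
            root.right = delete(root.right, key)
        else:
            root = rotate_left(root)
            root.left = delete(root.left, key)
    return root

def search(root, key):
    # Ordinary binary search by key.
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root
```

After any sequence of insertions and deletions, an inorder walk yields the keys in sorted order and every node's priority dominates its children's, so the tree is simultaneously a search tree and a heap.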
Given a tree, we'll refer to the _ancestor relation_ among the nodes in the tree. A node a is an ancestor of a node b if the path from b to the root includes a. A tree is heap ordered if and only if for every pair (a,b) where a is an ancestor of b, we have p(a) >= p(b).

Insertion works because as we rotate x (the inserted node) up the tree, heap order among the nodes other than x is preserved. To see this, consider moving x up the tree by doing a left rotation in the following picture:

        b
       / \
      A   x
         / \
        C   D

becomes:

        x
       / \
      b   D
     / \
    A   C

The ancestor relation (ignoring node x) is preserved, with one exception: node b was an ancestor of all nodes in subtree D before the rotation, but not after. Therefore if the nodes (excluding x) were heap ordered before the rotation, then they are heap ordered after the rotation. Insertion terminates when x has a priority less than its parent. At this point heap order has been restored to all nodes, including x.

The above picture also helps in understanding deletion. In this case x moves down the tree (the bottom picture is converted by a right rotation into the top picture). Again the ancestor relation (ignoring x) is preserved, except that b is now an ancestor of all the nodes in subtree D. We chose to rotate b up because its priority is greater than that of the root of D. Thus heap order is preserved. (Do some examples.)

Running-time Analysis.

Theorem 1: The expected length of the path from the root to a node m in a treap of n nodes is

    H[m] + H[n-m+1] - 1

where H[k] = 1/1 + 1/2 + 1/3 + ... + 1/k is the k-th harmonic number. Note that H[m] <= 1 + ln(m), so this expectation is O(log n).

To analyze insertions and deletions we'll prove this:

Theorem 2: The expected number of rotations in a deletion (or insertion) is less than 2.

Before we prove these theorems we'll need the following lemma:

Ancestor Lemma: Let m <= n, and fix the priorities of the keys 1...n. The ancestors of m that are less than m in the treap on keys 1...n are exactly the ancestors of m in the treap built from keys 1...m alone (with the same priorities). Symmetrically, the ancestors of m that are greater than m are exactly the ancestors of m in the treap built from keys m...n alone.

To prove Theorem 1, note that the length of the path to m is the number of ancestors of m. We count separately the expected number of ancestors less than m and the expected number greater than m; the ancestor lemma lets us compute each count in a smaller treap. (Then add 1 for m itself.) So consider a treap built with keys 1...m.
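Theorem 1 can be sanity-checked by simulation. The sketch below is my own, not part of the notes; it uses the fact (also exploited in the proof that follows) that a plain binary search tree built from a uniformly random insertion order has exactly the shape distribution of a treap, with insertion order playing the role of decreasing priority. The names node_depth, harmonic, and average_depth, and the particular n, m, and trial counts, are arbitrary choices.

```python
import random

def node_depth(n, m, rng):
    # Insert a random permutation of 1..n into a plain BST.  The resulting
    # shape is distributed exactly like a treap on n keys (insertion order
    # = decreasing priority), so the depth of m models its treap depth.
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    children = {k: [None, None] for k in perm}  # key -> [left, right]
    root = perm[0]
    for k in perm[1:]:
        cur = root
        while True:
            side = 0 if k < cur else 1
            if children[cur][side] is None:
                children[cur][side] = k
                break
            cur = children[cur][side]
    # Count the nodes on the path from the root down to m (m included).
    depth, cur = 1, root
    while cur != m:
        depth += 1
        cur = children[cur][0 if m < cur else 1]
    return depth

def harmonic(k):
    return sum(1.0 / j for j in range(1, k + 1))

def average_depth(n, m, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(node_depth(n, m, rng) for _ in range(trials)) / trials
```

For n = 15 and m = 5, for example, the theorem predicts H[5] + H[11] - 1, about 4.30 nodes on the path, and the simulated average should land close to that.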
Since the priorities are random, the keys 1...m listed in decreasing order of priority form a uniformly random permutation. Consider the process of building the treap by inserting the nodes in the order they appear in this permutation. Which nodes are ancestors of m? Since m is the largest key, these are exactly the nodes on the right path from the root down to m. A node k appears on that path if and only if:

    (1) k is inserted before m
    (2) k is the greatest node inserted so far

So now we can state the property in terms of the permutation inducing the treap. We're interested in each key k in the permutation that has the following properties:

    (1) k occurs before m in the permutation
    (2) k is a left-to-right maximum (greater than all elements to its left)

So our problem is reduced to computing the expected number of keys in a random permutation that have these properties. Given a random permutation, write a check mark next to each element that has them, and let X[m] be the expected number of check marks. We know that X[1] = 0.

We can write a recurrence for X[] as follows. Generate a random permutation of 1...m by first generating a random permutation of 2...m and then inserting 1 at a uniformly random position. After generating the permutation of 2...m we label the keys with check marks; then we insert 1. This insertion does not change any of the existing check marks, because 1 is less than all the other elements. The number of check marks before inserting 1 is just X[m-1]. By linearity of expectation, X[m] is X[m-1] plus the expected number of check marks on 1, which is just the probability that 1 is marked. This probability is 1/m, because 1 is marked only if it is the first element of the permutation. This gives:

    X[1] = 0
    X[m] = X[m-1] + 1/m

The solution of this recurrence is X[m] = H[m] - 1. So the expected number of ancestors of m, counting m itself, is

    X[m] + X[n-m+1] + 1 = H[m] + H[n-m+1] - 1

QED.

Now we can prove Theorem 2. Say we're deleting m. Consider the sum of the number of nodes on the right path from the left child of m and the number of nodes on the left path from the right child of m.
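The check-mark argument above is easy to verify by Monte Carlo simulation. This sketch is my own; the names checkmarks and estimate_X, and the seed and trial count, are arbitrary. The recurrence predicts that the average number of marked keys converges to H[m] - 1.

```python
import random

def checkmarks(perm, m):
    # Count the keys before m in perm that are left-to-right maxima,
    # i.e. greater than everything to their left.
    count, best = 0, 0
    for k in perm:
        if k == m:
            break
        if k > best:
            count += 1
            best = k
    return count

def harmonic(k):
    return sum(1.0 / j for j in range(1, k + 1))

def estimate_X(m, trials=20000, seed=0):
    # Monte Carlo estimate of X[m]; the recurrence says this is H[m] - 1.
    rng = random.Random(seed)
    keys = list(range(1, m + 1))
    total = 0
    for _ in range(trials):
        rng.shuffle(keys)
        total += checkmarks(keys, m)
    return total / trials
```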
This quantity decreases by exactly one on each rotation. (Draw pictures.) So our goal is to evaluate the expected value of this sum, which is the sum of the expected values of each part.

Let the two subtrees of m be A and B. Consider the treap that would be built using only the elements 1...m, and let A' be the left subtree of m in that treap. By the ancestor lemma, the set of nodes in A is the same as the set of nodes in A'. In fact A and A' are equal, because the same nodes are inserted in the same order. Therefore we can do just what we did in the proof of Theorem 1: restrict ourselves to what happens to the keys 1...m, compute the expected length of the right path of A, and add it to the corresponding expectation for the other side.

So the question is, which nodes end up on the right path of A? For a node k to end up there, it has to be inserted after m, and it has to be greater than every node inserted so far except m. Here's the picture:

    x
     \
      x
       \
        x
         \
          m
         /
        x
         \
          x
           \
            k

If k were less than one of the xs inserted before it, it would not end up on this path. If k is greater than all of them, it does end up there, and once on the path of interest, it stays there.

So suppose we take a random permutation of 1...m, and we check each item k that occurs after m and is greater than every element to its left except m. What is the expected number of checks? Let Y[m] be this quantity. Y[1] = 0.

One way of generating these marks is to first generate a random permutation of 2...m, install the marks, then insert 1 at a uniformly random position. The insertion of 1 does not change any of the marks installed before. The expected number of marks on 1 is the probability that 1 is marked. This happens only if the permutation starts out [m, 1, ...], which has probability 1/(m(m-1)). Thus we get the following recurrence:

    Y[1] = 0
    Y[m] = Y[m-1] + 1/(m(m-1))

The solution of this is Y[m] = (m-1)/m. By symmetry, the expected length of the left path of B is Y[n-m+1] = (n-m)/(n-m+1). So the expected number of rotations in a deletion is

    Y[m] + Y[n-m+1] = (m-1)/m + (n-m)/(n-m+1) < 2

which proves Theorem 2. QED.
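Theorem 2 can also be checked empirically. The sketch below is my own: it builds a BST from a random insertion order (which, as in the earlier simulation, has exactly the treap shape distribution) and measures the quantity the proof counts, namely the length of the right path from m's left child plus the length of the left path from m's right child. The names deletion_rotations and average_rotations, and the parameter choices, are arbitrary.

```python
import random

def deletion_rotations(n, m, rng):
    # Build a BST from a random insertion order (same shape distribution
    # as a treap), then count the rotations needed to delete m: the
    # length of the right path from m's left child plus the length of
    # the left path from m's right child.
    perm = list(range(1, n + 1))
    rng.shuffle(perm)
    root = perm[0]
    left = {root: None}
    right = {root: None}
    for k in perm[1:]:
        cur = root
        while True:
            if k < cur:
                if left[cur] is None:
                    left[cur] = k
                    break
                cur = left[cur]
            else:
                if right[cur] is None:
                    right[cur] = k
                    break
                cur = right[cur]
        left[k] = right[k] = None
    count, c = 0, left[m]
    while c is not None:   # right path of m's left subtree
        count += 1
        c = right[c]
    c = right[m]
    while c is not None:   # left path of m's right subtree
        count += 1
        c = left[c]
    return count

def average_rotations(n, m, trials=20000, seed=0):
    rng = random.Random(seed)
    return sum(deletion_rotations(n, m, rng) for _ in range(trials)) / trials
```

For n = 20 and m = 10 the analysis predicts (m-1)/m + (n-m)/(n-m+1) = 9/10 + 10/11, about 1.81 rotations, and in every case the expectation stays below 2.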