15-451 Algorithms Fall 2012 D. Sleator Augmenting Binary Trees September 27, 2012 -- insertion/deletion/split/join in splay trees -- augmenting trees * rank/select * segment trees * reversal and simulating pancake sorting see http://www.codeforces.com/contest/117/submission/860934 for sample splay tree code Also at the end of this lecture. ---------------------------------------------------------------------------- How to use splaying to do other important operations on trees ------------------------------------------------------------- Insertion (see last lecture notes) Deletion (see last lecture notes) split(S, k) S is a set reprecented by a splay tree. k is a key. This returns two sets S1 of keys <= k and S2 of keys > k. This is implemented as follows. First search for key k in the tree. If it is there, we splay it. Now detach the left subtree of the root and that is S1. And the rest is S2. If k is not in the tree, then let m be the last key found before we went off the bottom of the tree. We know that m is either the smallest key > k in the set or the largest key < k in the set. So we splay m, and separate the tree by removing m's left or right subtree, which ever is appropriate. This is amortized O(log n). The search for k is paid for by the splay. The subsequent cutting of links only reduces the potential. Join(S,T) Here S and T are trees representing sets. All the keys in S are less than all the keys in T. This forms the union of these two sets and returns it. It works as follows. Let k be the largest key in S. (Find it by walking down right children until there is none.) Now do Splay(S,k). Now simply make T the right child of the root of the new S. This is amortized O(log n). The work of finding k is paid for by the splay. The linking of T to S can cause the potential of the root of S to increase by at most log(n). Augmenting Binary Search Trees ============================== In this lecture we will discuss how to add extra fields to binary search trees to give them additional functionality. We'll work in the context of splay trees, although most of the ideas carry over to other types of search trees. Rank/Select ----------- Suppose we want to represent a set of keys and support the following operations: rank(S,k): return the rank of key k. The smallest element in the set has rank 0, the largest has rank n-1 (where n is the size of the set). select(S,i): return the ith key in the set. insert(S,k): if k is not already in the set, then insert k into the set. delete(S,k): if k is in the set, delete it. The simplest way to do this is to add an extra field to the nodes. So in addition to left and right, we'll also store the *size* of the node. If n is a node, then n.size is the number of nodes in the subtree rooted there. We need to show how to implement all of the operations and to correctly and efficiently maintain the size fields of all the nodes. Let's denote by n.l and n.r the left and right children of a node n. We can use the following fact to help us keep the sizes: n.size = n.l.size + n.r.size + 1 (The null nodes have size 0.) When we do a rotation, the only nodes whose sizes change are the two involved in the rotation. y x right rotation: / ====> \ x y To update the sizes after the rotation, we first fix y (by applying the above formula) and then we fix x. This allows us to splay while maintaining the size fields. It remains to show how to implement rank and select. Here's how select(S,i) works. In general we're at some node and we're looking for the ith smallest key in the subtree rooted there. We look at the size of the left subtree. If it's i-1 then the current node is the one we want. If it's less than this then we need to go down to the right. If it's greater than this, we need to go down to the left. When we go to the right we must adjust the index we're looking for because of the stuff we just passed (the current node and the left subtree). Note that after finding the appropriate node, we must now splay this node. It's required to do this to pay for the work we just did to find it. (Do example) For rank(s,k), we first find key k in the tree. We do this by a normal search in the tree. After we find it, we splay it to the root. It's rank is the size field of the left child of the root. (Again, splaying is necessary to pay for the work. It also makes it simpler to compute the rank.) Segment Trees ------------- The problem here is to maintain a dynamically changing collection of (possibly overlapping) segments on a line. It can be viewed as a 1-D geometry problem. insert(S,lo,hi) Insert a new segment [lo,hi] into the set delete(S,lo,hi) Delete the given segment from the set find(S,x) Determine if x is in one of the segments of S. If it is, return a segment containing x. We'll solve this problem using a splay tree where there's one node corresponding to each segment in the set. The tree is ordered by the lo values of the segments. To implement find we're going to use the split operation of splay trees. This will let us split the segments into two sets. The set with left endpoints which are <= x and the ones with left endpoints > x. Let L be the set with left endpoints <= x. The point x will be in one of the segments if and only if some element of L has a right endpoint that is >= x. It will be possible to do this if we were able to keep the maximum of all the right endpoints in a subtree rooted at a node. So this is the auxiliary information that we will keep. So in addition to keeping [lo,hi] in each node, we add a new field that is the maximum value of hi in any node in the subtree rooted here. (See example on page 313 of CLRS.) So in a node n, let n.m be the maximum value of the hi fields in in all the nodes in the subtree rooted here. Given that the left and right children are set correctely, here's how we can set n.m: n.m = max (n.l.m, n.r.m, n.hi) (Where the m value of a NULL is -infinity.) So here's how find(S,x) works. Split the tree using key x. Let L be the resulting left set with root r. If r.m < x then we know there is no interval containing x. If r.m >= x then see which of the left subtree of r, the node r, or the right subtree of r contains an interval containing x. Continue traversing down the tree in this fashion until we come to a node with an interval containing x. When we find this node, we have to splay it, of course, and then glue the tree back together to preserve the interval set. It's easy to use the formula for n.m above to keep the m fields correct whenever we do a rotation. (Just like the size fields.) Insert and Delete are now standard. Pancake sorting --------------- You probably recall from 15-251 how to sort by prefix reversal (also known as pancake flipping). Here's a problem we want to solve. Given an arbitrary permutation of numbers 1...n (a list of numbers) we want to sort them by reversing prefixes. For example, say we start with: 5 1 4 2 3 Reverse the whole thing and get: 3 2 4 1 5 Now put 4 at the front: 4 2 3 1 5 Now put 4 in place: 1 3 2 4 5 Now put 3 in front: 3 1 2 4 5 Not put 3 in place: 2 1 3 4 5 finish up: 1 2 3 4 5 Clearly this algorithm takes at most 2(n-1) flips. But suppose you wanted to print out the set of prefix lengths that are flipped in order to sort the permutation. So in this case we would output 5 3 4 2 3 2 The naive algorithm to compute this will take time O(n^2), because it has to maintain, say, an array of all the elements, and each prefix reversal takes O(n) time and there are O(n) of them. Not to mention searching for the next thing that needs to be put in place. We'll show how to use augmented splay trees to do this in O(n log n) time. Since the tree is not going to keep the keys in symmetric order (left to right) we need a way to find a given key in the tree. To facilitate this we create an array A where A[i] is a pointer to the node in the tree that stores i. This array does not change over time. To allow for prefix reversal, we will keep what I call a reverse bit r in each node. The meaning of this bit is the following. If n.r is true, then the meaning of left and right in the subtree rooted at n are reversed. And this applies recursively to all the reverse bits in the tree. (Example) So, if i have a tree that represents the sequence 3 2 1 4, and I toggle the reverse bit of the root of the tree, then I am representing 4 1 2 3. If I have a node n where n.r=true, and can "discharge" this reversal by swapping the left and right children of n and then toggling the reverse bits in the left and right children. This follow from the following identity for strings. If A and B are strings and AB is the concatenation of A and B we have: reverse(AB) = reverse(B) reverse(A) How do we implement splaying in a tree with reverse bits? The shape of paths is of course important to the splaying algorithm. So what we can do is before we splay along a path in the tree, we first discharge all the reverse bits along that path. Then we can use the standard splaying algorithm. Every node to which a rotation is applied has its reverse bit = false. We're also going to need to maintain size fields as we did for the rank/select problem. Here's how we can compute the prefix reversal sequence in O(n log n) time. First we construct an initial tree to represent the initial sequence. We also construct the array A[] described above. Now we find key n in the tree by looking up A[n]. This points to a node. We splay this node. Now n is at the root of the tree. The size field of the left subtree of the root tells us the index of n. We split off n and its left subtree from the rest. Now we reverse that part (by toggling its reverse bit), and join them together again. Now n is at the left. And we now toggle the reverse bit of the whole thing (and output the size of the tree). n is now at the right end. So we delete n from the tree, and repeat the whole process for n-1. The process of handling n is a split and a join, which is O(log n). Thus the whole algorithm is O(n log n). -------------------Java implementation of this----------------- import java.io.*; import java.util.*; /* This program takes an array which contains a permutation of 0...n-1, and computes a sequence of prefix reversals that sorts the permutation. It's set up to take a permutation on the command line, like this: $ java Pancake 5 4 1 2 0 3 6 4 2 3 The output is a sequence of prefixes to reverse in order to sort the permutation. D. Sleator, September 2012 */ public class Pancake { public static void main(String[] args) { int n = args.length; int[] A = new int[n]; for (int i=0; i=0; k--) { Node x = ptr[k]; Splay.splay(x); int j = (x.l==null)?0:x.l.size; if (j!=k) { // if j=k then k is already in place if (j>0) { // move k to the front moves[m++] = j+1; Node b = x.r; b.p = null; x.r = null; x.update(); x.flip = !x.flip; Splay.join(x, b); Splay.splay(x); } moves[m++] = k+1; x.flip = !x.flip; // move k to the back } Splay.delete(x); } int[] ret = new int[m]; for (int i=0; i