15-750 Graduate Algorithms January 18, 2001 -- binary search trees in general -- treaps Binary Search Trees ------------------- You have probably seen binary search trees before. You should understand what they are. Binary search trees is a class of data structures where (1) Each node stores a piece of data (2) Each node has two pointers to two other binary search trees (3) The overall structure of the pointers is a tree (there's a root, it's acyclic, and every node is reachable from the root.) Binary search trees are a way to store a update a set of items, where there is an ordering on the items. I know this is rather vague. But there is not a precise way to define the gamut of applicatons of search trees. In general, there are two classes of applications. Those where each item has a key value from a totally ordered universe, and those where the tree is used as an efficient way to represent an ordered list of items. Some applications of binary search trees: Storing a set of names, and being able to lookup based on a prefix of the name. (Used in internet routers.) Storing a path in a graph, and being able to reverse any subsection of the path in O(log n) time. (Useful in travelling salesman problems). Being able to quickly determine the rank of a given node in a set. A _symmetric order_ traversal of the tree is defined recursively as follows: symmetric_order(r) { if r = NIL then return symmetric_order(left(r)) print r symmetric_order(right(r)) } Commonly, the item stored at a node is called a key. The keys are commonly stored in the nodes such that a symmetric order traversal of the tree visits the keys in increasing order. In this case the following recursive algorithm can search for key k in a tree with root r. search(r, k) { if r = NIL then return NIL if key(r) = k then return r if key(r) < k return search(left(r),k) return search(right(r),k) } The generic insertion algorithm works as follows. To insert a key k, first apply the search algorithm for key k. Assuming k is not in the tree then eventually a NIL pointer is reached. This nil pointer is modified to point at a new node containing the key inserted. The generic deletion algorithm is a bit more complicated, and we won't describe it here. Given a set of n keys to store in a tree, there are many different shaped trees that can be used to store those keys. If the tree is _balanced_ then all the nodes will have a depth (distance from the root) of O(log n). Thus, one goal of the binary tree data structure is to allow all the operations to be carried out in O(log n) time. Unfortunately, the generic insertion and search procedures do not do this. For example, if the keys are inserted in sorted order, then tree is very unbalanced, and searching ends up taking Omega(n) time instead of O(log n). A number of different solutions have been devised to deal with this problem. For example: AVL Trees, red-black trees, self-adjusting trees (splay trees) and treaps. All of these methods make use of a local restructuring operation called a rotation. A rotation in a tree is a local restructuring that preserves the symmetric order. Here's a general picture: x y / \ / \ A y left rotation --> x C / \ <-- right rotation / \ B C A B Here the upper case letters represent arbitrary (possibly empty) subtrees. Here's a specific example: 1 2 / \ left rotation at 1 / \ 0 2 -------------------> 1 4 \ / / \ 4 0 3 5 / \ 3 5 Treaps are a randomized algorithm that ensures that at all times the tree is structured "randomly", according to an appropriate definition of random. lect0118-treaps describes and analyzes treaps.