Sample Code for This Chapter

An abstract data type (ADT) is a type equipped with a set of operations for manipulating values of that type.  An ADT is implemented by providing a representation type for the values of the ADT and an implementation for the operations defined on values of the representation type.  What makes an ADT abstract is that the representation type is hidden from clients of the ADT.  Consequently, the only operations that may be performed on a value of the ADT are the given ones.  This ensures that the representation may be changed without affecting the behavior of the client --- since the representation is hidden from it, the client cannot depend on it.   This also facilitates the implementation of efficient data structures by imposing a condition, called a representation invariant, on the representation that is preserved by the operations of the type.  Each operation that takes a value of the ADT as argument may assume that the representation invariant holds.  In compensation each operation that yields a value of the ADT as result must guarantee that the representation invariant holds of it.  If the operations of the ADT preserve the representation invariant, then it must truly be invariant --- no other code in the system could possibly disrupt it.  Put another way, any violation of the representation invariant may be localized to the implementation of one of the operations.   This significantly reduces the time required to find an error in a program.

To make these ideas concrete we will consider the abstract data type of dictionaries.   A dictionary is a mapping from keys to values.  For simplicity we take keys to be strings, but it is possible to define a dictionary for any ordered type; the values associated with keys are completely arbitrary.  Viewed as an ADT, a dictionary is a type 'a dict of dictionaries mapping strings to values of type 'a together with empty, insert, and lookup operations that create a new dictionary, insert a value with a given key, and retrieve the value associated with a key (if any).  In short a dictionary is an implementation of the following signature:

signature DICT = sig
  type key = string
  type 'a entry = key * 'a
  type 'a dict
  exception Lookup of key
  val empty : 'a dict
  val insert : 'a dict * 'a entry -> 'a dict
  val lookup : 'a dict * key -> 'a
end

Notice that the type 'a dict is not specified in the signature, whereas the types key and 'a entry are defined to be string and string * 'a, respectively.

A simple implementation of a dictionary is a binary search tree.  A binary search tree is a binary tree with values of an ordered type at the nodes arranged in such a way that for every node in the tree, the value at that node is greater than the value at any node in the left child of that node, and smaller than the value at any node in the right child.  It follows immediately that no two nodes in a binary search tree are labelled with the same value.  The binary search tree property is an example of a representation invariant on an underlying data structure.  The underlying structure is a binary tree with values at the nodes; the representation invariant isolates a set of structures satisfying some additional, more stringent, conditions.

We may use a binary search tree to implement a dictionary as follows:

structure BinarySearchTree :> DICT = struct
  type key = string
  type 'a entry = key * 'a

  (* Rep invariant: 'a tree is a binary search tree *)
  datatype 'a tree = Empty | Node of 'a tree * 'a entry * 'a tree
  type 'a dict = 'a tree

  exception Lookup of key

  val empty = Empty

  fun insert (Empty, entry) = Node (Empty, entry, Empty)
    | insert (n as Node (l, e as (k,_), r), e' as (k',_)) =
      (case String.compare (k', k)
         of LESS => Node (insert (l, e'), e, r)
          | GREATER => Node (l, e, insert (r, e'))
          | EQUAL => n)

  fun lookup (Empty, k) = raise (Lookup k)
    | lookup (Node (l, (k, v), r), k') =
      (case String.compare (k', k)
         of EQUAL => v
          | LESS => lookup (l, k')
          | GREATER => lookup (r, k'))

end

Notice that empty is defined to be a valid binary search tree, that insert yields a binary search tree if its argument is one, and that lookup relies on its argument being a binary search tree (if not, it might fail to find a key that in fact occurs in the tree!).  The structure BinarySearchTree is sealed with the signature DICT to ensure that the representation type is held abstract.

The difficulty with binary search trees is that they may become unbalanced.  In particular if we insert keys in ascending order, the representation is essentially just a list!  The left child of each node is empty; the right child is the rest of the dictionary.  Consequently, it takes O(n) time in the worse case to perform a lookup on a dictionary containing nelements.  Such a tree is said to be unbalanced because the children of a node have widely varying heights.  Were it to be the case that the children of every node had roughly equal height, then the lookup would take O(lg n) time, a considerable improvement.

Can we do better?  Many approaches have been suggested.  One that we will consider here is an instance of what is called a self-adjusting tree, called a red-black tree (the reason for the name will be apparent shortly).  The general idea of a self-adjusting tree is that operations on the tree may cause a reorganization of its structure to ensure that some invariant is maintained.  In our case we will arrange things so that the tree is self-balancing, meaning that the children of any node have roughly the same height.  As we just remarked, this ensures that lookup is efficient.

How is this achieved?  By imposing a clever representation invariant on the binary search tree, called the red-black tree condition.  A red-black tree is a binary search tree in which every node is colored either red or black (with the empty tree being regarded as black) and such that the following properties hold:

  1. The children of a red node are black.
  2. For any node in the tree, the number of black nodes on any two paths from that node to a leaf is the same.  This number is called the black height of the node.

These two conditions ensure that a red-black tree is a balanced binary search tree.   Here's why.  First, observe that a red-black tree of black height h has at least 2h-1 nodes.  We may prove this by induction on the structure of the red-black tree.  The empty tree has black-height 1 (since we consider it to be black), which is at least 21-1, as required.   Suppose we have a red node.  The black height of both children must be h, hence each has at most 2h-1 nodes, yielding a total of 2(2h-1)+1 = 2h+1-1 nodes, which is at least 2h-1.  If, on the other hand, we have a black node, then the black height of both children is h-1, and each have at most 2h-1-1 nodes, for a total of 2(2h-1-1)+1 = 2h-1 nodes.   Now, observe that a red-black tree of height h with n nodes has black height at least h/2, and hence has at least 2h/2-1 nodes.  Consequently, lg(n+1)>=h/2, so h <= 2lg(n+1).   In other words, its height is logarithmic in the number of nodes, which implies that the tree is height balanced.

To ensure logarithmic behavior, all we have to do is to maintain the red-black invariant.  The empty tree is a red-black tree, so the only question is how to perform an insert operation.  First, we insert the entry as usual for a binary search tree, with the fresh node starting out colored red.  In doing so we do not disturb the black height condition, but we might introduce a red-red violation, a situation in which a red node has a red child.  We then remove the red-red violation by propagating it upwards towards the root by a constant-time transformation on the tree (one of several possibilities, which we'll discuss shortly).  These transformations either eliminate the red-red violation outright, or, in logarithmic time, push the violation to the root where it is neatly resolved by recoloring the root black (which preserves the black-height invariant!).

The violation is propagated upwards by one of four rotations.  We will maintain the invariant that there is at most one red-red violation in the tree.  The insertion may or may not create such a violation, and each propagation step will preserve this invariant.  It follows that the parent of a red-red violation must be black.   Consequently, the situation must look like this.   This diagram represents four distinct situations, according to whether the uppermost red node is a left or right child of the black node, and whether the red child of the red node is itself a left or right child.  In each case the red-red violation is propagated upwards by transforming it to look like this.   Notice that by making the uppermost node red we may be introducing a red-red violation further up the tree (since the black node's parent might have been red), and that we are preserving the black-height invariant since the great-grand-children of the black node in the original situation will appear as children of the two black nodes in the re-organized situation.  Notice as well that the binary search tree conditions are also preserved by this transformation.  As a limiting case if the red-red violation is propagated to the root of the entire tree, we re-color the root black, which preserves the black-height condition, and we are done re-balancing the tree.

Let's look in detail at two of the four cases of removing a red-red violation, those in which the uppermost red node is the left child of the black node; the other two cases are handled symmetrically.  If the situation looks like this, we reorganize the tree to look like this.  You should check that the black-height and binary search tree invariants are preserved by this transformation.   Similarly, if the situation looks like this, then we reorganize the tree to look like this (precisely as before).  Once again, the black-height and binary search tree invariants are preserved by this transformation, and the red-red violation is pushed further up the tree.

Here is the ML code to implement dictionaries using a red-black tree.  Notice that the tree rotations are neatly expressed using pattern matching.

structure RedBlackTree :> DICT = struct
  type key = string
  type 'a entry = string * 'a

  (* Representation invariant: binary search tree + red-black conditions *)
  datatype 'a dict = Empty
  | Red of 'a entry * 'a dict * 'a dict
  | Black of 'a entry * 'a dict * 'a dict

  val empty = Empty

  exception Lookup of key

  fun lookup dict key =
      let
          fun lk (Empty) = raise (Lookup key)
            | lk (Red tree) = lk' tree
            | lk (Black tree) = lk' tree
          and lk' ((key1, datum1), left, right) =
              (case String.compare(key,key1)
                 of EQUAL => datum1
                  | LESS => lk left
                  | GREATER => lk right)
      in
         lk dict
      end

  fun restoreLeft (Black (z, Red (y, Red (x, d1, d2), d3), d4)) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft (Black (z, Red (x, d1, Red (y, d2, d3)), d4)) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreLeft dict = dict
 
  fun restoreRight (Black (x, d1, Red (y, d2, Red (z, d3, d4)))) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight (Black (x, d1, Red (z, Red (y, d2, d3), d4))) =
      Red (y, Black (x, d1, d2), Black (z, d3, d4))
    | restoreRight dict = dict
 
  fun insert (dict, entry as (key, datum)) =
      let
          (* val ins : 'a dict -> 'a dict insert entry *)
          (* ins (Red _) may violate color invariant at root *)
          (* ins (Black _) or ins (Empty) will be red/black tree *)
          (* ins preserves black height *)
          fun ins (Empty) = Red (entry, Empty, Empty)
            | ins (Red (entry1 as (key1, datum1), left, right)) =
              (case String.compare (key, key1)
                 of EQUAL => Red (entry, left, right)
                  | LESS => Red (entry1, ins left, right)
                  | GREATER => Red (entry1, left, ins right))
            | ins (Black (entry1 as (key1, datum1), left, right)) =
              (case String.compare (key, key1)
                 of EQUAL => Black (entry, left, right)
                  | LESS => restoreLeft (Black (entry1, ins left, right))
                  | GREATER => restoreRight (Black (entry1, left, ins right)))
      in
         case ins dict
           of Red (t as (_, Red _, _)) => Black t (* re-color *)
            | Red (t as (_, _, Red _)) => Black t (* re-color *)
            | dict => dict
      end

end

It is worthwhile to contemplate the role played by the red-black invariant in ensuring the correctness of the implementation and the time complexity of the operations.

Sample Code for This Chapter