Introducing Trees

Advanced Programming/Practicum
15-200


Introduction In this lecture we will continue our study of self-referential classes by examining trees. Like linked lists, trees contain nodes: these nodes are objects instantiated from a class that contains instance variables that refer to other nodes from this same class. Whereas references in the linked list class indicate a "follows" relationship (and in the case of doubly-linked lists also a "precedes" relationship), references in tree classes indicate an inclusion relationship (where a parent node includes all its child nodes): these relationships are much more interesting in terms of the kinds of information that they can represent.

Although we will first examine general tree structures, we will focus most of our attention in this lecture and the next on defining and processing binary trees. Within this category we will soon see examples of ordered (search) trees and structure (expression) trees. We will use ordered search trees primarily to store collections of values that can be searched quickly (bringing $O(\log_2 N)$ searching to self-referential classes, just as we did for arrays).

In the final tree lecture we will examine another kind of ordered tree (a heap) and its relation to implementing a priority queue with "fast" enqueue/dequeue operations, as well as other special kinds of trees (N-ary trees, structure/expression trees, and digital trees). Again, in 15-200 we are just scratching the surface of the topic of trees. 15-211 provides a more extensive study of this topic, which is very important in Computer Science.


Terminology All kinds of trees illustrate one important relationship: inclusion between parts and a whole; another way to describe this relationship is that between a parent node that includes child nodes. Every child node has a unique parent; every parent node can have any number of children (including none). As in trees used in genealogy, we will write each parent node directly above its child node(s). In fact, we will use other genealogical terms, like ancestor and descendant, when describing nodes in a tree. We draw lines between parent/child nodes to illustrate their direct relationship.

There is one unique node in every tree: this node has no parent and is called the root of the tree; because all other nodes in the tree are its descendants, we write the root node at the top of the tree.

A mutually exclusive way to classify tree nodes is as internal or leaf. An internal node has one or more children; a leaf node has no children. So, any node that is a parent is an internal node; a node that is only a child (not a parent to another child) is a leaf node.

Finally, we define the size of a tree as the number of nodes that it contains (similarly to the length of a linear linked list); we define the height of a tree as the length of the longest path (each line counts as one step) from the root to one of its descendants. Alternatively, we can define the depth of a node as the number of ancestors it has, and then define the height of a tree as the largest depth of any of its nodes. Note that the root is at depth 0, because it has no ancestors; a tree consisting solely of a root also has a height of 0. The concepts of size and height for trees generalize the length of a linear linked list.

We have already used trees to represent inheritance hierarchies: the relationship between classes (parents) and subclasses (children). In the bouncing ball program, we used the following tree to illustrate the inheritance hierarchy of most of its model classes.

  Let's state some facts about this tree using some of the terminology defined above.
  • The root of the tree is labelled Object (it is also an internal node).
  • The node labelled Simulton is an internal node that has two children: the nodes labelled BlackHole, MoveableSimulton; of course, the parent of each of these nodes is Simulton.
  • The nodes labelled PulsatingBlackHole, Ball, Floater, and HuntingBlackHole are leaf nodes.
  • The ancestors of the node labelled Ball are the nodes labelled Pre (its parent), MoveableSimulton (its grandparent), Simulton (its great-grandparent), and Object (its great-great-grandparent).
  • The size of this tree is 9 nodes; the height is 4 (both Ball and Floater are at depth 4 in the tree).
Another common example of relationships that can be represented by a tree is the structure of a file directory. The root node in a file directory is a folder that is the root of the directory. Each of its children is either a file (which must be a leaf node) or a folder (which can itself act as a parent to other children that are files or folders). When we study N-ary trees (which file directories are examples of) we will examine recursive methods to compute information like the amount of storage occupied by all the files in a directory.

A Class for Defining Binary Trees In this section we will begin our detailed study of trees by examining binary trees. Each node in a binary tree has at most two children (each node has 0 children -a leaf node- or has 1 or 2 children -an internal node-). We can define a class to construct objects/nodes for such trees as
  public class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }
In the standard definition of a binary tree, a parent node refers to each of its two (left and right) subtrees (which can be null or refer to child nodes that themselves are trees). Of course, the null reference denotes an "empty" tree (one with no nodes), just as it denotes an empty list.

As in doubly-linked lists, we can extend such a class to also include a "parent" reference instance variable. But such references are often not worth the trouble to implement and maintain, and we will do without them (just as we did without "previous" references in our study of linked lists).

In classes that implement collections via trees, we typically declare an instance variable named root that stores null or a reference to the root of a tree (and use it just as we used front when storing collections in a linked list).
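
As a small illustration of the class above, the following sketch builds a four-node tree by hand and stores it in a variable named root. The names TreeBuildDemo and buildExample, and the values stored, are made up for this example; TN is nested inside the demo class only so that the example fits in one file.

```java
public class TreeBuildDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  // Hypothetical helper that builds the tree
  //        5
  //       / \
  //      3   8
  //       \
  //        4
  // Children are constructed before their parents, so we work bottom-up.
  public static TN buildExample ()
  {
    TN four  = new TN(4, null, null);   // leaf
    TN three = new TN(3, null, four);   // internal node with only a right child
    TN eight = new TN(8, null, null);   // leaf
    return new TN(5, three, eight);     // root
  }

  public static void main (String[] args)
  {
    TN root = null;                     // null denotes the empty tree
    root = buildExample();              // root now refers to the node storing 5
    System.out.println(root.value);
    System.out.println(root.left.right.value);
    System.out.println(root.right.left == null);
  }
}
```

Building bottom-up is forced by the constructor, which needs references to both subtrees at the time a node is created.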


Recursive Methods for Computing Size and Height In this section we start relating some terminological concepts that we learned to recursive methods that operate on the binary trees defined in the previous section. We can write a very simple recursive method for computing the size of a binary tree; it is similar to (and generalizes) the recursive method that we studied to compute the length of a linked list.
  public int size (TN t)
  {
    if (t == null)
      return 0;
    else
      return 1 + size(t.left) + size(t.right);
  }
Note that here (and in many other recursive methods operating on binary trees) we write two recursive calls: one to compute the size of the left subtree and one to compute the size of the right subtree.

We can prove that this method is correct as follows.

  • For the base case (an empty tree) this method returns the correct size: 0 nodes.
  • The recursive calls are applied to strictly smaller trees (each has at least one fewer node and a height that is at least one smaller: both are integers that characterize the size of a tree/problem).
  • If size(t.left) and size(t.right) correctly compute the number of nodes in the left and right subtrees of t, then returning a value one bigger (for the root of this subtree) than the sum of these values correctly computes the size of the entire tree.
Note that without some kind of array or collection class, we CANNOT write this code iteratively. If we try to use one cursor (as opposed to an array or stack of cursors) once we move to one subtree (say the left one) we have lost our reference to the other (right) one. As always, it would be useful to hand simulate this recursive method on a small tree to understand its workings better. In a hand simulation, calls would go up and down the call frames, unlike linear (linked list) recursion, which tends to go down once and then back to the top.
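
One way to check a hand simulation is to let the method narrate its own calls. The sketch below is a variant of the recursive size method with an extra indent parameter (an addition for this example, not part of the lecture's method) that prints each call and return, indented by its depth in the call stack. The class name SizeTraceDemo is made up.

```java
public class SizeTraceDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  // Same computation as the lecture's size method; the indent parameter
  // exists only to show how deep in the call stack each call is.
  public static int size (TN t, String indent)
  {
    if (t == null) {
      System.out.println(indent + "size(null) returns 0");
      return 0;
    }
    System.out.println(indent + "size(node " + t.value + ") called");
    int result = 1 + size(t.left,  indent + "  ")
                   + size(t.right, indent + "  ");
    System.out.println(indent + "size(node " + t.value + ") returns " + result);
    return result;
  }

  public static void main (String[] args)
  {
    // A root (1) with two leaf children (2 and 3).
    TN t = new TN(1, new TN(2, null, null), new TN(3, null, null));
    System.out.println("total = " + size(t, ""));
  }
}
```

In the printed trace the indentation grows and shrinks repeatedly, which is the "up and down" movement through call frames described above.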

Here is an iterative method that uses a stack to compute the size of a tree.

  public int size (TN t)
  {
    AbstractStack s    = new ArrayStack();
    int           size = 0;
    s.add(t);
    while (!s.isEmpty()) {
      TN next = (TN)s.remove();
      if (next != null) {
        size++;
        s.add(next.left);
        s.add(next.right);
      }
    }

    return size;
  }
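
For readers without the course's AbstractStack and ArrayStack classes, here is a sketch of the same idea using java.util.ArrayDeque from the standard library as the stack. This is a substitution, not the lecture's code, and it forces one change: ArrayDeque does not accept null elements, so the null test happens before a subtree is pushed rather than after it is popped. The class name IterativeSizeDemo is made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeSizeDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  public static int size (TN t)
  {
    Deque<TN> s    = new ArrayDeque<>();  // holds roots of subtrees not yet counted
    int       size = 0;
    if (t != null)
      s.push(t);
    while (!s.isEmpty()) {
      TN next = s.pop();
      size++;                             // count this node
      if (next.left != null)              // remember both subtrees for later
        s.push(next.left);
      if (next.right != null)
        s.push(next.right);
    }

    return size;
  }

  public static void main (String[] args)
  {
    TN t = new TN(5, new TN(3, null, new TN(4, null, null)), new TN(8, null, null));
    System.out.println(size(t));
    System.out.println(size(null));
  }
}
```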

We can also write a recursive method to compute the height of a tree. First, we will do so in an intuitive manner; then we will write a smaller and simpler to understand method using a bit more sophistication.

Note that the height of a (sub)tree that is a leaf node is just 0. Also note that the height of an internal node is 1 more than the biggest height of its subtrees. Using these facts we can write the following recursive method to compute the height of any non-empty tree.

  public int height (TN t)
  {
    if (t.left == null && t.right == null) //leaf check
      return 0;
    else if (t.left == null)
      return 1 + height(t.right);
    else if (t.right == null)
      return 1 + height(t.left);
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }
This method deals with all the necessary cases: a leaf node, an internal node with only a left (or only a right) subtree, and an internal node with both left and right subtrees. This method does not work on empty trees, which have no directly defined height from the previous definition.

Now, let us simplify this code by defining the height of an empty tree to be -1. In one sense this seems very strange, but in another it seems obvious: an empty tree should have a height that is one less than a leaf node (whose height is 0). By using this definition (and no others), we can simplify the height method (as well as defining it for all possible trees, even empty ones) into the elegant method below.

  public int height (TN t)
  {
    if (t == null)
      return -1;
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }
Again, if t is a leaf node, then its left and right subtrees are empty, so this method would perform the recursion and return 1 + Math.max(-1,-1), which is 0 (the correct answer for a leaf node). So, using this generalization of height, our code is simpler and always works (no matter whether an empty or non-empty tree is passed as a parameter; in the earlier method, passing an empty tree as a parameter would cause Java to throw a NullPointerException when it tried to determine if the node was a leaf).
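
The following sketch puts the two height methods side by side so that their behavior can be compared on an empty tree, a leaf, and a small tree. The first method is renamed heightNonEmpty here (a name made up for this example) so that both can live in one class; HeightDemo is likewise a made-up name.

```java
public class HeightDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  // First version from the lecture: defined only for non-empty trees.
  public static int heightNonEmpty (TN t)
  {
    if (t.left == null && t.right == null) //leaf check
      return 0;
    else if (t.left == null)
      return 1 + heightNonEmpty(t.right);
    else if (t.right == null)
      return 1 + heightNonEmpty(t.left);
    else
      return 1 + Math.max(heightNonEmpty(t.left),heightNonEmpty(t.right));
  }

  // Second version from the lecture: the empty tree has height -1.
  public static int height (TN t)
  {
    if (t == null)
      return -1;
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }

  public static void main (String[] args)
  {
    TN leaf = new TN(7, null, null);
    // Root 1; left child 2 (which has a left child 4); right child 3.
    TN tree = new TN(1, new TN(2, new TN(4, null, null), null), new TN(3, null, null));

    System.out.println(height(null));          // -1
    System.out.println(height(leaf));          // 0
    System.out.println(height(tree));          // 2
    System.out.println(heightNonEmpty(tree));  // 2
    try {
      heightNonEmpty(null);
    } catch (NullPointerException e) {
      System.out.println("first version fails on the empty tree");
    }
  }
}
```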

Mathematicians generalize definitions such as this one all the time. You may or may not know that for a non-zero a, $a^0$ is defined as 1. There are many ways to justify this definition (some quite complicated); the simplest way is to note the algebraic law $a^x a^y = a^{x+y}$. By this law (a quite useful one to have) $a^0 a^x = a^{0+x} = a^x$, which means that $a^0$ must be equal to 1 for this identity to hold.


Mathematics Size/Height Relationships We can use the structure of binary trees to derive some mathematical relationships between their sizes and heights. First, we should reiterate that the "inclusion" relationship modeled by trees is much more interesting than the "follows" relationship that is modeled by linear linked lists. One way to illustrate the difference in "interestingness" is by examining all structurally different (different looking) linked lists containing 4 nodes, independent of the values they store: there is only one.
  In contrast, here is a listing of all the structurally different binary trees containing 4 nodes (i.e., of size 4).
  In a more mathematically advanced class, we could deduce a formula that computes the number of structurally different trees containing N nodes (this is similar to computing the number of isomers of a chemical molecule).
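
The lecture does not derive that formula, but the count itself can be computed with a short recurrence, sketched here as an aside. The reasoning (an assumption of this sketch, not something stated above) is that a non-empty tree with n nodes is a root plus a left subtree of some size k and a right subtree of size n-1-k, so the number of shapes is the sum over k of shapes(k) * shapes(n-1-k), with exactly one empty tree. The names TreeShapeCount and countShapes are made up.

```java
public class TreeShapeCount {
  // Returns the number of structurally different binary trees with n nodes (n >= 0).
  // Uses long arithmetic, which is only adequate for small n (the counts grow quickly).
  public static long countShapes (int n)
  {
    long[] count = new long[n + 1];
    count[0] = 1;                                   // exactly one empty tree
    for (int size = 1; size <= n; size++)
      for (int leftSize = 0; leftSize < size; leftSize++)
        // one node is the root; the rest are split between the two subtrees
        count[size] += count[leftSize] * count[size - 1 - leftSize];
    return count[n];
  }

  public static void main (String[] args)
  {
    for (int n = 0; n <= 5; n++)
      System.out.println(n + " nodes: " + countShapes(n) + " shapes");
  }
}
```

For 4 nodes this recurrence gives 14 shapes, compared with the single shape of a 4-node linked list.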

We define a pathological tree as one with only one node at each depth (all the ones on the bottom). In all pathological trees, we have height = size-1.
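
To see the relationship height = size-1 concretely, the sketch below builds one particular pathological tree (every node has only a left child) and measures it with the size and height methods from this lecture. The helper leftChain and the class name PathologicalDemo are made up for this example; other pathological trees mix left and right children but have the same size and height.

```java
public class PathologicalDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  public static int size (TN t)
  {
    if (t == null)
      return 0;
    else
      return 1 + size(t.left) + size(t.right);
  }

  public static int height (TN t)
  {
    if (t == null)
      return -1;
    else
      return 1 + Math.max(height(t.left),height(t.right));
  }

  // Hypothetical helper: builds a tree of n nodes with one node at each depth,
  // storing 1 at the root and n at the single leaf.
  public static TN leftChain (int n)
  {
    TN t = null;
    for (int i = n; i >= 1; i--)
      t = new TN(i, t, null);           // the tree built so far becomes the left subtree
    return t;
  }

  public static void main (String[] args)
  {
    for (int n = 1; n <= 5; n++) {
      TN t = leftChain(n);
      System.out.println("size = " + size(t) + ", height = " + height(t));
    }
  }
}
```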

At the other end of the spectrum is a perfect tree, in which every depth is filled with as many nodes as possible (none of the trees above satisfies this criterion). The picture below shows perfect trees of height 0, 1, 2, and 3.

  If we tabulate this data, we have the following information characterizing the height and size of perfect trees.
  height   size
    0        1
    1        3
    2        7
    3       15

If we study and extend this table, we can guess a simple but interesting relationship between the height of a perfect tree and its size: $size = 2^{height+1} - 1$. First, verify that this formula is correct for the heights/sizes shown. Now, let's prove it by induction.

  1. For a perfect tree of height 0, the formula is true (by evaluation): $2^{0+1} - 1 = 1$.
  2. Let's assume that this formula is true for all perfect trees of height less than or equal to h, and prove that it is true for a tree of height h+1. A perfect tree of height h+1 consists of a root whose left and right subtrees are each perfect trees of height h, and by assumption each of these subtrees contains $2^{h+1} - 1$ nodes. Then the number of nodes in the entire perfect tree is
    $1 + (2^{h+1} - 1) + (2^{h+1} - 1) = 2 \cdot 2^{h+1} - 1 = 2^{(h+1)+1} - 1$
    which completes the proof for a perfect tree of height h+1.
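
The inductive step translates directly into a recursive construction: a perfect tree of height h is a root with two perfect subtrees of height h-1. The sketch below (the helper perfect and the class name PerfectTreeDemo are made up, and the values stored in the nodes are arbitrary) builds such trees and compares their measured size against the formula.

```java
public class PerfectTreeDemo {
  // Same node class as in the lecture, nested here to keep the example self-contained.
  static class TN {
    public int value;
    public TN  left,right;

    public TN (int i, TN l, TN r)
    {value = i; left = l; right = r;}
  }

  public static int size (TN t)
  {
    if (t == null)
      return 0;
    else
      return 1 + size(t.left) + size(t.right);
  }

  // Hypothetical helper: builds a perfect tree of height h (the empty tree for h = -1).
  // Each node stores the height of the subtree rooted at it.
  public static TN perfect (int h)
  {
    if (h < 0)
      return null;
    else
      return new TN(h, perfect(h-1), perfect(h-1));
  }

  public static void main (String[] args)
  {
    for (int h = 0; h <= 3; h++) {
      int formula = (1 << (h+1)) - 1;   // 2^(h+1) - 1, computed with a shift
      System.out.println("height " + h + ": size = " + size(perfect(h))
                         + ", formula = " + formula);
    }
  }
}
```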

Rewriting this equality to express height as a function of size, we have $height = \log_2(size+1) - 1$.
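
For completeness, the rewriting can be spelled out one step at a time, starting from the size formula for perfect trees:

```latex
\begin{align*}
size &= 2^{height+1} - 1 \\
size + 1 &= 2^{height+1} \\
\log_2(size + 1) &= height + 1 \\
height &= \log_2(size + 1) - 1
\end{align*}
```

As a check against the table, a perfect tree of size 15 has height $\log_2(16) - 1 = 4 - 1 = 3$.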

Now, we can also write the original formula as $size = 2(2^{height}) - 1$; removing the multiplicative and additive constants, we have size is $O(2^{height})$. Or, solving for height, we have height is $O(\log_2 size)$. In the next lecture we will learn that the complexity class for searching an ordered binary tree is related to its height; for perfect trees the complexity class is $O(\log_2 size)$. If we can keep our binary trees reasonably full, we will be able to search them in the same complexity class as searching sorted arrays (same for adding and removing elements -which was not true for ordered lists-, while keeping the ordered property).
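
To get a feel for how far apart the two extremes are, the sketch below computes the smallest and largest heights possible for a binary tree with a given number of nodes. It relies on two facts from this section: a tree of height h holds at most $2^{h+1} - 1$ nodes (the perfect tree), and a pathological tree has height size-1. The names HeightBoundsDemo, minHeight, and maxHeight are made up.

```java
public class HeightBoundsDemo {
  // Smallest possible height of a binary tree with n >= 1 nodes:
  // the smallest h for which a perfect tree of height h has at least n nodes.
  public static int minHeight (long n)
  {
    int  h        = 0;
    long capacity = 1;                  // size of a perfect tree of height h
    while (capacity < n) {
      h++;
      capacity = 2*capacity + 1;        // 2^(h+1) - 1 from 2^h - 1
    }
    return h;
  }

  // Largest possible height of a binary tree with n >= 1 nodes (a pathological tree).
  public static long maxHeight (long n)
  {return n - 1;}

  public static void main (String[] args)
  {
    long[] sizes = {15, 1000, 1000000};
    for (long n : sizes)
      System.out.println(n + " nodes: height between " + minHeight(n)
                         + " and " + maxHeight(n));
  }
}
```

For a million nodes the bounds are 19 and 999,999, which is why keeping a tree reasonably full matters so much when the cost of an operation is related to the height.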


Problem Set To ensure that you understand all the material in this lecture, please solve the announced problems after you read the lecture.

If you get stumped on any problem, go back and read the relevant part of the lecture. If you still have questions, please get help from the Instructor, a CA, or any other student.

  1. Hand simulate the size and height methods discussed in this lecture on empty trees and various small non-empty trees.

  2. Write the method print that prints all values in a binary tree.

  3. Write the method max that computes the maximum value stored in a binary tree (assume the tree is a non-empty tree).