15-212-ML : Homework Assignment 3

Due Wed Oct. 7, 12 noon (electronically); papers at recitation.

Maximum Points: 100 (+30 extra credit) 


Problem 1: Dictionaries (35 points + 10 extra credit)

Binary trees are a particularly useful and versatile data structure. As described in the lecture notes, a binary search tree is a binary tree with values of an ordered type at the nodes, arranged so that for every node in the tree, the value at that node is greater than the value at any node in its left subtree and smaller than the value at any node in its right subtree. Thus an inorder traversal of the tree yields an enumeration of the values in the tree from least to greatest. This representation invariant makes binary search trees efficient structures to search, taking time proportional to lg(n) to find an element in a balanced tree, where n is the number of elements in the tree.

However, without taking special care, trees that are built from arbitrary insertions of data are not necessarily balanced, meaning that searches may take time O(n) in the worst case. There are many mechanisms for building balanced binary search trees. One such mechanism, red-black trees, is described in the lecture notes. For problem 1 you will implement balanced binary trees using a balancing criterion based upon tree height.

We define the height, h(t), of trees inductively on the structure of the tree:

h(empty) = 0
h(t) = 1 + max(h(t.left), h(t.right))
where t.left and t.right are the left and right children of t, empty is the empty tree, and max is the function which returns the greater of its two arguments. In other words, h(t) is zero if the tree is empty, and one more than the height of the taller child otherwise.
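
As an illustration, the inductive definition can be transcribed almost directly into ML. This is a minimal sketch over a hypothetical plain tree type: the names tree, Empty, Node, and height are assumptions made for this example and are not the representation required in Question 1.1. It also recomputes the height on every call rather than storing it at each node.

```sml
(* Hypothetical plain binary tree, used only to illustrate h(t). *)
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree

(* height t computes h(t) by structural recursion on t. *)
fun height Empty = 0
  | height (Node (l, _, r)) = 1 + Int.max (height l, height r)

val leaf = Node (Empty, 1, Empty)

(* A small right-leaning tree: the right spine has three nodes. *)
val skewed = Node (leaf, 2, Node (Empty, 3, Node (Empty, 4, Empty)))

val hLeaf = height leaf
val hSkewed = height skewed
```

Here hLeaf is 1 and hSkewed is 3, since the taller (right) child of the root has height 2.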

The idea is to store at each node the height of that node in the tree. Then, when new trees are constructed from old trees, the new tree can be balanced by means of one of the rotations described below. In addition to the binary search tree invariant described above, you must maintain the following invariant at each node with height greater than 2:

  h(l) <= h(r) * w  and  h(l) * w >= h(r)

where w is a constant called the weight ratio, l is the left child, and r is the right child. In other words, at every such node, the height of neither child is more than w times the height of the other.
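
The invariant can be read as a predicate on the two child heights. The following is a minimal sketch, assuming w = 3 (the value requested in Question 1.1); the name withinRatio is hypothetical and chosen only for this example.

```sml
(* Weight ratio; 3 is the value the assignment asks for in Question 1.1. *)
val w = 3

(* withinRatio (hl, hr) is true iff the heights hl and hr satisfy
   hl <= hr * w  and  hl * w >= hr.  (Hypothetical helper name.) *)
fun withinRatio (hl, hr) = hl <= hr * w andalso hl * w >= hr

val ok      = withinRatio (1, 3)   (* 1 <= 9 holds and 3 >= 3 holds *)
val tooDeep = withinRatio (1, 4)   (* 3 >= 4 fails *)
val oneLeaf = withinRatio (0, 1)   (* 0 >= 1 fails *)
```

The last case corresponds to a node of height 2 with one empty child and one leaf child, which suggests why the invariant is only required at nodes with height greater than 2.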

There are four rotations that you will use to keep trees balanced: single left, single right, double left, and double right. The two left rotations are presented pictorially below; the right rotations are just the mirror images of these.

A single rotation lifts Z relative to X and Y, while a double rotation lifts Y. Therefore, when balancing trees, you should use a double rotation if Y is taller, or a single rotation if Z is taller.
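
For illustration, here is a minimal sketch of a single left rotation. It assumes the usual picture, in which the root holds an entry a with left subtree X and a right child holding an entry b with subtrees Y and Z. It uses the representation from Question 1.1, with E as an assumed name for the empty tree; height, node, and singleLeft are likewise hypothetical helper names. It only shows the shape of the rotation and does not decide when to rotate.

```sml
type key = string
type 'a entry = key * 'a

(* E is an assumed name for the empty-tree constructor. *)
datatype 'a dict =
    E
  | T of 'a entry * int * 'a dict * 'a dict  (* data * height * left * right *)

(* Read the cached height in constant time. *)
fun height E = 0
  | height (T (_, h, _, _)) = h

(* Build a node, recomputing the cached height from the children. *)
fun node (e, l, r) = T (e, 1 + Int.max (height l, height r), l, r)

(* Single left rotation:  a(X, b(Y, Z))  ==>  b(a(X, Y), Z).
   Z moves up one level, X moves down one level, Y stays put.
   Trees without a right child are returned unchanged. *)
fun singleLeft (T (a, _, x, T (b, _, y, z))) = node (b, node (a, x, y), z)
  | singleLeft t = t

(* A right-leaning chain a -> b -> c of height 3. *)
val chain = node (("a", 1), E, node (("b", 2), E, node (("c", 3), E, E)))
val rotated = singleLeft chain
```

After the rotation, b is at the root with a and c as its children, so the height drops from 3 to 2 while the inorder sequence a, b, c is unchanged.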

Question 1.1 (35 points)

Write a structure Dict with the basic types described in the DICT signature (5 points) and the empty, lookup, insert (20 points), and fold (10 points) values.

signature DICT =
sig
  type key = string
  type 'a entry = key * 'a
  type 'a dict

  val empty : 'a dict
  val lookup : 'a dict * key -> 'a option
  val insert : 'a dict * 'a entry -> 'a dict
  val fold : ('a * 'b * 'b -> 'b) -> 'b -> 'a dict -> 'b

(*val delete : 'a dict * key -> 'a dict (* extra credit *) *)
end;  (* signature DICT *)

Use w = 3 and use the following datatype as the representation type of an 'a dict.

datatype 'a dict =
  | T of 'a entry * int * 'a dict * 'a dict; (* data * height * left child * right child *)

(* Invariants (writing height(t) for the height of tree t):
   1. t : 'a dict is a binary search tree
   2. For every tree t = T (e, h, left, right), h = height(t)
   3. For every tree t = T (e, h, left, right),
      height(left) <= w*height(right) and height(right) <= w*height(left)
*)

NOTE: Make sure to annotate all your functions with the invariants they expect for the arguments!  Otherwise the graders may not be able to understand and assess the correctness of your implementation!

Hint: Your code will be much easier to understand if you use a helper function balance (not visible in the signature!) which takes an entry a and two balanced trees, X and Y, whose heights are not too far out of balance (you need to decide how far that is; it will be an invariant of the function), and constructs the balanced tree derived from the tree

using one of the rotations, if necessary. Again, be sure to state all of the invariants balance expects from its arguments.

Question 1.3 (Extra credit, 10 points)

Write the function delete, which takes a tree and a key and returns a balanced version of the tree obtained by removing the entry with the given key. No exception should be raised if the key is not in the tree.

Problem 2: An Address Book (30 points)

For this problem you will write a simple address book utility using the dictionary structure implemented in problem 1. The purpose is to become familiar with programming using a strict interface. You will need to know how to use record types for this exercise. Read pp. 32-35 in Paulson if you don't already know about records.

The following signature ADDRESSBOOK describes address books:
signature ADDRESSBOOK =
sig
        type key = string
        type entry =
            {Key : key,
             LastName : string,
             FirstName : string,
             Nickname : string option,
             Email : string,
             OfficeBuilding : string,
             OfficeNumber : int}
        type address_book

        val empty : address_book
        val addEntry : address_book * entry -> address_book
        val lookup : address_book * key -> entry option

        exception NotFound of key
        val updateNickName : address_book * key * string -> address_book
        val updateEmail : address_book * key * string -> address_book
        val updateOfficeBuilding : address_book * key * string -> address_book
        val updateOfficeNumber : address_book * key * int -> address_book
        (* questions 2.2 & 2.3 *)
        val getRange : address_book * key * key -> entry list
end;  (* signature ADDRESSBOOK *)

The address book will store entry records with information about the people listed in the address book. The entries should be stored with the concatenation of the last name and the first name as keys. For instance, the entry with LastName = "Wickline" and FirstName = "Philip" should have Key = "WicklinePhilip", and should be stored with key "WicklinePhilip" in the collection of entries.
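
As a reminder of how records behave in ML, the following minimal sketch builds an entry, reads a field, and produces a changed copy. The helper names makeKey and withOfficeNumber are hypothetical, and the email and office values are made up for illustration; only the record type itself comes from the signature above.

```sml
type key = string
type entry =
    {Key : key,
     LastName : string,
     FirstName : string,
     Nickname : string option,
     Email : string,
     OfficeBuilding : string,
     OfficeNumber : int}

(* makeKey is a hypothetical helper: last name followed by first name. *)
fun makeKey (last : string, first : string) : key = last ^ first

(* Sample entry; the email and office values are made up for illustration. *)
val philip : entry =
    {Key = makeKey ("Wickline", "Philip"),
     LastName = "Wickline",
     FirstName = "Philip",
     Nickname = NONE,
     Email = "philip@example.edu",
     OfficeBuilding = "Example Hall",
     OfficeNumber = 100}

(* Fields are read with the #Label selector. *)
val k = #Key philip

(* SML has no record-update syntax, so a changed copy is built by
   pattern matching on the old record and rebuilding it. *)
fun withOfficeNumber ({Key, LastName, FirstName, Nickname, Email,
                       OfficeBuilding, OfficeNumber = _} : entry,
                      n : int) : entry =
    {Key = Key, LastName = LastName, FirstName = FirstName,
     Nickname = Nickname, Email = Email,
     OfficeBuilding = OfficeBuilding, OfficeNumber = n}

val moved = withOfficeNumber (philip, 200)
```

Here k is "WicklinePhilip", and moved differs from philip only in its OfficeNumber field. The original record is unchanged, as records are immutable values.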

Question 2.1 (10 points)

Implement a structure AddressBook with the types described in ADDRESSBOOK and the empty, addEntry, lookup, and various updateFoo functions.

Question 2.2 (10 points)

Write the function getRange which, given an address book and a pair of keys, returns the list of all entries in the book whose keys are alphabetically between the two keys, inclusive. For example, if the address book book has entries with keys "WatkinSarah", "SmithJose", and "SmithJoseph", the call getRange(book, "SmithJose", "Taylor") should return the entries associated with the keys "SmithJose" and "SmithJoseph", in no particular order. What is the time complexity of this function? Why is it less efficient than a simple lookup?
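
The inclusive range test in the example above can be sketched on plain strings as follows. This is only an illustration of the comparison, applied to a list of keys rather than to an address book; the names inRange, keys, and hits are assumptions made for this example.

```sml
(* inRange (lo, hi) k is true iff lo <= k <= hi in string order.
   The type annotations select the string version of <=. *)
fun inRange (lo : string, hi : string) (k : string) =
    lo <= k andalso k <= hi

(* The keys from the example above. *)
val keys = ["WatkinSarah", "SmithJose", "SmithJoseph"]

(* Keep only the keys in the inclusive range. *)
val hits = List.filter (inRange ("SmithJose", "Taylor")) keys
```

Here hits is ["SmithJose", "SmithJoseph"]. Note that ML compares strings character by character using character codes, so the comparison is case-sensitive.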

Question 2.3 (10 points)

Extend the signature and implementation of dictionaries to add facilities to make the getRange function more efficient. Call the new signature DICT1 and the structure Dict1. Modify your address book structure to use this new dictionary structure, and call the new address book structure AddressBook1.

Problem 3: Regular expressions (35 points + 20 extra credit)

In class, five different operators were introduced to describe regular expressions R. These have been implemented as the following datatype.
datatype regexp =
    Char of char
  | Times of regexp * regexp
  | One
  | Plus of regexp * regexp
  | Zero
  | Star of regexp;
From time to time it is helpful to have some more constructs available to form regular expressions, such as the one-character wildcard _, intersection r1 & r2, T, and negation. The regular expression matcher accept is available in the file regexp.sml and should be extended to deal with these new constructors. Remind yourself of the specification of the acc function and make sure your new cases fit this specification. You should also generate some test data to validate your implementation. Note that for the purposes of our implementation, the alphabet is simply any ML value of type char.
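
As a reminder of the general shape of such a matcher, here is a minimal sketch of a continuation-based acc for the six original constructors. It is written under assumptions: the code in regexp.sml may differ in names, argument order, and the type of accept, and the specification stated in the comment is the usual one for this style of matcher rather than a quotation from the handout.

```sml
datatype regexp =
    Char of char
  | Times of regexp * regexp
  | One
  | Plus of regexp * regexp
  | Zero
  | Star of regexp

(* Assumed specification:
   acc r s k = true iff s = s1 @ s2 with s1 in L(r) and k s2 = true. *)
fun acc (Char a) nil k = false
  | acc (Char a) (b :: s) k = (a = b) andalso k s
  | acc (Times (r1, r2)) s k = acc r1 s (fn s' => acc r2 s' k)
  | acc One s k = k s
  | acc (Plus (r1, r2)) s k = acc r1 s k orelse acc r2 s k
  | acc Zero s k = false
  | acc (Star r) s k =
      (* Either stop here, or match r once and continue with Star r.
         The length check demands progress, which guarantees
         termination when r can match the empty string. *)
      k s orelse
      acc r s (fn s' => length s' < length s andalso acc (Star r) s' k)

(* accept r str is true iff the whole string str is in L(r).
   The string argument type is an assumption for this sketch. *)
fun accept r str = acc r (String.explode str) List.null

val yes = accept (Times (Char #"a", Star (Char #"b"))) "abbb"
val no  = accept (Times (Char #"a", Star (Char #"b"))) "abba"
```

Each new constructor then needs one additional clause of acc that satisfies the same specification.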

Question 3.1 (5 points)

Implement the case for the one-character wildcard _.

Question 3.2 (15 points)

Implement the case for intersection r1 & r2.

Question 3.3 (15 points)

Implement the case for T.

Question 3.4 (Extra credit, 10 points)

Prove the correctness of the case for intersection in your implementation, following the pattern in the handout.

Question 3.5 (Extra credit, 10 points)

Implement the case for Negation.  

Handin instructions