Homework Assignment 6

Amortized Array-Based Dictionary

One of the most important structures in computer science are the *dictionary* data structures that support fast insert and lookup operations. Dozens of different data structures have been proposed for implementing dictionaries including hash tables, skip lists, search trees and others.

In this lab assignment you will implement a dictionary based on linked lists and sorted arrays. This structure combines a fast lookup on sorted arrays with ease of linked list-insertion. Note that a sorted array is good for lookups (think of a binary search) but bad for inserts. A linked list is good for inserts but bad for lookups (they can take linear time).

The idea of this data structure is as follows. We will have a linked list of arrays, where array k has size 2^{k}, and each array is in sorted order. However, there will be no relationship between the items in different arrays. The issue of which arrays to be used is based on the binary representation of the number of items we are storing. For example, if we have 11 items to store, then the arrays of sizes 1, 2, and 8 will be used, since 11 = 1 + 2 + 8, and our dictionary might look like this:

This data structure has interesting amortized and average-case performance, hence the name "Amortized Array-Based Dictionary" (AAD).

- Synthesize what we've covered so far in the course
- Problem solving, problem solving, problem solving
- Analyze a complex data structure
- Practice working in small teams to solve problems, design algorithms and write code

You are going to implement the dictionary data structure

public class Dictionary<E extends Comparable<E>> implements DictionaryInterface<E> { private int size; private Node head; ... }

using a linked list of sorted arrays. Specifically, you'll maintain a linked list of the following `Nodes`

:

private static class Node { private int power; private Comparable[] array; private Node next; ... }

These nodes are organized in the ascending order of array sizes - there are no two nodes containing arrays of the same size.

Since this is a pretty advanced assignment, you have been given a lot of starter code to help you out. Don't be intimidated by the starter code! If you understand the theory behind the AAD data structure and follow the directions in the starter code, the code you write will be rather short.

Your main goal is to implement the Dictionary interface. In addiiton you will need to implement the following helper methods. Their specifications are detailed in the starter code.

- binarySearch(Comparable[] a, Comparable item)
- frequency(Comparable[] a, Comparable item)
- merge(Comparable[] a, Comparable[] b)
- mergeDown()
- splitUp(Comparable[] a, int k)

If you get stuck, or don't understand something, re-read the lecture notes. Ask for help if you need it!

You create an array of size one and insert it into the linked list. Next, you traverse the list and merge sorted arrays of the same size until at most one array remains of each size.

Combining two dicionaries is based on recursive merging sorted arrays of the same size. The procedure has two phases. In the first phase you merge the roots of two linked lists into one linked list that is sorted by array size in monotonically increasing order. In the second phase you merge arrays of equal size until at most one array remains of each size.

Since each array in the list is in sorted order, you run Java's binary search on each of them.

This is the most cumbersome operation which can be implemented in more than one way. We advise you to follow the algorithm outlined by Nathan in his PPT presentation. In general terms, removing an item from an array of size 2^{k} results in splitting an array into k smaller arrays of sizes 1, 2, 4, and so on until 2^{k-1}. Then, you traverse the list and merge arrays of the same size.

As a consequence of some of the observations made in lecture, testing should be straightforward. Given a specific sequence of operations on an AAD, the resulting AAD is unique, so you can test by simply seeing if it is correct (the already-implemented toString() method should be really helpful for doing this).

That being said, here is a TestingDriver.java a class to help test your implementation. It will run some simple tests on your code and provide feedback.

The TAs will discuss testing in more detail in recitation on Wednesday Nov. 11.

You FTP the following java files

- Dictionary.java

Do not submit anything else like zip files, folders, desktops, class files. Failure to do so will result in a 10 point penalty.

Here is the point breakdown:

- binarySearch helper function - 2 points
- frequency helper function - 3 points
- merge helper function - 5 points
- splitUp helper function - 10 points
- add(e) - 10 points
- remove(e) - 10 points
- contains(e) - 10 points
- size() - 5 points
- frequency(e) - 5 points
- combine - 10 points
- Overal correctness of your Dictionary implementation - 10 points
- General coding style - 20 points.

Style will be graded very strictly for this lab assignment. Make sure you listen to the comments in the starter code (they are there to help you!) and your methods satisfy their specifications. Especially make sure you understand the AAD data structure as explained in lecture so that your implementation is as efficient as it should be.

Victor S.Adamchik, CMU, 2009