Fall 2014

15-121 Homework 6: BSTs and Anagrams - Due 11/19 at midnight

Download and unzip hw6-code.zip, which contains all the files you will need for this assignment. You will be writing code in a number of files, including creating one class from the ground up. The goals of this assignment are:

  • to give you additional practice with recursion and BST operations
  • to modify the existing BST class in an interesting way
  • to write a class from scratch
  • to build a class you might actually use in word games

Note: You will be graded in part on your coding style. Your code should be easy to read, well organized, and concise. You should avoid duplicate code.


Background: The Assignment

This assignment has two parts. The first is to write (and test) three Binary Search Tree methods: height, isBalanced, and mirrorTree. These will be added to the BST.java file. In addition, you're going to write an anagram generator as was demo'd in class. In order to do this, you're going to have to augment the same BST class to enable it to handle values with duplicate keys as well as write a class called AnagramTree that constructs a tree of words that can be searched for anagrams.


Part I — The BST Methods

There are three new methods that you will be writing for the BST class in this homework assignment. You should test your methods by calling them from the main method in the BST class. You will only receive credit for a method if it works completely correctly, so test thoroughly!

You are going to need recursion and helper methods to do this homework!! :-)

The three new BST methods are as follows:

    public int height()
    public boolean isBalanced()
    public BST<AnyType> mirrorTree()
Specifications for what each method is supposed to do are given below. Again, you are strongly encouraged to test your BST methods before moving on to the second part of this homework.

int height()
Returns the length of the longest path (number of edges) from the root to a leaf. Recall that the height of a 1-node tree (just the root) is 0 (as is the height of an empty tree) and the height of a node is the length of the longest path from that node to a leaf.
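The edge-counting convention above can be sketched on a tiny stand-alone node class. This is only an illustration under stated assumptions: HeightSketch and SketchNode are hypothetical names that do not appear in hw6-code.zip, and the real method would work on the BST's own private node class.

```java
// Hypothetical stand-alone sketch; HeightSketch and SketchNode are NOT part of hw6-code.zip.
final class HeightSketch {
    static final class SketchNode {
        final SketchNode left;
        final SketchNode right;

        SketchNode(SketchNode left, SketchNode right) {
            this.left = left;
            this.right = right;
        }
    }

    // Counts edges on the longest downward path from node.
    // A null child reports -1 so that a leaf comes out as 0.
    private static int edges(SketchNode node) {
        if (node == null) {
            return -1;
        }
        return 1 + Math.max(edges(node.left), edges(node.right));
    }

    // Public entry point: applies the handout's convention that an
    // empty tree also has height 0.
    static int height(SketchNode root) {
        return root == null ? 0 : edges(root);
    }
}
```

Splitting the work into a public wrapper and a private recursive helper keeps the special case for the empty tree out of the recursion.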

boolean isBalanced()
Returns true if this BST is balanced, meaning that the height of the left subtree and the height of the right subtree of each node (not just the root) differ by no more than 1; returns false otherwise. An empty tree is (trivially) balanced.
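One possible shape for this check is sketched below on a hypothetical stand-alone node class (BalanceSketch and SketchNode are illustration names, not files from the assignment). It makes one assumption you should verify against the course's definition: inside the helper, an empty subtree counts as height -1 (the common textbook convention), which means a node with a two-node chain on one side and nothing on the other is reported as unbalanced.

```java
// Hypothetical stand-alone sketch; BalanceSketch and SketchNode are NOT part of hw6-code.zip.
// ASSUMPTION: an empty subtree counts as height -1 inside the helper.
final class BalanceSketch {
    static final class SketchNode {
        final SketchNode left;
        final SketchNode right;

        SketchNode(SketchNode left, SketchNode right) {
            this.left = left;
            this.right = right;
        }
    }

    // Marker returned by the helper once any node violates the rule.
    private static final int UNBALANCED = Integer.MIN_VALUE;

    // Returns the subtree's height in edges, or UNBALANCED if some node
    // in the subtree has children whose heights differ by more than 1.
    private static int check(SketchNode node) {
        if (node == null) {
            return -1;
        }
        int leftHeight = check(node.left);
        if (leftHeight == UNBALANCED) {
            return UNBALANCED;
        }
        int rightHeight = check(node.right);
        if (rightHeight == UNBALANCED) {
            return UNBALANCED;
        }
        if (Math.abs(leftHeight - rightHeight) > 1) {
            return UNBALANCED;
        }
        return 1 + Math.max(leftHeight, rightHeight);
    }

    static boolean isBalanced(SketchNode root) {
        return check(root) != UNBALANCED;
    }
}
```

Computing the height and the balance verdict in the same pass visits each node once; calling a separate height method at every node would also work but repeats a lot of traversal.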

BST<AnyType> mirrorTree()
Returns a new BST that is the mirror of the original, i.e., all the nodes that were in the left subtree of a node in the original tree are in the right subtree of that node in the mirrorTree, and vice versa. Note that this effectively "flips" the sort order of the BST, since all the smaller nodes (which were on the left) will now be on the right. So an inorder traversal of the mirrorTree will visit the values in descending order, which is an easy way to test your method. The original tree must not be changed.
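The mirroring idea can be sketched as a recursive copy that swaps the two children at every node. MirrorSketch and SketchNode are hypothetical names used only for this illustration (int data keeps it short); the inorder helper is there so you can see that the mirror's traversal is the reverse of the original's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; MirrorSketch and SketchNode are NOT part of hw6-code.zip.
final class MirrorSketch {
    static final class SketchNode {
        final int data;
        final SketchNode left;
        final SketchNode right;

        SketchNode(int data, SketchNode left, SketchNode right) {
            this.data = data;
            this.left = left;
            this.right = right;
        }
    }

    // Builds brand-new nodes, so the original tree is left untouched.
    // The mirrored right subtree becomes the new left child and vice versa.
    static SketchNode mirror(SketchNode node) {
        if (node == null) {
            return null;
        }
        return new SketchNode(node.data, mirror(node.right), mirror(node.left));
    }

    // Appends the values of the subtree to out in left-root-right order.
    static void inorder(SketchNode node, List<Integer> out) {
        if (node == null) {
            return;
        }
        inorder(node.left, out);
        out.add(node.data);
        inorder(node.right, out);
    }
}
```

For a root 2 with children 1 and 3, an inorder traversal of the original collects 1, 2, 3, while the same traversal of the mirror collects 3, 2, 1.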


Part II — The Anagram Generator

You're going to write an anagram generator, AnagramTree, as was demo'd in class. The AnagramTree will make use of the BST class to read in a file of words (one per line) and store them in a binary tree using their sorted letters as the search key. What does this mean? When you read a word from the file (a String), you must sort it (by creating another String that has all the letters of the original word in sorted order) and then insert both the sorted word and the original word in the tree. The sorted word (a String) will be the search key for the binary search tree, and all the words that have the same sorted form (like "rats" and "tars" and "arts", which all sort to "arst") will be stored in a list in the same node, under that key. The reason for doing this is that anagrams are words that have the same letters, just rearranged. So you'll take a word, put it in a standard or canonical form (that would be the sorted form), and any two words that have the same canonical form must be anagrams! Then all you have to do is print the list of words that have the same canonical form.
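Building the canonical form takes only a few lines. The sketch below is one way to do it; the class and method names (CanonicalFormSketch, sortLetters) are made up for this illustration and are not required by the assignment.

```java
import java.util.Arrays;

// Hypothetical helper; the names here are illustrative, not required by the assignment.
final class CanonicalFormSketch {
    // Returns a new String holding the letters of word in sorted order.
    // The original word is not modified (Strings are immutable in Java).
    static String sortLetters(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }
}
```

With this helper, "rats", "tars", and "arts" all map to "arst", so they would land in the same node.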

Process

BST.java

You will need to add two more methods (in addition to those specified in Part I) to BST.java as well as modify the private TreeNode class to both declare and create a list of AnyType to hold the words with the same canonical form. The two new methods will need to interact with this list. One is an overloaded add method that takes the sorted word (AnyType) and the original word (AnyType) as parameters and inserts the sorted word in the tree (as the key value) and the original word in the list (of AnyType) that belongs to that key. Of course, as you write the helper function for this overloaded version of add, you will have to handle the case where the key is already in the tree a little differently than was done in lecture since it is no longer an error!

The second method to add to BST.java is a find method that takes a sorted word (AnyType), determines if it is in the tree as a key and, if it is, returns the list (of AnyType, but really of Strings that are original words) that map to that key so you can print it out.
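To make the two additions concrete, here is a minimal stand-alone sketch of a BST whose nodes each carry a list. It is not the BST.java from the zip file: ListBstSketch, its field names, and nodeCount are hypothetical, and two behaviors are assumptions on my part that you should check against the course's expectations (find returns null when the key is absent, and add skips a word that is already in the node's list).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; this is NOT the BST.java supplied with the assignment.
final class ListBstSketch<AnyType extends Comparable<AnyType>> {
    private final class TreeNode {
        final AnyType key;
        final List<AnyType> values = new ArrayList<>(); // words sharing this key
        TreeNode left;
        TreeNode right;

        TreeNode(AnyType key) {
            this.key = key;
        }
    }

    private TreeNode root;
    private int nodeCount;

    // Overloaded add: key is the sorted word, value is the original word.
    void add(AnyType key, AnyType value) {
        root = add(root, key, value);
    }

    private TreeNode add(TreeNode node, AnyType key, AnyType value) {
        if (node == null) {
            TreeNode created = new TreeNode(key);
            created.values.add(value);
            nodeCount++;
            return created;
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            node.left = add(node.left, key, value);
        } else if (cmp > 0) {
            node.right = add(node.right, key, value);
        } else if (!node.values.contains(value)) {
            // Duplicate key is no longer an error: extend the list instead.
            // ASSUMPTION: repeated words are skipped.
            node.values.add(value);
        }
        return node;
    }

    // ASSUMPTION: returns null when the key is not in the tree.
    List<AnyType> find(AnyType key) {
        TreeNode node = root;
        while (node != null) {
            int cmp = key.compareTo(node.key);
            if (cmp == 0) {
                return node.values;
            }
            node = (cmp < 0) ? node.left : node.right;
        }
        return null;
    }

    int nodeCount() {
        return nodeCount;
    }
}
```

Adding ("arst", "rats") and then ("arst", "tars") creates one node whose list holds both words, which is the behavior the duplicate-key case needs.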

AnagramTree.java

Once you've got the BST modifications done, it's time to get to work creating anagrams. You can have your user interface behave any way you want (I will provide the output of my program below: your results have to match, but your program's interaction with the user does not have to mimic mine line-by-line).

You want to have all the words with the same letters be stored in the same location. So you will use the sorted form of a word as the key, and associate that key with all the words that have that same sorted form. Thus, the key "art" (the sorted form of "tar") will be associated with the words "tar", "art", and "rat" (in any order).

As you can see from the driver code, you will build the tree by asking the user what file they want to read from and the maximum word size that they want to consider and then construct an AnagramTree by opening, reading, and inserting all the words from that file. You should only store the words that are less than or equal to the maximum length provided by the user. To help in debugging, I have provided you with two dictionary files: small-words-qatar.txt and words.txt. The files are quite differently sized: small-words-qatar.txt contains 25 words, while words.txt contains over 172,000 words! You should test your code on the small file before going to the big one!

The AnagramTree constructor takes a file name and maximum word size and builds a tree with all the words that are less than or equal to the length specified. To do this, you read a string and if it's the right length, construct its sorted form, and then add the sorted form and original word into your BST. If, when you're done reading the small file, you read 26 words, and with a maximum word size of 7, you inserted 16 of them with 9 nodes in the tree, you appear to be on the right track. If you then run on the big file and get 51913 words inserted and 41121 nodes with a max word size of 7 and 80314 words and 66538 nodes with a max of 8, you have built a correct tree.
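The read, filter, sort, and insert loop described above might look roughly like the sketch below. To stay self-contained it swaps in a java.util.TreeMap as a stand-in for your modified BST and reads from any Reader rather than a file name; BuildSketch and its fields are hypothetical names, and skipping blank lines is my assumption. Your AnagramTree should call your own BST's add method instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; a TreeMap stands in for the assignment's modified BST.
final class BuildSketch {
    final Map<String, List<String>> groups = new TreeMap<>();
    int wordsRead;
    int wordsInserted;

    BuildSketch(Reader source, int maxLength) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.isEmpty()) {
                    continue; // ASSUMPTION: blank lines are not counted as words
                }
                wordsRead++;
                if (word.length() > maxLength) {
                    continue; // too long: counted as read but not stored
                }
                char[] letters = word.toCharArray();
                Arrays.sort(letters);
                String key = new String(letters);
                // One list per distinct key, like one node per key in the BST.
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(word);
                wordsInserted++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the role of "number of nodes in the tree".
    int nodeCount() {
        return groups.size();
    }
}
```

Passing new java.io.FileReader(fileName) as the source would read from an actual dictionary file; keeping the three counters separate makes it easy to compare your numbers with the ones quoted above.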

Once you've got your tree built, the driver code asks the user for a word and, if its length is less than or equal to your max length, searches for it in the tree (by calling findMatches, using what as the key?) and, if it is found, prints all the anagrams of that word (which are in the list stored with that key). If the word is not found in the tree, you should tell the user that it was not found. The user should be allowed to search for as many individual words as they want until they enter a sentinel value, at which point the program should end. As an example, if you are using the small file and the user enters "tar", your program should print "tar rat art" (in any order) as your output. Once you can do that, you're done!
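The supplied AnagramTester.java already contains the real driver loop, so the following is only a sketch of the sentinel-controlled pattern it describes. Everything in it is hypothetical: SearchLoopSketch and answerQueries are made-up names, a Map stands in for the tree lookup, the message strings are copied from the sample output in this handout, and treating an over-long query as "no match" is my assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

// Hypothetical sketch; a Map lookup stands in for the tree search.
final class SearchLoopSketch {
    static final String SENTINEL = "#";

    // Reads whitespace-separated queries until the sentinel (or end of input)
    // and returns one response line per query.
    static List<String> answerQueries(Scanner input, Map<String, List<String>> groups, int maxLength) {
        List<String> responses = new ArrayList<>();
        while (input.hasNext()) {
            String query = input.next();
            if (query.equals(SENTINEL)) {
                break;
            }
            List<String> matches = null;
            if (query.length() <= maxLength) {
                // Look the query up by its sorted letters, not by the word itself.
                char[] letters = query.toCharArray();
                Arrays.sort(letters);
                matches = groups.get(new String(letters));
            }
            // ASSUMPTION: an over-long query is reported the same way as a miss.
            responses.add(matches == null ? "NO words match!" : "Words that match: " + matches);
        }
        return responses;
    }
}
```

Returning the responses as a list instead of printing them directly makes the loop easy to test without typing at the console.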

AnagramTester.java

One final note: do NOT make any changes to AnagramTester.java.

Expected Output

The following was the output of my reference solution, using 7 as the maximum word size on the large dictionary (recall that your words can appear in any order but there can be no duplicate words printed):

Enter name of dictionary file: words.txt
Max word length: 7

Total number of words read: 172715
Number of words inserted (length <= 7): 51913
Number of nodes in the tree: 41121

string to search [#] to stop: chart
  Words that match: [ratch, chart]

string to search [#] to stop: star
  Words that match: [tsar, rats, arts, star, tars]

string to search [#] to stop: hoser
  Words that match: [heros, hoers, shoer, shore, horse]

string to search [#] to stop: strand
  Words that match: [strand]

string to search [#] to stop: foon
  NO words match!

string to search [#] to stop: #

Submitting Your Work

When you have completed the assignment and tested your code thoroughly, create a .zip file with your work (including AnagramTree.java and BST.java). Name the zip file "your-andrew-id".zip and email it to me at mjs @ cmu.edu.

Make sure to keep a copy of your work just in case!