\ Assignments - Homework 8 - 15-121 - Fall 2012
Carnegie Mellon University Website Home Page
 
Fall 2012

15-121 Homework 8: Duplicitous Hangman - Due 12/12 ;-) at midnight

Props to Keith Schwartz, who originally developed this assignment and presented it at SIGCSE 2011 as a Nifty Assignment.

Download and unzip hw8-code.zip, which contains the dictionary files you will need for this assignment. You are permitted to work with a partner and are also permitted to propose a different final project that explores the data structures that we have discussed this semester. The goals of this assignment are:

  • to wrap up our discussion of Java data structures
  • to gain more experience searching and using the Java API
  • to gain more experience in creating a solution to a problem with minimal guidance as to what methods to write and how to write them

Note: You will be graded in part on your coding style. Your code should be easy to read, well organized, and concise. You should avoid duplicate code.


Background: The Assignment

In this assignment, you will build a clever program that bends the rules of Hangman to trounce its human opponent time and time again. In doing so, you'll gain more experience with data structures, program design and iterative development, and will deepen your general programming savvy. Plus, you'll end up with a piece of software which should be highly entertaining. At least, from your perspective.

In case you aren't familiar with the game Hangman, the rules are as follows:

  1. One player chooses a secret word, then writes out a number of dashes equal to the word length.
  2. The other player begins guessing letters. Whenever the guessed letter is contained in the word, the first player reveals each instance of that letter in its correct position in the word. Otherwise, the guess is wrong and the player loses a guess.
  3. The game ends either when all the letters in the word have been revealed (i.e., the guesserhas determined the word) or when the guesser has run out of guesses.

Fundamental to the game is the fact the first player accurately represents the chosen word. That way, when the other player guesses a letter, it is correctly revealed in the word or correctly reported that it is not contained in the word. But what happens if the player doesn't do this? This gives the player who chooses the hidden word an enormous advantage. For example, suppose that you're the player trying to guess the word, and at some point you end up revealing letters until you arrive at this point with only one guess remaining:

                                        D O - B L E

There are only two words in the English language that match this pattern: "doable" and "double". If the player who chose the hidden word is playing fairly, then you have a fifty-fifty chance of winning this game if you guess 'A' or 'U' as the missing letter. However, if your opponent is cheating and hasn't actually committed to either word, then there is no possible way you can win this game. No matter what letter you guess, your opponent can claim that she had picked the other word and you will lose the game.


The Basic Algorithm

Here's a description of the algorithm involved in the sample run below. The user chooses a word length (make sure it is greater than 0 and less than or equal to the length of the largest word in the dictionary). The computer then picks a random word of the correct length. The game proceeds as follows:

  • the user guesses a letter
  • if the letter is not in the word, inform the user, they lose a guess and, if out of guesses, the game ends. If not out of guesses, the user guesses another letter and we try again.
  • if the letter is in the word, see if there is another word (or words) that fits the current pattern without the letter that was just guessed. If there is, decide (somehow) whether to congratulate the user on a successful guess and reveal the letter or (randomly) pick a new word without the letter, inform the user that the letter was not in the word (even though it might have been in the "original" word) and charge them an unsuccessful guess. As above, if they are out of guesses, the game ends. If not, the user guesses another letter and we go around again.


A Sample Run

When the game starts, the computer pick a word at random as the secret word, in this case the word curries has been chosen. The player has no misses and no letters of the secret word are shown. Since debug-printing is on, the computer displays the secret word and the number of possibilities for the secret word – in this case all 23,096 seven-letter words in the computer's dictionary are possible. In the output below the italicized lines would normally not be printed as part of the game.

Welcome to (Clever) Hangman:

Enter maximum word length: 7

# words possible:  23096
(secret word: curries)
Progress:  - - - - - - -
letters missed: 
guess a letter:  i
i is not in the secret word

# words possible:  4048
(secret word: tresses)
Progress:  - - - - - - -
letters missed:  i
guess a letter:  e
congratulations, e is in the word!

# words possible:  969
(secret word: waffles)
Progress:  - - - - - e -
letters missed:  i
guess a letter:  a
a is not in the secret word

# words possible:  455
(secret word: toppled)
Progress:  - - - - - e -
letters missed:  i a
guess a letter:  o
o is not in the secret word

# words possible:  159
(secret word: jumbled)
Progress:  - - - - - e -
letters missed:  i a o
guess a letter:  u
congratulations, u is in the word!

# words possible:  76
(secret word: rumbled)
Progress:  - u - - - e -
letters missed:  i a o
guess a letter:  r
r is not in the secret word

# words possible:  46
(secret word: tumbled)
Progress:  - u - - - e -
letters missed:  i a r o
guess a letter:  t
t is not in the secret word

# words possible:  41
(secret word: bundles)
Progress:  - u - - - e -
letters missed:  i a r t o
guess a letter:  s
s is not in the secret word

# words possible:  20
(secret word: buckled)
Progress:  - u - - - e -
letters missed:  a i o s r t
guess a letter:  d
congratulations, d is in the word!

# words possible:  15
(secret word: lunched)
Progress:  - u - - - e d
letters missed:  a i o s r t
guess a letter:  n
n is not in the secret word

# words possible:  9
(secret word: bumbled)
Progress:  - u - - - e d
letters missed:  a i o n s r t
guess a letter:  m
congratulations, m is in the word!

# words possible:  4
(secret word: jumbled)
Progress:  - u m - - e d
letters missed:  a i o n s r t
guess a letter:  b
congratulations, b is in the word!

# words possible:  3
(secret word: humbled)
Progress:  - u m b - e d
letters missed:  a i o n s r t
guess a letter:  l
congratulations, l is in the word!

# words possible:  3
(secret word: fumbled)
Progress:  - u m b l e d
letters missed:  a i o n s r t
guess a letter:  f
f is not in the secret word

You are hung!  The secret word was:  jumbled


A More Complicated Approach

Rather than just picking random words, we can select words more, well, "evil-ly". Let's illustrate this technique with an example. Suppose that you are playing Hangman and it's your turn to choose a word, which we'll assume is of length four. Rather than committing to a secret word, you instead compile a list of every four-letter word in the English language. For simplicity, let's assume that English only has a few four-letter words, all of which are provided in hangman-words.txt:

                        ally beta cool deal else flew good hope ibex

Now, suppose that your opponent guesses the letter 'e'. You now need to tell your opponent which letters in the word you've "picked" are e's. Of course, you haven't picked a word, and so you have multiple options about where you reveal the e's. Here's the above word list, with e's highlighted in each word:

                        ally beta cool deal else flew good hope ibex

If you'll notice, every word in your word list falls into one of five "word families":

----, which contains the word ally, cool, good
-e--, containing beta and deal
--e-, containing flew and ibex
e--e, containing else
---e, containing hope

Since the letters you reveal have to correspond to some word in your word list, you can choose to reveal any one of the above five families. There are many ways to pick which family to reveal. Perhaps you want to steer your opponent toward a smaller family with more obscure words, or toward a larger family in the hopes of keeping your options open. In the interest of simplicity, let's adopt the latter approach and always choose the largest of the remaining word families. In this case, it means that you should pick the family ----. This reduces your word list down to:

                                ally cool good

and since you didn't reveal any letters, you would tell your opponent that his guess was wrong. Let's see a few more examples of this strategy. Given this three-word word list, if your opponent guesses the letter O, then you would break your word list down into two families:

-oo-, containing cool, good
----, containing ally

The first of these families is larger than the second, and so you choose it, revealing two O's in the word and reducing your list down to:

                                cool good

But what happens if your opponent guesses a letter that doesn't appear anywhere in your word list? For example, what happens if your opponent now guesses 't'? This isn't a problem. If you try splitting these words apart into word families, you'll find that there's only one family, ---- in which 't' appears nowhere and which contains both cool and good. Since there is only one word family here, it's trivially the largest family, and by picking it you'd maintain the word list you already had.


Advice

It is up to you to think about how you want to partition words into word families. Think about what data structures would be best for tracking word families and the master word list. Would a map work? How about a stack or queue? Thinking through the design before you start coding will save you a lot of time and headache!

Since you're building this project from scratch, you'll need to do a bit of planning to figure out what the best data structures are for the program. There is no "right way" to go about writing this program, but some design decisions are much better than others (e.g. you can store your word list in a stack or map, but this is probably not the best option). Here are some general tips and tricks that might be useful:

  1. Letter position matters just as much as letter frequency. When computing word families, it's not enough to count the number of times a particular letter appears in a word; you also have to consider their positions. For example, "beer" and "here" are in two different families even though they both have two e's in them. Consequently, representing word families as numbers representing the frequency of the letter in the word will get you into trouble.
  2. Don't explicitly enumerate word families. If you are working with a word of length n, then there are 2n possible word families for each letter. However, most of these families don't actually appear in the English language. For example, no English words contain three consecutive u's, and no word matches the pattern e-ee-ee--e. Rather than explicitly generating every word family whenever the user enters a guess, see if you can generate word families only for words that actually appear in the word list. One way to do this would be to scan over the word list, storing each word in a table mapping word families to words in that family.

Extensions

The algorithm outlined above are by no means optimal, and there are several cases in which it will make bad decisions. For example, suppose that the human has exactly one guess remaining and that computer has the following word list:

                                deal tear monk

If the human guesses the letter 'e' here, the computer will notice that the word family -e-- has two elements and the word family ---- has just one. Consequently, it will pick the family containing deal and tear, revealing an 'e' and giving the player another chance to guess. However, since the player has only one guess left, a much better decision would be to pick the family ---- containing monk, causing the human to lose the game.

There are several other places in which the algorithm does not function ideally. For example, suppose that after the player guesses a letter, you find that there are two word families, the family --e- containing 10,000 words and the family ---- containing 9,000 words. Which family should the computer pick? If the computer picks the first family, it will end up with more words, but because it revealed a letter the user will have more chances to guess the words that are left. On the other hand, if the computer picks the family ----, the computer will have fewer words left but the human will have fewer guesses as well. More generally, picking the largest word family is not necessarily the best way to cause the human to lose. Often, picking a smaller family will be better.


Submitting Your Work

When you have completed the assignment and tested your code thoroughly, create a .zip file with your work (including Word.java and AnagramMap.java. Name the zip file "your-andrew-id".zip and email it to me mjs @ cs.cmu.edu.

Make sure to keep a copy of your work just in case!