15-121 Homework 8: Duplicitous Hangman - Due 12/12 ;-) at midnight
Props to Keith Schwartz, who originally developed this assignment and
presented it at SIGCSE 2011 as a Nifty Assignment.
Download and unzip hw8-code.zip, which contains the
dictionary files you will need for this assignment. You are permitted to work
with a partner and are also permitted to propose a different final project
that explores the data structures that we have discussed this semester.
The goals of this assignment are:
- to wrap up our discussion of Java data structures
- to gain more experience searching and using the Java API
- to gain more experience in creating a solution to a problem with minimal
guidance as to what methods to write and how to write them
Note: You will be graded in part on your coding style. Your code
should be easy to read, well organized, and concise. You should avoid
duplicate code.
Background: The Assignment
In this assignment, you will build a clever program that bends the rules of
Hangman to trounce its human opponent time and time again. In doing so, you'll
gain more experience with data structures, program design and iterative
development, and will deepen your general programming savvy. Plus, you'll end
up with a piece of software which should be highly entertaining. At least,
from your perspective.
In case you aren't familiar with the game Hangman, the rules are as follows:
- One player chooses a secret word, then writes out a number of dashes
equal to the word length.
- The other player begins guessing letters. Whenever the guessed letter is
contained in the word, the first player reveals each instance of that
letter in its correct position in the word. Otherwise, the guess is wrong
and the player loses a guess.
- The game ends either when all the letters in the word have been revealed
(i.e., the guesserhas determined the word) or when the guesser has run out
of guesses.
Fundamental to the game is the fact the first player accurately represents the
chosen word. That way, when the other player guesses a letter, it is
correctly revealed in the word or correctly reported that it is not contained
in the word. But what happens if the player doesn't do this? This gives the
player who chooses the hidden word an enormous advantage. For example, suppose
that you're the player trying to guess the word, and at some point you
end up revealing letters until you arrive at this point with only one guess
remaining:
D O - B L E
There are only two words in the English language that match this pattern:
"doable" and "double". If the player who chose the hidden word is playing
fairly, then you have a fifty-fifty chance of winning this game if you guess
'A' or 'U' as the missing letter. However, if your opponent is cheating and
hasn't actually committed to either word, then there is no possible way you
can win this game. No matter what letter you guess, your opponent can claim
that she had picked the other word and you will lose the game.
The Basic Algorithm
Here's a description of the algorithm involved in the sample run below. The
user chooses a word length (make sure it is greater than 0 and less than or
equal to the length of the largest word in the dictionary). The computer then
picks a random word of the correct length. The game proceeds as follows:
- the user guesses a letter
- if the letter is not in the word, inform the user, they lose a guess
and, if out of guesses, the game ends. If not out of guesses, the user
guesses another letter and we try again.
- if the letter is in the word, see if there is another word (or words)
that fits the current pattern without the letter that was just guessed.
If there is, decide (somehow) whether to congratulate the user on a
successful guess and reveal the letter or (randomly) pick a new word without
the letter, inform the user that the letter was not in the word (even though
it might have been in the "original" word) and charge them an unsuccessful
guess. As above, if they are out of guesses, the game ends. If not, the
user guesses another letter and we go around again.
A Sample Run
When the game starts, the computer pick a word at random as the secret word,
in this case the word curries has been chosen. The player has no
misses and no letters of the secret word are shown. Since debug-printing is
on, the computer displays the secret word and the number of possibilities for
the secret word – in this case all 23,096 seven-letter words in the
computer's dictionary are possible. In the output below the italicized lines
would normally not be printed as part of the game.
Welcome to (Clever) Hangman:
Enter maximum word length: 7
# words possible: 23096
(secret word: curries)
Progress: - - - - - - -
letters missed:
guess a letter: i
i is not in the secret word
# words possible: 4048
(secret word: tresses)
Progress: - - - - - - -
letters missed: i
guess a letter: e
congratulations, e is in the word!
# words possible: 969
(secret word: waffles)
Progress: - - - - - e -
letters missed: i
guess a letter: a
a is not in the secret word
# words possible: 455
(secret word: toppled)
Progress: - - - - - e -
letters missed: i a
guess a letter: o
o is not in the secret word
# words possible: 159
(secret word: jumbled)
Progress: - - - - - e -
letters missed: i a o
guess a letter: u
congratulations, u is in the word!
# words possible: 76
(secret word: rumbled)
Progress: - u - - - e -
letters missed: i a o
guess a letter: r
r is not in the secret word
# words possible: 46
(secret word: tumbled)
Progress: - u - - - e -
letters missed: i a r o
guess a letter: t
t is not in the secret word
# words possible: 41
(secret word: bundles)
Progress: - u - - - e -
letters missed: i a r t o
guess a letter: s
s is not in the secret word
# words possible: 20
(secret word: buckled)
Progress: - u - - - e -
letters missed: a i o s r t
guess a letter: d
congratulations, d is in the word!
# words possible: 15
(secret word: lunched)
Progress: - u - - - e d
letters missed: a i o s r t
guess a letter: n
n is not in the secret word
# words possible: 9
(secret word: bumbled)
Progress: - u - - - e d
letters missed: a i o n s r t
guess a letter: m
congratulations, m is in the word!
# words possible: 4
(secret word: jumbled)
Progress: - u m - - e d
letters missed: a i o n s r t
guess a letter: b
congratulations, b is in the word!
# words possible: 3
(secret word: humbled)
Progress: - u m b - e d
letters missed: a i o n s r t
guess a letter: l
congratulations, l is in the word!
# words possible: 3
(secret word: fumbled)
Progress: - u m b l e d
letters missed: a i o n s r t
guess a letter: f
f is not in the secret word
You are hung! The secret word was: jumbled
A More Complicated Approach
Rather than just picking random words, we can select words more, well,
"evil-ly". Let's illustrate this technique with an example. Suppose that you
are playing Hangman and it's your turn to choose a word, which we'll assume is
of length four. Rather than committing to a secret word, you instead compile a
list of every four-letter word in the English language. For simplicity, let's
assume that English only has a few four-letter words, all of which are provided in hangman-words.txt:
ally beta cool deal else flew good hope ibex
Now, suppose that your opponent guesses the letter 'e'. You now need to tell
your opponent which letters in the word you've "picked" are e's. Of course,
you haven't picked a word, and so you have multiple options about where you
reveal the e's. Here's the above word list, with e's highlighted in each word:
ally beta cool deal else flew good hope ibex
If you'll notice, every word in your word list falls into one of five "word
families":
----, which contains the word ally, cool, good
-e--, containing beta and deal
--e-, containing flew and ibex
e--e, containing else
---e, containing hope
Since the letters you reveal have to correspond to some word in your word
list, you can choose to reveal any one of the above five families. There are
many ways to pick which family to reveal. Perhaps you want to steer your
opponent toward a smaller family with more obscure words, or toward a larger
family in the hopes of keeping your options open. In the interest of
simplicity, let's adopt the latter approach and always choose the largest of
the remaining word families. In this case, it means that you should pick the
family ----. This reduces your word list down to:
ally cool good
and since you didn't reveal any letters, you would tell your opponent that his
guess was wrong. Let's see a few more examples of this strategy. Given this
three-word word list, if your opponent guesses the letter O, then you would
break your word list down into two families:
-oo-, containing cool, good
----, containing ally
The first of these families is larger than the second, and so you choose it,
revealing two O's in the word and reducing your list down to:
cool good
But what happens if your opponent guesses a letter that doesn't appear
anywhere in your word list? For example, what happens if your opponent now
guesses 't'? This isn't a problem. If you try splitting these words apart into
word families, you'll find that there's only one family, ---- in
which 't' appears nowhere and which contains both cool and good. Since there is
only one word family here, it's trivially the largest family, and by picking
it you'd maintain the word list you already had.
Advice
It is up to you to think about how you want to partition words into word
families. Think about what data structures would be best for tracking word
families and the master word list. Would a map work? How about a stack or
queue? Thinking through the design before you start coding will save you a lot
of time and headache!
Since you're building this project from scratch, you'll need to do a bit of
planning to figure out what the best data structures are for the program.
There is no "right way" to go about writing this program, but some design
decisions are much better than others (e.g. you can store your word list in a
stack or map, but this is probably not the best option). Here are some general
tips and tricks that might be useful:
- Letter position matters just as much as letter frequency. When computing
word families, it's not enough to count the number of times a particular
letter appears in a word; you also have to consider their positions. For
example, "beer" and "here" are in two different families even though they both
have two e's in them. Consequently, representing word families as numbers
representing the frequency of the letter in the word will get you into
trouble.
- Don't explicitly enumerate word families. If you are working with a word
of length n, then there are 2n possible word families for each
letter. However, most of these families don't actually appear in the English
language. For example, no English words contain three consecutive u's, and no
word matches the pattern e-ee-ee--e. Rather than explicitly
generating every word family whenever the user enters a guess, see if you can
generate word families only for words that actually appear in the word
list. One way to do this would be to scan over the word list, storing each
word in a table mapping word families to words in that family.
Extensions
The algorithm outlined above are by no means optimal, and there are
several cases in which it will
make bad decisions. For example, suppose that the human has exactly one guess
remaining and that
computer has the following word list:
deal tear monk
If the human guesses the letter 'e' here, the computer will notice that the
word family -e-- has two elements and the word family ----
has just one. Consequently, it will pick the family containing deal and tear,
revealing an 'e' and giving the player another chance to guess. However, since
the player has only one guess left, a much better decision would be to pick
the family ---- containing monk, causing the human to lose the
game.
There are several other places in which the algorithm does not function
ideally. For example, suppose that after the player guesses a letter, you find
that there are two word families, the family --e- containing
10,000 words and the family ---- containing 9,000 words. Which family
should the computer pick? If the computer picks the first family, it will end
up with more words, but because it revealed a letter the user will have more
chances to guess the words that are left. On the other hand, if the computer
picks the family ----, the computer will have fewer words left but
the human will have fewer guesses as well. More generally, picking the largest
word family is not necessarily the best way to cause the human to lose. Often,
picking a smaller family will be better.
Submitting Your Work
When you have completed the assignment and tested your code thoroughly,
create a .zip file with your work (including Word.java and
AnagramMap.java. Name the zip file "your-andrew-id".zip and email it
to me mjs @ cs.cmu.edu.
Make sure to keep a copy of your work just in case!