15-381 Spring 200 Final Exam Answers

Problem 1a: (5 points)

full cr. Sum(i=0 to d/2) b^i  +  Sum(i=0 to d/2) c^i
3 pts.   O(b^(d/2) + c^(d/2))
0 pts if d/2 is not the exponent
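
As a quick numeric sanity check of why d/2 in the exponent matters, the sketch below compares the two node counts for illustrative values of b, c, and d (the specific numbers are ours, not from the exam):

```python
# Node counts for bidirectional vs. unidirectional search with
# branching factors b (forward), c (backward), and solution depth d.
# The values of b, c, d below are illustrative.

def geometric_sum(base, depth):
    """Sum_{i=0}^{depth} base**i: nodes generated out to that depth."""
    return sum(base**i for i in range(depth + 1))

b, c, d = 10, 10, 6
bidirectional = geometric_sum(b, d // 2) + geometric_sum(c, d // 2)
unidirectional = geometric_sum(b, d)

print(bidirectional)    # O(b^(d/2) + c^(d/2)) -- far smaller
print(unidirectional)   # O(b^d)
```

Meeting in the middle turns one depth-d frontier into two depth-d/2 frontiers, which is an exponential saving.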

Problem 1b: (5 points)

Unifiable. Unifier = { y/A , x/h(A,B) }
3 points for unified expression f( h( A , B ) , g( A , A , h( A,B ) ) )
0 points if "not unifiable"
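
The unifier can be checked mechanically with the standard first-order unification algorithm. The sketch below is generic (not the exam's algorithm statement), and since the original pair of terms is not reproduced here, the demo terms t1 and t2 are a hypothetical pair consistent with the stated unifier { y/A , x/h(A,B) }:

```python
# Minimal first-order unification. Terms: lowercase strings are
# variables, uppercase strings are constants, tuples are compound
# terms (functor, arg1, arg2, ...).

def walk(term, s):
    """Follow variable bindings in substitution s."""
    while isinstance(term, str) and term.islower() and term in s:
        term = s[term]
    return term

def resolve(term, s):
    """Apply substitution s to term, fully."""
    term = walk(term, s)
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(a, s) for a in term[1:])
    return term

def occurs(var, term, s):
    """Occurs check: does var appear inside term under s?"""
    term = walk(term, s)
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs(var, a, s) for a in term[1:])
    return False

def unify(t1, t2, s):
    """Return a substitution unifying t1 and t2, or None."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if isinstance(t1, str) and t1.islower():        # t1 is a variable
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if isinstance(t2, str) and t2.islower():
        return unify(t2, t1, s)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a1, a2 in zip(t1[1:], t2[1:]):
            s = unify(a1, a2, s)
            if s is None:
                return None
        return s
    return None

# Hypothetical term pair consistent with the answer above:
t1 = ('f', 'x', ('g', 'y', 'y', 'x'))
t2 = ('f', ('h', 'A', 'B'), ('g', 'A', 'A', ('h', 'A', 'B')))
s = unify(t1, t2, {})
print(s)   # binds y to A and x to h(A,B)
```

Applying the resulting substitution to either term yields the unified expression f( h( A , B ) , g( A , A , h( A,B ) ) ).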

Problem 1c: (5 points)

The correct answer here was O(wd + q).  The wd term comes from multiplying
the query matrix which is size w * d by the query vector.  The term q comes
from calculating the size of the query vector.

Problem 1d: (5 points)

argmax P(word_i | signal) = argmax P(signal | word_i) * P(word_i) / P(signal)

But since P(signal) is constant over all words, we just need to calculate:

argmax P(signal | word_i) * P(word_i)
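
A toy decoder makes the rule concrete: pick the word maximizing P(signal | word) * P(word), ignoring P(signal). The vocabulary, prior, and stand-in likelihood below are all made up for illustration:

```python
# Toy noisy-channel decoder for argmax P(signal|word) * P(word).
# All probabilities here are illustrative, not from any real model.

def decode(signal, vocab, likelihood, prior):
    """P(signal) is the same for every candidate, so drop it."""
    return max(vocab, key=lambda w: likelihood(signal, w) * prior[w])

prior = {"there": 0.006, "their": 0.004, "three": 0.002}

def likelihood(signal, word):
    # Crude acoustic-model stand-in: character-set overlap (Jaccard).
    return len(set(signal) & set(word)) / len(set(signal) | set(word))

print(decode("thre", list(prior), likelihood, prior))
```

A real system would replace `likelihood` with an acoustic or channel model, but the argmax structure is the same.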

Problem 2: NLP (15 points)

a) Any task description where a method more complex than bag-of-words
but less than full parsing will do, so long as it is clear that said
method applies and is beneficial, got full credit, for instance:
- Information extraction (detecting proper names, places,
dates, monetary amounts, etc.).  This requires finite state
automata (or simpler), essentially k-word pattern matching,
where k = 2 to 6 (less than length of sentence).
- Summarization (producing a short version of the text), where
sentence segmentation (punctuation, capitalization, etc.) is
required on top of b-of-w to find relevant sentences.  Here
Info extraction helps too (as subroutine) to select sentences
mentioning interesting names, places, but it is not required
for base-level summarization
- Domain specific command interpretation (e.g. "move  to ")
which can be done by template matching, but does require word
order constraints to be respected.

and anything at either extreme, as stated in the problem
description itself.

b) Any response that requires the construction of all 10^15 trigrams is wrong,
regardless of any clever data structure (such as prefix trees) to hold it,
because they still need 10^15 distinct leaves.

Partial credit for simply noting that only a small fraction of the possible
trigrams will occur in English, so simply adding a trigram when it is first
seen results in far fewer than 10^15, as the majority will never be seen.

Full credit for any correct incremental scheme that only builds meaningful
trigrams in the first place.  The best answer is:
Step 1: look at all the bi-grams that occur in any large training
corpus (this is significantly less than 10^10).  Then, keep
only those that both occur more than once and whose probability
of occurrence is far above chance, i.e.
p(w_1,w_2) >> p(w_1)p(w_2)
Tune the >> threshold so that the top 10^5 to 10^6 are kept.
Step 2: repeat above using bigram-unigram sequences from the
bigrams kept in step 1 to construct trigrams, and keep
those that occur more than once and:
p(w_1,w_2,w_3) >> p(w_1,w_2)p(w_3)
Again tune the >> threshold so that the top 10^5 to 10^6
are kept.  In total, then, the dictionary + bigrams +
trigrams will be O(10^6) rather than O(10^15).  If desired
for an application where memory is ample, you can go to
O(10^7) or any size required, but with diminishing returns
as lower-frequency or less-meaningful n-grams are kept.
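
The two-step scheme can be sketched in a few lines. This is a toy version: the corpus is tiny and the ">>" threshold is stood in for by a fixed ratio (here 2x the independence baseline), both chosen for illustration:

```python
# Sketch of the incremental scheme: keep an n-gram only if it occurs
# more than once, extends a kept (n-1)-gram, and its joint probability
# beats the independence baseline p(prefix) * p(last word) by `ratio`.
from collections import Counter

def significant_ngrams(tokens, seeds, n, ratio=2.0):
    total = len(tokens)
    unigram = Counter(tokens)
    prev = Counter(zip(*[tokens[i:] for i in range(n - 1)]))
    ngram = Counter(zip(*[tokens[i:] for i in range(n)]))
    kept = {}
    for g, c in ngram.items():
        if c <= 1 or (seeds is not None and g[:-1] not in seeds):
            continue                      # too rare, or prefix not kept
        p_joint = c / total
        p_indep = (prev[g[:-1]] / total) * (unigram[g[-1]] / total)
        if p_joint > ratio * p_indep:     # stands in for ">>"
            kept[g] = c
    return kept

tokens = ("new york is big and new york is old and "
          "new york is far").split()
bigrams = significant_ngrams(tokens, None, 2)        # step 1
trigrams = significant_ngrams(tokens, bigrams, 3)    # step 2
print(sorted(bigrams), sorted(trigrams))
```

On a real corpus, step 1 never materializes the full 10^10 bigram table, and step 2 only tries trigrams whose leading bigram survived step 1.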

Problem 3: Probability (15 points)

a) The answer we were looking for was "conditional independence", from
which p(LATE) falls out immediately.  It turns out that there were many
correct answers.  However, some incorrect tries included:
1) Asking for two pieces of information (got partial credit if one of
the two was correct)
2) Asking for the value of one of the variables (doesn't help us get
p(LATE))

b) This was kind of tricky.  Given conditional independence, knowing
LATE does not give us any additional information about the value of DAY
(i.e. p(DAY=Tues | LATE) = p(DAY=Tues)).
However, most people did not select conditional independence as their
given to part (a). If you didn't answer conditional independence, then
p(DAY | LATE) = p(DAY) is -not- correct, as knowing LATE then does give
information about the value of DAY.
Problem 4a: (5 points)

If the data is linearly separable you could use linear regression, a single
perceptron, or a neural network with no hidden units.  (There are also many
other possibilities.)

If the data is not linearly separable you could use a decision tree or a
neural network with hidden units.  (There are also many other possibilities.)

Problem 4b: (5 points)

You would want to randomly split up your data into a training set and a
test set.  Typically 30% of your data would be in your test set.

You would then train your model using the training set data only.  The
testing set data would only be used to test the model and predict the
future performance of your model.  If you were trying to choose between
multiple models, you could use the performance on the test set to decide
which model is likely to perform best.
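
A minimal random split along these lines (the 70/30 ratio follows the answer; the synthetic data and helper name are ours):

```python
# Randomly split data into a training set and a ~30% test set.
import random

def train_test_split(data, test_fraction=0.3, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)    # randomize before splitting
    n_test = int(len(data) * test_fraction)
    return data[n_test:], data[:n_test]  # (train, test)

samples = [(x, x % 2) for x in range(100)]   # (feature, label) pairs
train, test = train_test_split(samples)
print(len(train), len(test))
```

Only `train` is used to fit the model; performance on `test` estimates future performance and can arbitrate between candidate models.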

Problem 4c: (5 points)

Entropy(no split)  = -[300/900 log(300/900) + 600/900 log(600/900)] =~ 0.918

Entropy(f present) = -[250/300 log(250/300) +  50/300 log( 50/300)] =~ 0.650
Entropy(f absent)  = -[ 50/600 log( 50/600) + 550/600 log(550/600)] =~ 0.414

IG(f) = E(no split) - [300/900 E(f present) + 600/900 E(f absent)]
=~ 0.426

Entropy(g present) = 0
Entropy(g absent)  = -[260/860 log(260/860) + 600/860 log(600/860)] =~ 0.884

IG(g) = E(no split) - [ 40/900 E(g present) + 860/900 E(g absent)]
=~ 0.073

So f is the better choice since it has higher information gain.

(Note:  All log's are base 2.)
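
The arithmetic above can be checked mechanically. The counts come from the answer (300 of 900 samples positive; f present in 300 samples with 250 positive; g present in 40 samples, all positive); the helper names are ours:

```python
# Verify the information-gain computation for features f and g.
from math import log2

def entropy(p, n):
    """Binary entropy of a node with p positives and n negatives."""
    total = p + n
    return -sum((c / total) * log2(c / total) for c in (p, n) if c)

def info_gain(p, n, p_present, n_present):
    """Entropy before split minus weighted entropy after split."""
    p_absent, n_absent = p - p_present, n - n_present
    present, absent = p_present + n_present, p_absent + n_absent
    total = p + n
    return (entropy(p, n)
            - (present / total) * entropy(p_present, n_present)
            - (absent / total) * entropy(p_absent, n_absent))

ig_f = info_gain(300, 600, 250, 50)  # f present: 250 pos, 50 neg
ig_g = info_gain(300, 600, 40, 0)    # g present: 40 samples, all pos
print(round(ig_f, 3), round(ig_g, 3))   # 0.426 0.073
```

As in the answer, f wins by a wide margin despite g's pure (zero-entropy) "present" branch, because that branch covers only 40 of 900 samples.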

Problem 4d: (5 points)

d) The simplest (and also very robust) way is to let ? be the most
common value for feature f.  We cannot just ignore every piece of data
with a ?, because it might be that every sample has some uncertainty
associated with it, and ignoring would leave us with no samples.

Partial credit was given for adding a ? branch to the tree at each
split, because the possible lack of data samples with a ? for a given
feature means that the resulting decision tree might predict very poorly
on data with ?'s, when it would have a better chance going with a
majority approach.
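
The majority-value approach is a one-pass computation. A sketch, with an illustrative data layout (list of feature lists) and made-up values:

```python
# Replace each '?' with the most common observed value of that feature.
from collections import Counter

def impute_missing(rows, missing="?"):
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] != missing]
        majority = Counter(observed).most_common(1)[0][0]
        for r in filled:
            if r[j] == missing:
                r[j] = majority
    return filled

rows = [["sunny", "hot"], ["rainy", "?"], ["sunny", "mild"],
        ["?", "hot"]]
print(impute_missing(rows))
```

Note the fill values are computed per feature across all rows, so no sample has to be thrown away.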

Problem 5, Neural Networks (5 points)

1. Legal
2. Legal
3. Illegal
4. Legal
5. Accept both
6. Accept both
7. Accept both

Problem 6: MDPs (15pts)

1. State space = Floor that each elevator is in,
Time since each button was pushed,
State of button (i.e. going up, going down).
Actions = Go up, go down, stay still
Reward = -1 * sum of times since each button was pushed

Note this setup minimizes the waiting time for passengers getting on
to the elevator but does not do anything once passengers have gotten on.
It also assumes that the buttons being pushed correspond to passengers
on a certain floor and that all passengers on a floor that are going in
one direction all get on the elevator at that time.

2. Yes, it is Markovian since the reward and probabilities depend only on
the current state.

3. Q-learning is a good algorithm for this problem, though training may
take a long time.
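
As a sketch of what Q-learning would look like here, a single tabular update is shown below. The state encoding, learning rate, discount, and transition are illustrative stand-ins, not part of the exam answer:

```python
# One tabular Q-learning update:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.95):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])

actions = ("up", "down", "stay")
Q = defaultdict(float)
# One observed transition: reward is -1 * sum of waiting times, per
# the reward definition above (here, 3 ticks of accumulated waiting).
q_update(Q, ("floor2", "wait3"), "up", -3.0, ("floor3", "wait4"), actions)
print(Q[(("floor2", "wait3"), "up")])
```

Repeating such updates over many simulated elevator runs is what makes the training time long: the state space (floors x waiting times x button states) is large.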

Problem 7: RL (15 points)


Problem 7a:
Some people got philosophical on this one, asking what I meant by
"eventually." Formally, my intent was "with probability 1 as time goes
to infinity," in which case the answer was yes, the robot will
eventually make it to the goal state. Regardless of the policy
followed, the randomness in its execution (0.05 probability of moving
in one of the three directions other than intended) ensures that, with
probability 1, it will eventually take enough moves in the right
direction to get out of any dead ends.

In most cases, I gave partial credit to reasonable, but differing
interpretations of "always" and "eventually".

Problem 7b:
Begin with open loop policy of "always go right". Given the noise
model, this means that the robot will go right 85\% of the time, up
5\% of the time, down 5\% of the time, and left 5\% of the time. If
there's a wall in the direction it attempts to go, it will find itself
back in the same state.

Initially, assume "cost to go" (which we'll write as $v()$) from each
state (except G) is 1.
\\

Conditions before first backup of S:
\begin{eqnarray*}
C_S & = &\mbox{immediate cost of step} = 1 \\
p_{S4} & = &\mbox{prob of going right}=0.85 \\
p_{S1} & = &\mbox{prob of going up}=0.05 \\
p_{S8} & = &\mbox{prob of going down}=0.05 \\
p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
v(S)&=&1,v(1)=1,v(4)=1,v(8)=1
\end{eqnarray*}
First backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 = 2
\end{eqnarray*}

Conditions before first backup of 4:
\begin{eqnarray*}
C_4 & = &\mbox{immediate cost of step} = 1 \\
p_{45} & = &\mbox{prob of going right}=0.85 \\
p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
p_{4S} & = &\mbox{prob of going left}=0.05 \\
v(S)&=&2,v(4)=1,v(5)=1
\end{eqnarray*}
First backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2 = 2.05
\end{eqnarray*}

Conditions before first backup of 5:
\begin{eqnarray*}
C_5 & = &\mbox{immediate cost of step} = 1 \\
p_{56} & = &\mbox{prob of going right}=0.85 \\
p_{52} & = &\mbox{prob of going up}=0.05 \\
p_{59} & = &\mbox{prob of going down}=0.05 \\
p_{54} & = &\mbox{prob of going left}=0.05 \\
v(2)&=&1,v(4)=2.05,v(6)=1,v(9)=1
\end{eqnarray*}
First backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 2.0525
\end{eqnarray*}

Conditions before first backup of 6:
\begin{eqnarray*}
C_6 & = &\mbox{immediate cost of step} = 1 \\
p_{67} & = &\mbox{prob of going right}=0.85 \\
p_{66} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
p_{65} & = &\mbox{prob of going left}=0.05 \\
v(5)&=&2.0525,v(6)=1,v(7)=1
\end{eqnarray*}
First backup of 6:
\begin{eqnarray*}
v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2.0525 = 2.05265
\end{eqnarray*}

Conditions before first backup of 7:
\begin{eqnarray*}
C_7 & = &\mbox{immediate cost of step} = 1 \\
p_{7G} & = &\mbox{prob of going right}=0.85 \\
p_{73} & = &\mbox{prob of going up}=0.05 \\
p_{710} & = &\mbox{prob of going down}=0.05 \\
p_{76} & = &\mbox{prob of going left}=0.05 \\
v(3)&=&1,v(6)=2.05265,v(10)=1,v(G)=0
\end{eqnarray*}
First backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05265 = 1.2026
\end{eqnarray*}

Conditions before second backup of 6:
\begin{eqnarray*}
C_6 & = &\mbox{immediate cost of step} = 1 \\
p_{67} & = &\mbox{prob of going right}=0.85 \\
p_{66} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
p_{65} & = &\mbox{prob of going left}=0.05 \\
v(5)&=&2.0525,v(6)=2.05265,v(7)=1.2026
\end{eqnarray*}
Second backup of 6:
\begin{eqnarray*}
v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1.2026 + 0.1 \cdot 2.05265 + 0.05 \cdot 2.0525 = 2.329
\end{eqnarray*}

Conditions before second backup of 5:
\begin{eqnarray*}
C_5 & = &\mbox{immediate cost of step} = 1 \\
p_{56} & = &\mbox{prob of going right}=0.85 \\
p_{52} & = &\mbox{prob of going up}=0.05 \\
p_{59} & = &\mbox{prob of going down}=0.05 \\
p_{54} & = &\mbox{prob of going left}=0.05 \\
v(2)&=&1,v(4)=2.05,v(6) = 2.329,v(9)=1
\end{eqnarray*}
Second backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 2.329 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 3.18215
\end{eqnarray*}

Conditions before second backup of 4:
\begin{eqnarray*}
C_4 & = &\mbox{immediate cost of step} = 1 \\
p_{45} & = &\mbox{prob of going right}=0.85 \\
p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
p_{4S} & = &\mbox{prob of going left}=0.05 \\
v(S)&=&2,v(4)=2.05,v(5)=3.18215
\end{eqnarray*}
Second backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 3.18215 + 0.1 \cdot 2.05 + 0.05 \cdot 2 = 4.00
\end{eqnarray*}

Conditions before second backup of S:
\begin{eqnarray*}
C_S & = &\mbox{immediate cost of step} = 1 \\
p_{S4} & = &\mbox{prob of going right}=0.85 \\
p_{S1} & = &\mbox{prob of going up}=0.05 \\
p_{S8} & = &\mbox{prob of going down}=0.05 \\
p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
v(S)&=&2,v(1)=1,v(4)=4.00,v(8)=1
\end{eqnarray*}
Second backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 4.00 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2 = 4.6
\end{eqnarray*}

Conditions before second backup of 7:
\begin{eqnarray*}
C_7 & = &\mbox{immediate cost of step} = 1 \\
p_{7G} & = &\mbox{prob of going right}=0.85 \\
p_{73} & = &\mbox{prob of going up}=0.05 \\
p_{710} & = &\mbox{prob of going down}=0.05 \\
p_{76} & = &\mbox{prob of going left}=0.05 \\
v(3)&=&1,v(6)=2.329,v(10)=1,v(G)=0
\end{eqnarray*}
Second backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.329 = 1.216
\end{eqnarray*}

Conditions before first backup of 3:
\begin{eqnarray*}
C_3 & = &\mbox{immediate cost of step} = 1 \\
p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.95 \\
p_{37} & = &\mbox{prob of going down}=0.05 \\
v(3)&=&1,v(7)=1.216
\end{eqnarray*}
First backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.95 \cdot 1 + 0.05 \cdot 1.216 = 2.0108
\end{eqnarray*}

Conditions before second backup of 3:
\begin{eqnarray*}
C_3 & = &\mbox{immediate cost of step} = 1 \\
p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.95 \\
p_{37} & = &\mbox{prob of going down}=0.05 \\
v(3)&=&2.0108,v(7)=1.216
\end{eqnarray*}
Second backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.95 \cdot 2.0108 + 0.05 \cdot 1.216 = 2.97
\end{eqnarray*}
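For concreteness, the first sweep of backups (S, 4, 5, 6, 7) can be replayed in code. This is a sketch: the grid adjacency below is reconstructed from the transition lists in this answer (S-4-5-6-7-G across the middle, states 1, 2, 3 above, 8, 9, 10 below), and a wall bounces the robot back into its current state as stated earlier:

```python
# Replay the first sweep of value backups under the open-loop policy
# "always go right": v(s) <- 1 + sum_s' p(s'|s) v(s').

P_MOVE, P_SLIP = 0.85, 0.05

# neighbors[s] = (up, right, down, left); None means a wall.
neighbors = {
    "S": ("1", "4", "8", None),
    "4": (None, "5", None, "S"),
    "5": ("2", "6", "9", "4"),
    "6": (None, "7", None, "5"),
    "7": ("3", "G", "10", "6"),
    "3": (None, None, "7", None),
}

v = {s: 1.0 for s in "S 1 2 3 4 5 6 7 8 9 10".split()}
v["G"] = 0.0   # goal has zero cost to go

def backup(state, intended="right"):
    expected = 0.0
    for d, nxt in zip(("up", "right", "down", "left"), neighbors[state]):
        p = P_MOVE if d == intended else P_SLIP
        expected += p * v[nxt if nxt is not None else state]  # wall: stay
    v[state] = 1.0 + expected   # immediate cost of step is 1

for s in ("S", "4", "5", "6", "7"):   # first sweep, as in the answer
    backup(s)
print({s: round(v[s], 5) for s in ("S", "4", "5", "6", "7")})
```

The resulting values match the first backups above up to rounding (v(S)=2, v(4)=2.05, v(5)=2.0525, v(6)=2.052625, v(7)=1.2026...).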

Problem 7c:

Initially, all neighboring states look equally good to the robot, so
the max of the expected next value doesn't depend on which next
(valid) state you try to move to. The initial backups S-4-5-6-7-6 are
the same as before.

However, when we next try to backup state 5, unexplored states 2 and 9
still have (incorrect) value estimates of 1, so our policy of going to
the lowest neighbor leads us astray. Let's arbitrarily say we decide
to go up (rather than down).

Conditions before second backup of 5:
\begin{eqnarray*}
C_5 & = &\mbox{immediate cost of step} = 1 \\
p_{56} & = &\mbox{prob of going right}=0.05 \\
p_{52} & = &\mbox{prob of going up}=0.85 \\
p_{59} & = &\mbox{prob of going down}=0.05 \\
p_{54} & = &\mbox{prob of going left}=0.05 \\
v(2)&=&1,v(4)=2.05,v(6) = 2.329,v(9)=1
\end{eqnarray*}
Second backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.05 \cdot 2.329 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 2.119
\end{eqnarray*}
We won't correct the erroneous estimates of states 2 and 9 until we actually back {\em them} up.

Now, due to our underestimates of state S, our policy again points us in the wrong direction.
Conditions before second backup of 4:
\begin{eqnarray*}
C_4 & = &\mbox{immediate cost of step} = 1 \\
p_{45} & = &\mbox{prob of going right}=0.05 \\
p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
p_{4S} & = &\mbox{prob of going left}=0.85 \\
v(S)&=&2,v(4)=2.05,v(5)=2.119
\end{eqnarray*}
Second backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 2 + 0.1 \cdot 2.05 + 0.05 \cdot 2.119 = 3.011
\end{eqnarray*}

Conditions before second backup of S:
\begin{eqnarray*}
C_S & = &\mbox{immediate cost of step} = 1 \\
p_{S4} & = &\mbox{prob of going right}=0.05 \\
p_{S1} & = &\mbox{prob of going up}=0.85 \\
p_{S8} & = &\mbox{prob of going down}=0.05 \\
p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
v(S)&=&2,v(1)=1,v(4)=3.011,v(8)=1
\end{eqnarray*}
Second backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 2 + 0.05 \cdot 1 + 0.05 \cdot 3.011 = 2.15
\end{eqnarray*}

Conditions before second backup of 7:
\begin{eqnarray*}
C_7 & = &\mbox{immediate cost of step} = 1 \\
p_{7G} & = &\mbox{prob of going right}=0.85 \\
p_{73} & = &\mbox{prob of going up}=0.05 \\
p_{710} & = &\mbox{prob of going down}=0.05 \\
p_{76} & = &\mbox{prob of going left}=0.05 \\
v(3)&=&1,v(6)=2.329,v(10)=1,v(G)=0
\end{eqnarray*}
Second backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.329 = 1.216
\end{eqnarray*}

Conditions before first backup of 3:
\begin{eqnarray*}
C_3 & = &\mbox{immediate cost of step} = 1 \\
p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.15 \\
p_{37} & = &\mbox{prob of going down}=0.85 \\
v(3)&=&1,v(7)=1.216
\end{eqnarray*}
First backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.15 \cdot 1 + 0.85 \cdot 1.216 = 2.183
\end{eqnarray*}

Conditions before second backup of 3:
\begin{eqnarray*}
C_3 & = &\mbox{immediate cost of step} = 1 \\
p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.15 \\
p_{37} & = &\mbox{prob of going down}=0.85 \\
v(3)&=&2.183,v(7)=1.216
\end{eqnarray*}
Second backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.15 \cdot 2.183 + 0.85 \cdot 1.216 = 2.36
\end{eqnarray*}

The take-home message is that you need to back up everything
(including states 2 and 9), or your value estimates may never point you
in the direction you need. Also, for a fixed policy, the value function
converges more quickly than when the policy is dictated by the current
value estimates (though both are guaranteed to converge if all states
are backed up infinitely often).