15-381 Spring 200 Final Exam Answers


Problem 1a: (5 points)

   full cr. \sum_{i=0}^{d/2} b^i + \sum_{i=0}^{d/2} c^i
   3 pts.   O(b^(d/2) + c^(d/2))
   0 pts if d/2 is not the exponent
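
   For concreteness, a small sketch (the helper is ours, not part of the
   exam) that counts the nodes expanded when two frontiers with branching
   factors b and c meet at depth d/2:

      # Hedged sketch: nodes expanded by bidirectional search, assuming
      # uniform branching factors b (forward) and c (backward) and that
      # the two frontiers meet at depth d/2.
      def bidirectional_nodes(b, c, d):
          half = d // 2
          forward = sum(b ** i for i in range(half + 1))
          backward = sum(c ** i for i in range(half + 1))
          return forward + backward   # O(b^(d/2) + c^(d/2))

      # e.g. b = c = 10, d = 6: 2222 nodes, versus 1111111 nodes for a
      # unidirectional search to depth 6
      print(bidirectional_nodes(10, 10, 6))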


Problem 1b: (5 points)

   Unifiable. Unifier = { y/A , x/h(A,B) }
   3 points for unified expression f(h(A,B), g(A,A,h(A,B)))
   0 points if "not unifiable"
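
   To see the unifier in action, a minimal sketch (terms as nested tuples;
   the example term is illustrative, not the exam's original expressions):

      # Hedged sketch: apply the unifier {y -> A, x -> h(A,B)} to a term.
      # Terms are nested tuples; variables and constants are strings.
      def substitute(term, theta):
          if isinstance(term, str):
              return theta.get(term, term)
          return tuple(substitute(t, theta) for t in term)

      theta = {'y': 'A', 'x': ('h', 'A', 'B')}
      print(substitute(('g', 'y', 'y', 'x'), theta))
      # -> ('g', 'A', 'A', ('h', 'A', 'B')), i.e. the second argument of
      #    the unified expression f(h(A,B), g(A,A,h(A,B)))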


Problem 1c: (5 points)

   The correct answer here was O(wd + q).  The wd term comes from multiplying
   the query matrix, which is size w * d, by the query vector.  The q term
   comes from building the query vector itself from the q query words.
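
   A sketch of where each term comes from (the matrix layout and names are
   assumptions, not the exam's notation):

      # Hedged sketch of the O(wd + q) accounting: building the
      # w-dimensional query vector touches each of the q query words
      # once (O(q)); scoring all d documents multiplies the w x d
      # term-document matrix by that vector (O(wd)).
      def score_documents(matrix, vocab_index, query_words):
          w, d = len(matrix), len(matrix[0])
          qvec = [0] * w
          for word in query_words:                # O(q)
              qvec[vocab_index[word]] += 1
          return [sum(matrix[i][j] * qvec[i] for i in range(w))
                  for j in range(d)]              # O(wd)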


Problem 1d: (5 points)

   The correct answer here is:

   argmax_i P(word_i | signal) = argmax_i P(signal | word_i) * P(word_i) / P(signal)

   But since P(signal) is constant over all words, we just need to calculate:

   argmax_i P(signal | word_i) * P(word_i)
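
   As a sketch, the decision rule is one line once the two tables are
   given (the table names here are assumptions):

      # Hedged sketch of the noisy-channel rule: choose the word that
      # maximizes P(signal | word) * P(word); the constant P(signal) is
      # dropped. p_signal_given_word and p_word are assumed lookup tables.
      def best_word(signal, words, p_signal_given_word, p_word):
          return max(words,
                     key=lambda w: p_signal_given_word[(signal, w)] * p_word[w])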


Problem 2: NLP (15 points)

a) Any task description where a method more complex than bag-of-words
   but less than full parsing will do, so long as it is clear that said
   method applies and is beneficial, got full credit, for instance:
      - Information extraction (detecting proper names, places,
        dates, monetary amounts, etc.).  This requires finite state
        automata (or simpler), essentially k-word pattern matching,
        where k = 2 to 6 (less than the length of the sentence).
      - Summarization (producing a short version of the text), where
        sentence segmentation (punctuation, capitalization, etc.) is
        required on top of bag-of-words to find relevant sentences.
        Here information extraction helps too (as a subroutine) to
        select sentences mentioning interesting names and places, but
        it is not required for base-level summarization.
      - Domain-specific command interpretation (e.g. "move <X> to <Y>"),
        which can be done by template matching, but does require word
        order constraints to be respected.

   Wrong answers included IR, web-based IR, translation, question-answering,
   and anything at either extreme, as stated in the problem description
   itself.

b) Any response that requires the construction of all 10^15 trigrams is wrong,
   regardless of any clever data structure (such as a prefix tree) used to
   hold them, because it still needs 10^15 distinct leaves.

   Partial credit for simply noting that only a small fragment of the possible
   trigrams will occur in English, so simply adding a trigram when it is first
   seen results in less than 10^15, as the majority will never be seen.

   Full credit for any correct incremental scheme that only builds meaningful
   trigrams in the first place.  The best answer is:
       Step 1: look at all the bigrams that occur in any large training
               corpus (this is significantly less than 10^10).  Then, keep
               only those that both occur more than once and whose probability
               of occurrence is far above chance, i.e.
                       p(w_1,w_2) >> p(w_1)p(w_2)
               Tune the >> threshold so that the top 10^5 to 10^6 are kept.
       Step 2: repeat the above, using bigram-unigram sequences built from
               the bigrams kept in step 1 to construct trigrams, and keep
               those that occur more than once and satisfy:
                       p(w_1,w_2,w_3) >> p(w_1,w_2)p(w_3)
               Again tune the >> threshold so that the top 10^5 to 10^6
               are kept.  In total, then, the dictionary + bigrams +
               trigrams will be O(10^6) rather than O(10^15).  If desired,
               for an application where memory is ample, you can go to
               O(10^7) or any size required, but with diminishing returns
               as lower-frequency or less-meaningful n-grams are kept.
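
   A sketch of step 1 under this scheme (the threshold constant stands in
   for the tuned ">>"; the corpus is any list of tokens):

      # Hedged sketch of the bigram pass: keep only bigrams that occur
      # more than once and whose joint probability is far above chance.
      # Step 2 would repeat this over (kept bigram, unigram) sequences.
      from collections import Counter

      def keep_bigrams(corpus, threshold=20.0):
          unigrams = Counter(corpus)
          bigrams = Counter(zip(corpus, corpus[1:]))
          n = len(corpus)
          kept = {}
          for (w1, w2), count in bigrams.items():
              if count < 2:
                  continue
              p_joint = count / (n - 1)
              p_chance = (unigrams[w1] / n) * (unigrams[w2] / n)
              if p_joint > threshold * p_chance:  # p(w1,w2) >> p(w1)p(w2)
                  kept[(w1, w2)] = count
          return kept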



Problem 3: Probability (15 points)

   a) The answer we were looking for was "conditional independence", from
   which p(LATE) falls out immediately.  It turns out that there were many
   correct answers.  However, some incorrect tries included:
      1) Asking for two pieces of information (got partial credit if one of
	   the two was correct)
      2) Asking for the value of one of the variables (doesn't help us get
	   p(LATE))

   b) This was kind of tricky.  Given conditional independence, knowing
   LATE does not give us any additional information about the value of DAY
   (i.e. p(DAY=Tues | LATE) = p(DAY=Tues)).
      However, most people did not select conditional independence as their
   piece of information, so answers varied depending on what answer was
   given to part (a).  If you didn't answer conditional independence, then
   p(DAY | LATE) = p(DAY) is -not- correct, as knowing LATE then does give
   information about the value of DAY.
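
   For concreteness, here is why independence of LATE and DAY settles both
   parts at once (a generic derivation; the exam's actual tables are not
   reproduced here):
   \begin{eqnarray*}
   p(\mbox{LATE}) & = & p(\mbox{LATE} \mid \mbox{DAY}=d) \quad \mbox{for any day } d \\
   p(\mbox{DAY}=\mbox{Tues} \mid \mbox{LATE}) & = &
     \frac{p(\mbox{LATE} \mid \mbox{DAY}=\mbox{Tues})\, p(\mbox{DAY}=\mbox{Tues})}{p(\mbox{LATE})}
     \;=\; p(\mbox{DAY}=\mbox{Tues})
   \end{eqnarray*}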
     


Problem 4a: (5 points)

   If the data is linearly separable you could use linear regression, a single 
   perceptron, or a neural network with no hidden units.  (There are also many 
   other possibilities.)

   If the data is not linearly separable you could use a decision tree or a 
   neural network with hidden units.  (There are also many other possibilities.)
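
   For the linearly separable case, a minimal perceptron sketch (the data
   format, epoch count, and learning rate are illustrative assumptions):

      # Hedged sketch: the classic perceptron rule, which converges
      # exactly when the data is linearly separable. Each sample is
      # (features, label) with label in {-1, +1}.
      def train_perceptron(samples, n_features, epochs=100, lr=0.1):
          w = [0.0] * n_features
          b = 0.0
          for _ in range(epochs):
              for x, y in samples:
                  act = sum(wi * xi for wi, xi in zip(w, x)) + b
                  pred = 1 if act > 0 else -1
                  if pred != y:             # misclassified: nudge the plane
                      w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                      b += lr * y
          return w, b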


Problem 4b: (5 points)

   You would want to randomly split up your data into a training set and a
   test set.  Typically 30% of your data would be in your test set.

   You would then train your model using the training set data only.  The
   testing set data would only be used to test the model and predict the
   future performance of your model.  If you were trying to choose between
   multiple models, you could use the performance on the test set to decide
   which model is likely to perform best.
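
   A minimal sketch of the split (the 30% figure follows the answer above;
   the helper name is ours):

      # Hedged sketch: randomly hold out ~30% of the data as a test set,
      # train on the rest, and compare models by test-set performance.
      import random

      def train_test_split(data, test_fraction=0.3, seed=0):
          data = list(data)
          random.Random(seed).shuffle(data)
          cut = int(len(data) * test_fraction)
          return data[cut:], data[:cut]   # (training set, test set)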


Problem 4c: (5 points)

   Entropy(no split)  = -[300/900 log(300/900) + 600/900 log(600/900)] =~ 0.918

   Entropy(f present) = -[250/300 log(250/300) +  50/300 log( 50/300)] =~ 0.650
   Entropy(f absent)  = -[ 50/600 log( 50/600) + 550/600 log(550/600)] =~ 0.414

   IG(f) = E(no split) - [300/900 E(f present) + 600/900 E(f absent)]
         =~ 0.426

   Entropy(g present) = 0
   Entropy(g absent)  = -[260/860 log(260/860) + 600/860 log(600/860)] =~ 0.884

   IG(g) = E(no split) - [ 40/900 E(g present) + 860/900 E(g absent)]
         =~ 0.073

   So f is the better choice since it has higher information gain.

   (Note:  All logs are base 2.)
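
   The arithmetic above can be checked mechanically; a sketch:

      # Hedged sketch reproducing the information-gain arithmetic above
      # (all logs base 2).
      from math import log2

      def entropy(counts):
          total = sum(counts)
          return -sum(c / total * log2(c / total) for c in counts if c > 0)

      def info_gain(class_counts, branches):
          # branches: per-branch class counts after the split
          n = sum(class_counts)
          return entropy(class_counts) - sum(
              sum(br) / n * entropy(br) for br in branches)

      print(info_gain([300, 600], [[250, 50], [50, 550]]))  # f: ~0.426
      print(info_gain([300, 600], [[40, 0], [260, 600]]))   # g: ~0.073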


Problem 4d: (5 points)

   The simplest (and also very robust) way is to let ? be the most
   common value for feature f.  We cannot just ignore every piece of data
   with a ?, because it might be that every sample has some uncertainty
   associated with it, and ignoring them all would leave us with no samples.
     
      Partial credit was given for adding a ? branch to the tree at each
   split: if few or no training samples have a ? for a given feature, the
   resulting decision tree might predict very poorly on data with ?'s,
   whereas the majority approach would have a better chance.
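
   A sketch of the majority-value approach (the row/column layout is an
   assumption):

      # Hedged sketch: replace '?' in each feature column with that
      # column's most common observed value.
      from collections import Counter

      def fill_missing(rows):
          columns = list(zip(*rows))
          filled = []
          for col in columns:
              known = [v for v in col if v != '?']
              mode = Counter(known).most_common(1)[0][0]
              filled.append([mode if v == '?' else v for v in col])
          return [list(row) for row in zip(*filled)]

      print(fill_missing([['a', '?'], ['a', 'x'], ['b', 'x']]))
      # -> [['a', 'x'], ['a', 'x'], ['b', 'x']]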


Problem 5, Neural Networks (5 points)
     
   1. Legal
   2. Legal
   3. Illegal
   4. Legal
   5. Accept both
   6. Accept both
   7. Accept both



Problem 6: MDPs (15pts)

   1. State space = Floor that each elevator is in,
		    Time since each button was pushed,
		    State of button (i.e. going up, going down).
      Actions = Go up, go down, stay still
      Reward = -1 * sum of times since each button was pushed

      Note this setup minimizes the waiting time for passengers getting on
      to the elevator but does not do anything once passengers have gotten on.
      It also assumes that the buttons being pushed correspond to passengers
      on a certain floor and that all passengers on a floor that are going in
      one direction all get on the elevator at that time.

   2. Yes, it is Markovian since the reward and probabilities depend only on
      the current state. 

   3. Q-learning is a good algorithm here, though the training might take
      a while.
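
   A sketch of this encoding (the floor counts and data layout are
   illustrative, not part of the exam):

      # Hedged sketch of the MDP encoding above: the state holds each
      # elevator's floor, plus each pressed button's direction and the
      # time since it was pushed; reward = -1 * sum of waiting times.
      ACTIONS = ('up', 'down', 'stay')          # per elevator

      def reward(state):
          return -sum(state['waits'].values())

      state = {'floors': [3, 7],                # one entry per elevator
               'waits': {(5, 'up'): 12,         # (floor, dir) -> wait time
                         (2, 'down'): 4}}
      print(reward(state))                      # -16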



Problem 7: RL (15 points)

Problem 7a:
   Some people got philosophical on this one, asking what I meant by
   "eventually." Formally, my intent was "with probability 1 as time goes
   to infinity," in which case the answer was yes, the robot will
   eventually make it to the goal state. Regardless of the policy
   followed, the randomness in its execution (0.05 probability of moving
   in one of the three directions other than intended) ensures that, with
   probability 1, it will eventually take enough moves in the right
   direction to get out of any dead ends.

   In most cases, I gave partial credit to reasonable but differing
   interpretations of "always" and "eventually".


Problem 7b:
   Begin with the open-loop policy of "always go right". Given the noise
   model, this means that the robot will go right 85\% of the time, up
   5\% of the time, down 5\% of the time, and left 5\% of the time. If
   there's a wall in the direction it attempts to go, it will find itself
   back in the same state.

   Initially, assume "cost to go" (which we'll write as $v()$) from each
   state (except G) is 1.
   \\

   Conditions before first backup of S:
   \begin{eqnarray*}
   C_S & = &\mbox{immediate cost of step} = 1 \\
   p_{S4} & = &\mbox{prob of going right}=0.85 \\
   p_{S1} & = &\mbox{prob of going up}=0.05 \\
   p_{S8} & = &\mbox{prob of going down}=0.05 \\
   p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
   v(S)&=&1,v(1)=1,v(4)=1,v(8)=1
   \end{eqnarray*}
   First backup of S:
   \begin{eqnarray*}
   v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 = 2
   \end{eqnarray*}

   Conditions before first backup of 4:
   \begin{eqnarray*}
   C_4 & = &\mbox{immediate cost of step} = 1 \\
   p_{45} & = &\mbox{prob of going right}=0.85 \\
   p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
   p_{4S} & = &\mbox{prob of going left}=0.05 \\
   v(S)&=&2,v(4)=1,v(5)=1
   \end{eqnarray*}
   First backup of 4:
   \begin{eqnarray*}
   v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2 = 2.05
   \end{eqnarray*}

   Conditions before first backup of 5:
   \begin{eqnarray*}
   C_5 & = &\mbox{immediate cost of step} = 1 \\
   p_{56} & = &\mbox{prob of going right}=0.85 \\
   p_{52} & = &\mbox{prob of going up}=0.05 \\
   p_{59} & = &\mbox{prob of going down}=0.05 \\
   p_{54} & = &\mbox{prob of going left}=0.05 \\
   v(2)&=&1,v(4)=2.05,v(5)=1,v(9)=1
   \end{eqnarray*}
   First backup of 5:
   \begin{eqnarray*}
   v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 2.0525
   \end{eqnarray*}

   Conditions before first backup of 6:
   \begin{eqnarray*}
   C_6 & = &\mbox{immediate cost of step} = 1 \\
   p_{67} & = &\mbox{prob of going right}=0.85 \\
   p_{66} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
   p_{65} & = &\mbox{prob of going left}=0.05 \\
   v(5)&=&2.0525,v(6)=1,v(7)=1
   \end{eqnarray*}
   First backup of 6:
   \begin{eqnarray*}
   v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2.0525 = 2.052625
   \end{eqnarray*}

   Conditions before first backup of 7:
   \begin{eqnarray*}
   C_7 & = &\mbox{immediate cost of step} = 1 \\
   p_{7G} & = &\mbox{prob of going right}=0.85 \\
   p_{73} & = &\mbox{prob of going up}=0.05 \\
   p_{710} & = &\mbox{prob of going down}=0.05 \\
   p_{76} & = &\mbox{prob of going left}=0.05 \\
   v(3)&=&1,v(6)=2.052625,v(10)=1,v(G)=0
   \end{eqnarray*}
   First backup of 7:
   \begin{eqnarray*}
   v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.052625 \approx 1.2026
   \end{eqnarray*}

   Conditions before second backup of 6:
   \begin{eqnarray*}
   C_6 & = &\mbox{immediate cost of step} = 1 \\
   p_{67} & = &\mbox{prob of going right}=0.85 \\
   p_{66} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
   p_{65} & = &\mbox{prob of going left}=0.05 \\
   v(5)&=&2.0525,v(6)=2.052625,v(7)=1.2026
   \end{eqnarray*}
   Second backup of 6:
   \begin{eqnarray*}
   v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1.2026 + 0.1 \cdot 2.052625 + 0.05 \cdot 2.0525 \approx 2.329
   \end{eqnarray*}


   Conditions before second backup of 5:
   \begin{eqnarray*}
   C_5 & = &\mbox{immediate cost of step} = 1 \\
   p_{56} & = &\mbox{prob of going right}=0.85 \\
   p_{52} & = &\mbox{prob of going up}=0.05 \\
   p_{59} & = &\mbox{prob of going down}=0.05 \\
   p_{54} & = &\mbox{prob of going left}=0.05 \\
   v(2)&=&1,v(4)=2.05,v(6) = 2.329,v(9)=1
   \end{eqnarray*}
   Second backup of 5:
   \begin{eqnarray*}
   v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 2.329 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 3.18215
   \end{eqnarray*}

   Conditions before second backup of 4:
   \begin{eqnarray*}
   C_4 & = &\mbox{immediate cost of step} = 1 \\
   p_{45} & = &\mbox{prob of going right}=0.85 \\
   p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
   p_{4S} & = &\mbox{prob of going left}=0.05 \\
   v(S)&=&2,v(4)=2.05,v(5)=3.18215
   \end{eqnarray*}
   Second backup of 4:
   \begin{eqnarray*}
   v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 3.18215 + 0.1 \cdot 2.05 + 0.05 \cdot 2 \approx 4.00
   \end{eqnarray*}

   Conditions before second backup of S:
   \begin{eqnarray*}
   C_S & = &\mbox{immediate cost of step} = 1 \\
   p_{S4} & = &\mbox{prob of going right}=0.85 \\
   p_{S1} & = &\mbox{prob of going up}=0.05 \\
   p_{S8} & = &\mbox{prob of going down}=0.05 \\
   p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
   v(S)&=&2,v(1)=1,v(4)=4.00,v(8)=1
   \end{eqnarray*}
   Second backup of S:
   \begin{eqnarray*}
   v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 4.00 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2 = 4.6
   \end{eqnarray*}


   Conditions before second backup of 7:
   \begin{eqnarray*}
   C_7 & = &\mbox{immediate cost of step} = 1 \\
   p_{7G} & = &\mbox{prob of going right}=0.85 \\
   p_{73} & = &\mbox{prob of going up}=0.05 \\
   p_{710} & = &\mbox{prob of going down}=0.05 \\
   p_{76} & = &\mbox{prob of going left}=0.05 \\
   v(3)&=&1,v(6)=2.329,v(10)=1,v(G)=0
   \end{eqnarray*}
   Second backup of 7:
   \begin{eqnarray*}
   v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.329 = 1.216
   \end{eqnarray*}

   Conditions before first backup of 3:
   \begin{eqnarray*}
   C_3 & = &\mbox{immediate cost of step} = 1 \\
   p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.95 \\
   p_{37} & = &\mbox{prob of going down}=0.05 \\
   v(3)&=&1,v(7)=1.216
   \end{eqnarray*}
   First backup of 3:
   \begin{eqnarray*}
   v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
     & = & 1 + 0.95 \cdot 1 + 0.05 \cdot 1.216 = 2.0108
   \end{eqnarray*}

   Conditions before second backup of 3:
   \begin{eqnarray*}
   C_3 & = &\mbox{immediate cost of step} = 1 \\
   p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.95 \\
   p_{37} & = &\mbox{prob of going down}=0.05 \\
   v(3)&=&2.0108,v(7)=1.216
   \end{eqnarray*}
   Second backup of 3:
   \begin{eqnarray*}
   v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
     & = & 1 + 0.95 \cdot 2.0108 + 0.05 \cdot 1.216 \approx 2.97
   \end{eqnarray*}
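
   Since every backup above has the same shape, a small sketch that
   reproduces them (transition probabilities are passed in explicitly,
   exactly as in the condition tables):

      # Hedged sketch of one value backup, v(s) = C_s + sum_s' p(s,s') v(s'),
      # checked against the first backup of S above.
      def backup(cost, transitions, v):
          # transitions: dict mapping next state -> probability
          return cost + sum(p * v[s2] for s2, p in transitions.items())

      v = {'S': 1, '1': 1, '4': 1, '8': 1}
      print(backup(1, {'4': 0.85, '1': 0.05, '8': 0.05, 'S': 0.05}, v))  # 2.0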

Problem 7c:

   Initially, all neighboring states look equally good to the robot, so
   the greedy choice of next state doesn't depend on which next (valid)
   state you try to move to. The initial backups S-4-5-6-7-6 are the
   same as before.

   However, when we next try to backup state 5, unexplored states 2 and 9
   still have (incorrect) value estimates of 1, so our policy of going to
   the lowest neighbor leads us astray. Let's arbitrarily say we decide
   to go up (rather than down).

   Conditions before second backup of 5:
   \begin{eqnarray*}
   C_5 & = &\mbox{immediate cost of step} = 1 \\
   p_{56} & = &\mbox{prob of going right}=0.05 \\
   p_{52} & = &\mbox{prob of going up}=0.85 \\
   p_{59} & = &\mbox{prob of going down}=0.05 \\
   p_{54} & = &\mbox{prob of going left}=0.05 \\
   v(2)&=&1,v(4)=2.05,v(6) = 2.329,v(9)=1
   \end{eqnarray*}
   Second backup of 5:
   \begin{eqnarray*}
   v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
     & = & 1 + 0.05 \cdot 2.329 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 \approx 2.119
   \end{eqnarray*}
   We won't correct the erroneous estimates of states 2 and 9 until we actually back {\em them} up.

   Now, due to our underestimates of state S, our policy again points us in the wrong direction. 
   Conditions before second backup of 4:
   \begin{eqnarray*}
   C_4 & = &\mbox{immediate cost of step} = 1 \\
   p_{45} & = &\mbox{prob of going right}=0.05 \\
   p_{44} & = &\mbox{prob of staying put (trying up or down)}=0.1 \\
   p_{4S} & = &\mbox{prob of going left}=0.85 \\
   v(S)&=&2,v(4)=2.05,v(5)=2.119
   \end{eqnarray*}
   Second backup of 4:
   \begin{eqnarray*}
   v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 2 + 0.1 \cdot 2.05 + 0.05 \cdot 2.119 \approx 3.011
   \end{eqnarray*}

   Conditions before second backup of S:
   \begin{eqnarray*}
   C_S & = &\mbox{immediate cost of step} = 1 \\
   p_{S4} & = &\mbox{prob of going right}=0.05 \\
   p_{S1} & = &\mbox{prob of going up}=0.85 \\
   p_{S8} & = &\mbox{prob of going down}=0.05 \\
   p_{SS} & = &\mbox{prob of staying put (by trying to go left)}=0.05 \\
   v(S)&=&2,v(1)=1,v(4)=3.011,v(8)=1
   \end{eqnarray*}
   Second backup of S:
   \begin{eqnarray*}
   v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 1 + 0.05 \cdot 2 + 0.05 \cdot 1 + 0.05 \cdot 3.011 = 2.15
   \end{eqnarray*}


   Conditions before second backup of 7:
   \begin{eqnarray*}
   C_7 & = &\mbox{immediate cost of step} = 1 \\
   p_{7G} & = &\mbox{prob of going right}=0.85 \\
   p_{73} & = &\mbox{prob of going up}=0.05 \\
   p_{710} & = &\mbox{prob of going down}=0.05 \\
   p_{76} & = &\mbox{prob of going left}=0.05 \\
   v(3)&=&1,v(6)=2.329,v(10)=1,v(G)=0
   \end{eqnarray*}
   Second backup of 7:
   \begin{eqnarray*}
   v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
     & = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.329 = 1.216
   \end{eqnarray*}

   Conditions before first backup of 3:
   \begin{eqnarray*}
   C_3 & = &\mbox{immediate cost of step} = 1 \\
   p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.15 \\
   p_{37} & = &\mbox{prob of going down}=0.85 \\
   v(3)&=&1,v(7)=1.216
   \end{eqnarray*}
   First backup of 3:
   \begin{eqnarray*}
   v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
     & = & 1 + 0.15 \cdot 1 + 0.85 \cdot 1.216 = 2.183
   \end{eqnarray*}

   Conditions before second backup of 3:
   \begin{eqnarray*}
   C_3 & = &\mbox{immediate cost of step} = 1 \\
   p_{33} & = &\mbox{prob of staying put (by trying left, right or up)}=0.15 \\
   p_{37} & = &\mbox{prob of going down}=0.85 \\
   v(3)&=&2.183,v(7)=1.216
   \end{eqnarray*}
   Second backup of 3:
   \begin{eqnarray*}
   v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
     & = & 1 + 0.15 \cdot 2.183 + 0.85 \cdot 1.216 = 2.36
   \end{eqnarray*}

   The take-home message here is that you need to back up everything
   (including states like 2 and 9), or your values may never point you in
   the direction you need.  Also, for a fixed policy, the value function
   converges more quickly than when the policy is dictated by the current
   value estimates (though both are guaranteed to converge if all states
   are backed up infinitely often).
   \end{document}