15-381 Spring 200 Final Exam Answers

Problem 1a: (5 points)
Full credit: sum_{i=0}^{d/2} b^i + sum_{i=0}^{d/2} c^i
3 pts: O(b^(d/2) + c^(d/2))
0 pts if d/2 is not the exponent.

Problem 1b: (5 points)
Unifiable. Unifier = { y/A , x/h(A,B) }
3 points for the unified expression f( h( A , B ) , g( A , A , h( A,B ) ) )
0 points for "not unifiable".

Problem 1c: (5 points)
The correct answer here was O(wd + q). The wd term comes from multiplying the query matrix, which is of size w * d, by the query vector. The q term comes from computing the query vector itself, which has size q.

Problem 1d: (5 points)
The correct answer here is:
argmax_i P(word_i | signal) = argmax_i P(signal | word_i) * P(word_i) / P(signal)
But since P(signal) is constant over all words, we only need to calculate:
argmax_i P(signal | word_i) * P(word_i)

Problem 2: NLP (15 points)
a) Any task description where a method more complex than bag-of-words but less than full parsing will do, so long as it is clear that said method applies and is beneficial, got full credit. For instance:
- Information extraction (detecting proper names, places, dates, monetary amounts, etc.). This requires finite state automata (or simpler), essentially k-word pattern matching, where k = 2 to 6 (less than the length of a sentence).
- Summarization (producing a short version of the text), where sentence segmentation (punctuation, capitalization, etc.) is required on top of bag-of-words to find relevant sentences. Information extraction helps here too (as a subroutine) to select sentences mentioning interesting names and places, but it is not required for base-level summarization.
- Domain-specific command interpretation (e.g. "move <file> as postscript to <printer>"), which can be done by template matching, but does require word-order constraints to be respected.
Wrong answers included IR, web-based IR, translation, question answering, and anything at either extreme, as stated in the problem description itself.
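As an illustration of the k-word pattern matching meant in the information-extraction answer above, here is a minimal sketch; the regexes and the sample sentence are invented for this note, not taken from the exam:

```python
import re

# Finite-state-style extractors for monetary amounts and simple dates.
# These regexes are illustrative stand-ins for the k-word patterns described above.
MONEY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:million|billion))?")
DATE = re.compile(r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s\d{1,2},\s\d{4}")

def extract(text):
    # return every money and date mention found in the text
    return {"money": MONEY.findall(text), "dates": DATE.findall(text)}

text = "The company paid $4.5 million on March 12, 1999 for the patent."
print(extract(text))  # finds "$4.5 million" and "March 12, 1999"
```

Note that no parsing is involved: each pattern inspects only a short window of tokens, which is exactly the "more than bag-of-words, less than full parsing" regime the problem asked for.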
b) Any response that requires the construction of all 10^15 trigrams is wrong, regardless of any clever data structure (such as a prefix tree) used to hold them, because they still need 10^15 distinct leaves. Partial credit for simply noting that only a small fraction of the possible trigrams will occur in English, so adding a trigram only when it is first seen results in far fewer than 10^15, as the majority will never be seen. Full credit for any correct incremental scheme that only builds meaningful trigrams in the first place. The best answer is:

Step 1: Look at all the bigrams that occur in any large training corpus (this is significantly less than 10^10). Then keep only those that both occur more than once and whose probability of occurrence is far above chance, i.e. p(w_1,w_2) >> p(w_1)p(w_2). Tune the ">>" threshold so that the top 10^5 to 10^6 are kept.

Step 2: Repeat the above, using bigram-unigram sequences built from the bigrams kept in Step 1 to construct trigrams, and keep those that occur more than once and satisfy p(w_1,w_2,w_3) >> p(w_1,w_2)p(w_3). Again, tune the ">>" threshold so that the top 10^5 to 10^6 are kept.

In total, then, the dictionary + bigrams + trigrams will be O(10^6) rather than O(10^15). If desired, for an application where memory is ample, you can go to O(10^7) or any size required, but with diminishing returns as lower-frequency or less-meaningful n-grams are kept.

Problem 3: Probability (15 points)
a) The answer we were looking for was "conditional independence", from which p(LATE) falls out immediately. It turns out that there were many correct answers. However, some incorrect tries included:
1) Asking for two pieces of information (got partial credit if one of the two was correct)
2) Asking for the value of one of the variables (doesn't help us get p(LATE))
b) This was kind of tricky. Given conditional independence, knowing LATE does not give us any additional information about the value of DAY (i.e., p(DAY=Tues | LATE) = p(DAY=Tues)).
However, most people did not select conditional independence as their piece of information, so answers varied depending on what answer was given to part (a). If you didn't answer conditional independence, then p(DAY | LATE) = p(DAY) is -not- correct, as knowing LATE then does give information about the value of DAY.

Problem 4a: (5 points)
If the data is linearly separable you could use linear regression, a single perceptron, or a neural network with no hidden units. (There are also many other possibilities.) If the data is not linearly separable you could use a decision tree or a neural network with hidden units. (There are also many other possibilities.)

Problem 4b: (5 points)
You would want to randomly split your data into a training set and a test set. Typically 30% of your data would be in your test set. You would then train your model using the training set data only. The test set data would only be used to test the model and predict its future performance. If you were trying to choose between multiple models, you could use performance on the test set to decide which model is likely to perform best.

Problem 4c: (5 points)
Entropy(no split)  = -[300/900 log(300/900) + 600/900 log(600/900)] =~ 0.918
Entropy(f present) = -[250/300 log(250/300) +  50/300 log( 50/300)] =~ 0.650
Entropy(f absent)  = -[ 50/600 log( 50/600) + 550/600 log(550/600)] =~ 0.414
IG(f) = E(no split) - [300/900 E(f present) + 600/900 E(f absent)] =~ 0.426
Entropy(g present) = 0
Entropy(g absent)  = -[260/860 log(260/860) + 600/860 log(600/860)] =~ 0.884
IG(g) = E(no split) - [ 40/900 E(g present) + 860/900 E(g absent)] =~ 0.073
So f is the better choice since it has higher information gain. (Note: all logs are base 2.)

Problem 4d: (5 points)
The simplest (and also very robust) way is to let ? be the most common value for feature f.
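In code, this majority-value imputation might look like the following sketch; the feature name and data are invented for illustration:

```python
from collections import Counter

def impute_missing(rows, feature, missing="?"):
    """Replace '?' in one feature column with the column's most common known value."""
    known = [r[feature] for r in rows if r[feature] != missing]
    majority = Counter(known).most_common(1)[0][0]
    return [{**r, feature: majority if r[feature] == missing else r[feature]}
            for r in rows]

data = [{"f": "yes"}, {"f": "yes"}, {"f": "no"}, {"f": "?"}]
print(impute_missing(data, "f"))  # the '?' row gets f = "yes"
```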
We cannot just ignore every piece of data with a ?, because it might be that every sample has some uncertainty associated with it, and ignoring them all would leave us with no samples. Partial credit was given for adding a ? branch to the tree at each split: because there may be few or no data samples with a ? for a given feature, the resulting decision tree might predict very poorly on data with ?'s, when it would have a better chance going with the majority approach.

Problem 5: Neural Networks (5 points)
1. Legal
2. Legal
3. Illegal
4. Legal
5. Accept both
6. Accept both
7. Accept both

Problem 6: MDPs (15 pts)
1. State space = the floor each elevator is on, the time since each button was pushed, and the state of each button (i.e., going up or going down).
Actions = go up, go down, stay still.
Reward = -1 * sum of the times since each button was pushed.
Note that this setup minimizes the waiting time for passengers getting on the elevator but does not do anything once passengers have gotten on. It also assumes that the buttons being pushed correspond to passengers on a certain floor, and that all passengers on a floor who are going in one direction get on the elevator at that time.
2. Yes, it is Markovian, since the reward and transition probabilities depend only on the current state.
3. Q-learning is a reasonable algorithm, except that training might take a while.

Problem 7: RL (15 points)

Problem 7a: Some people got philosophical on this one, asking what I meant by "eventually." Formally, my intent was "with probability 1 as time goes to infinity," in which case the answer was yes, the robot will eventually make it to the goal state. Regardless of the policy followed, the randomness in its execution (probability 0.05 each of moving in one of the three directions other than the one intended) ensures that, with probability 1, it will eventually take enough moves in the right direction to get out of any dead end. In most cases, I gave partial credit to reasonable but differing interpretations of "always" and "eventually".

Problem 7b: Begin with the open-loop policy "always go right". Given the noise model, this means that the robot will go right 85\% of the time, up 5\% of the time, down 5\% of the time, and left 5\% of the time. If there is a wall in the direction it attempts to go, it will find itself back in the same state. Initially, assume the "cost to go" (which we'll write as $v()$) from each state (except G) is 1.
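The sequence of backups worked out below can be reproduced with a short script. This is a sketch, not part of the original answer: the transition lists are read off the grid layout implied by the probabilities in the derivation, and backups are done in place, so later backups see earlier results.

```python
# In-place backups of cost-to-go v() under the fixed policy "always go right".
# Each entry is (probability, successor); a successor equal to the state itself
# means the attempted move hit a wall.  Grid layout inferred from the derivation.
transitions = {
    "S": [(0.85, "4"), (0.05, "1"), (0.05, "8"), (0.05, "S")],  # left is a wall
    "4": [(0.85, "5"), (0.10, "4"), (0.05, "S")],               # up/down are walls
    "5": [(0.85, "6"), (0.05, "2"), (0.05, "9"), (0.05, "4")],
    "6": [(0.85, "7"), (0.10, "6"), (0.05, "5")],
    "7": [(0.85, "G"), (0.05, "3"), (0.05, "10"), (0.05, "6")],
}

# Initial estimates: cost 1 everywhere except the goal.
v = {s: 1.0 for s in ["S", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]}
v["G"] = 0.0

def backup(state):
    # immediate cost of 1 plus the expected cost-to-go of the successor
    v[state] = 1.0 + sum(p * v[nxt] for p, nxt in transitions[state])
    return v[state]

for s in ["S", "4", "5", "6", "7"]:  # the first round of backups
    print(s, round(backup(s), 4))
```

The printed values match the first round of backups in the derivation below (v(S)=2, v(4)=2.05, v(5)=2.0525, and so on).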
Conditions before first backup of S:
\begin{eqnarray*}
C_S & = & \mbox{immediate cost of step} = 1 \\
p_{S4} & = & \mbox{prob of going right} = 0.85 \\
p_{S1} & = & \mbox{prob of going up} = 0.05 \\
p_{S8} & = & \mbox{prob of going down} = 0.05 \\
p_{SS} & = & \mbox{prob of staying put (by trying to go left)} = 0.05 \\
v(S) & = & 1, v(1)=1, v(4)=1, v(8)=1
\end{eqnarray*}

First backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 = 2
\end{eqnarray*}

Conditions before first backup of 4:
\begin{eqnarray*}
C_4 & = & \mbox{immediate cost of step} = 1 \\
p_{45} & = & \mbox{prob of going right} = 0.85 \\
p_{44} & = & \mbox{prob of staying put (trying up or down)} = 0.1 \\
p_{4S} & = & \mbox{prob of going left} = 0.05 \\
v(S) & = & 2, v(4)=1, v(5)=1
\end{eqnarray*}

First backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2 = 2.05
\end{eqnarray*}

Conditions before first backup of 5:
\begin{eqnarray*}
C_5 & = & \mbox{immediate cost of step} = 1 \\
p_{56} & = & \mbox{prob of going right} = 0.85 \\
p_{52} & = & \mbox{prob of going up} = 0.05 \\
p_{59} & = & \mbox{prob of going down} = 0.05 \\
p_{54} & = & \mbox{prob of going left} = 0.05 \\
v(2) & = & 1, v(4)=2.05, v(6)=1, v(9)=1
\end{eqnarray*}

First backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 2.0525
\end{eqnarray*}

Conditions before first backup of 6:
\begin{eqnarray*}
C_6 & = & \mbox{immediate cost of step} = 1 \\
p_{67} & = & \mbox{prob of going right} = 0.85 \\
p_{66} & = & \mbox{prob of staying put (trying up or down)} = 0.1 \\
p_{65} & = & \mbox{prob of going left} = 0.05 \\
v(5) & = & 2.0525, v(6)=1, v(7)=1
\end{eqnarray*}

First backup of 6:
\begin{eqnarray*}
v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.1 \cdot 1 + 0.05 \cdot 2.0525 = 2.0526
\end{eqnarray*}

Conditions before first backup of 7:
\begin{eqnarray*}
C_7 & = & \mbox{immediate cost of step} = 1 \\
p_{7G} & = & \mbox{prob of going right} = 0.85 \\
p_{73} & = & \mbox{prob of going up} = 0.05 \\
p_{710} & = & \mbox{prob of going down} = 0.05 \\
p_{76} & = & \mbox{prob of going left} = 0.05 \\
v(3) & = & 1, v(6)=2.0526, v(10)=1, v(G)=0
\end{eqnarray*}

First backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.0526 = 1.2026
\end{eqnarray*}

Conditions before second backup of 6:
\begin{eqnarray*}
C_6 & = & \mbox{immediate cost of step} = 1 \\
p_{67} & = & \mbox{prob of going right} = 0.85 \\
p_{66} & = & \mbox{prob of staying put (trying up or down)} = 0.1 \\
p_{65} & = & \mbox{prob of going left} = 0.05 \\
v(5) & = & 2.0525, v(6)=2.0526, v(7)=1.2026
\end{eqnarray*}

Second backup of 6 (note that the stay-put term now uses 6's own current value):
\begin{eqnarray*}
v(6) & = & C_6+\sum_{S'}p_{6S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1.2026 + 0.1 \cdot 2.0526 + 0.05 \cdot 2.0525 = 2.33
\end{eqnarray*}

Conditions before second backup of 5:
\begin{eqnarray*}
C_5 & = & \mbox{immediate cost of step} = 1 \\
p_{56} & = & \mbox{prob of going right} = 0.85 \\
p_{52} & = & \mbox{prob of going up} = 0.05 \\
p_{59} & = & \mbox{prob of going down} = 0.05 \\
p_{54} & = & \mbox{prob of going left} = 0.05 \\
v(2) & = & 1, v(4)=2.05, v(6)=2.33, v(9)=1
\end{eqnarray*}

Second backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 2.33 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 3.183
\end{eqnarray*}

Conditions before second backup of 4:
\begin{eqnarray*}
C_4 & = & \mbox{immediate cost of step} = 1 \\
p_{45} & = & \mbox{prob of going right} = 0.85 \\
p_{44} & = & \mbox{prob of staying put (trying up or down)} = 0.1 \\
p_{4S} & = & \mbox{prob of going left} = 0.05 \\
v(S) & = & 2, v(4)=2.05, v(5)=3.183
\end{eqnarray*}

Second backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 3.183 + 0.1 \cdot 2.05 + 0.05 \cdot 2 = 4.01
\end{eqnarray*}

Conditions before second backup of S:
\begin{eqnarray*}
C_S & = & \mbox{immediate cost of step} = 1 \\
p_{S4} & = & \mbox{prob of going right} = 0.85 \\
p_{S1} & = & \mbox{prob of going up} = 0.05 \\
p_{S8} & = & \mbox{prob of going down} = 0.05 \\
p_{SS} & = & \mbox{prob of staying put (by trying to go left)} = 0.05 \\
v(S) & = & 2, v(1)=1, v(4)=4.01, v(8)=1
\end{eqnarray*}

Second backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 4.01 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2 = 4.61
\end{eqnarray*}

Conditions before second backup of 7:
\begin{eqnarray*}
C_7 & = & \mbox{immediate cost of step} = 1 \\
p_{7G} & = & \mbox{prob of going right} = 0.85 \\
p_{73} & = & \mbox{prob of going up} = 0.05 \\
p_{710} & = & \mbox{prob of going down} = 0.05 \\
p_{76} & = & \mbox{prob of going left} = 0.05 \\
v(3) & = & 1, v(6)=2.33, v(10)=1, v(G)=0
\end{eqnarray*}

Second backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.33 = 1.216
\end{eqnarray*}

Conditions before first backup of 3:
\begin{eqnarray*}
C_3 & = & \mbox{immediate cost of step} = 1 \\
p_{33} & = & \mbox{prob of staying put (by trying left, right or up)} = 0.95 \\
p_{37} & = & \mbox{prob of going down} = 0.05 \\
v(3) & = & 1, v(7)=1.216
\end{eqnarray*}

First backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.95 \cdot 1 + 0.05 \cdot 1.216 = 2.0108
\end{eqnarray*}

Conditions before second backup of 3:
\begin{eqnarray*}
C_3 & = & \mbox{immediate cost of step} = 1 \\
p_{33} & = & \mbox{prob of staying put (by trying left, right or up)} = 0.95 \\
p_{37} & = & \mbox{prob of going down} = 0.05 \\
v(3) & = & 2.0108, v(7)=1.216
\end{eqnarray*}

Second backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.95 \cdot 2.0108 + 0.05 \cdot 1.216 = 2.97
\end{eqnarray*}

Problem 7c: Initially, all neighboring states look equally good to the robot, so the expected next value doesn't depend on which neighboring (valid) state you try to move to.
The initial backups S-4-5-6-7-6 are the same as before. However, when we next try to back up state 5, the unexplored states 2 and 9 still have (incorrect) value estimates of 1, so our policy of going to the lowest-valued neighbor leads us astray. Let's arbitrarily say we decide to go up (rather than down).

Conditions before second backup of 5:
\begin{eqnarray*}
C_5 & = & \mbox{immediate cost of step} = 1 \\
p_{56} & = & \mbox{prob of going right} = 0.05 \\
p_{52} & = & \mbox{prob of going up} = 0.85 \\
p_{59} & = & \mbox{prob of going down} = 0.05 \\
p_{54} & = & \mbox{prob of going left} = 0.05 \\
v(2) & = & 1, v(4)=2.05, v(6)=2.33, v(9)=1
\end{eqnarray*}

Second backup of 5:
\begin{eqnarray*}
v(5) & = & C_5+\sum_{S'}p_{5S'}\cdot v(S') \\
& = & 1 + 0.05 \cdot 2.33 + 0.85 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.05 = 2.119
\end{eqnarray*}

We won't correct the erroneous estimates of states 2 and 9 until we actually back {\em them} up. Now, due to our underestimate of state S, our policy again points us in the wrong direction.
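The "leads us astray" step is just greedy selection over the current estimates. A tiny sketch (values approximate those above; state names are strings invented for this note):

```python
# Cost-to-go estimates of state 5's neighbors at this point in the run:
# unexplored states 2 and 9 still carry their (incorrect) initial value of 1.
estimates = {"2": 1.0, "9": 1.0, "4": 2.05, "6": 2.33}

# Greedy choice: move toward the neighbor with the lowest estimated cost-to-go.
best = min(estimates, key=estimates.get)
print(best)  # ties between 2 and 9 break in dict order, so "2": the robot goes up
```

The greedy rule never picks state 6 (the true way out) while 2 and 9 retain their optimistic initial estimates.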
Conditions before second backup of 4:
\begin{eqnarray*}
C_4 & = & \mbox{immediate cost of step} = 1 \\
p_{45} & = & \mbox{prob of going right} = 0.05 \\
p_{44} & = & \mbox{prob of staying put (trying up or down)} = 0.1 \\
p_{4S} & = & \mbox{prob of going left} = 0.85 \\
v(S) & = & 2, v(4)=2.05, v(5)=2.119
\end{eqnarray*}

Second backup of 4:
\begin{eqnarray*}
v(4) & = & C_4+\sum_{S'}p_{4S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 2 + 0.1 \cdot 2.05 + 0.05 \cdot 2.119 = 3.011
\end{eqnarray*}

Conditions before second backup of S:
\begin{eqnarray*}
C_S & = & \mbox{immediate cost of step} = 1 \\
p_{S4} & = & \mbox{prob of going right} = 0.05 \\
p_{S1} & = & \mbox{prob of going up} = 0.85 \\
p_{S8} & = & \mbox{prob of going down} = 0.05 \\
p_{SS} & = & \mbox{prob of staying put (by trying to go left)} = 0.05 \\
v(S) & = & 2, v(1)=1, v(4)=3.011, v(8)=1
\end{eqnarray*}

Second backup of S:
\begin{eqnarray*}
v(S) & = & C_S+\sum_{S'}p_{SS'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 1 + 0.05 \cdot 2 + 0.05 \cdot 1 + 0.05 \cdot 3.011 = 2.15
\end{eqnarray*}

Conditions before second backup of 7:
\begin{eqnarray*}
C_7 & = & \mbox{immediate cost of step} = 1 \\
p_{7G} & = & \mbox{prob of going right} = 0.85 \\
p_{73} & = & \mbox{prob of going up} = 0.05 \\
p_{710} & = & \mbox{prob of going down} = 0.05 \\
p_{76} & = & \mbox{prob of going left} = 0.05 \\
v(3) & = & 1, v(6)=2.33, v(10)=1, v(G)=0
\end{eqnarray*}

Second backup of 7:
\begin{eqnarray*}
v(7) & = & C_7+\sum_{S'}p_{7S'}\cdot v(S') \\
& = & 1 + 0.85 \cdot 0 + 0.05 \cdot 1 + 0.05 \cdot 1 + 0.05 \cdot 2.33 = 1.216
\end{eqnarray*}

Conditions before first backup of 3:
\begin{eqnarray*}
C_3 & = & \mbox{immediate cost of step} = 1 \\
p_{33} & = & \mbox{prob of staying put (by trying left, right or up)} = 0.15 \\
p_{37} & = & \mbox{prob of going down} = 0.85 \\
v(3) & = & 1, v(7)=1.216
\end{eqnarray*}

First backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.15 \cdot 1 + 0.85 \cdot 1.216 = 2.183
\end{eqnarray*}

Conditions before second backup of 3:
\begin{eqnarray*}
C_3 & = & \mbox{immediate cost of step} = 1 \\
p_{33} & = & \mbox{prob of staying put (by trying left, right or up)} = 0.15 \\
p_{37} & = & \mbox{prob of going down} = 0.85 \\
v(3) & = & 2.183, v(7)=1.216
\end{eqnarray*}

Second backup of 3:
\begin{eqnarray*}
v(3) & = & C_3+\sum_{S'}p_{3S'}\cdot v(S') \\
& = & 1 + 0.15 \cdot 2.183 + 0.85 \cdot 1.216 = 2.36
\end{eqnarray*}

The take-home message here is that you need to back up everything (including states 2 and 9), or your values may never point you in the direction you need. Also, for a fixed policy, the value function converges more quickly than when the policy is dictated by the current value estimates (though both are guaranteed to converge if all states are backed up infinitely often).

\end{document}