1a: MLE: argmax_h P(data | h) --- which hypothesis makes the data most likely.
    MAP: argmax_h P(h | data) --- which hypothesis the data makes most likely; requires a prior on h.
    (A small worked comparison appears after the answers.)

1b: VC(H) = |X|

1c (i):     Z
           / \
          /   \
         v     v
         X     Y

1c (ii):  5 parameters

1c (iii): 7 parameters

1d: FALSE: the estimate is optimistic because it is not paying for training-set noise.

1e: TRUE

1f: TRUE (though not covered in 2003)

1g: FALSE: both can get stuck in local minima.

1h: FALSE: MDPs have a fully observed state, whereas HMMs have observation symbols that are stochastically dependent on the hidden state.

2a: Initialize with "if [attribute value 1] then class".
    Take the next example:
      if it is correctly classified, do nothing;
      otherwise, place at the root one of its attributes with the corresponding classification.

2b: (Assuming lists are of depth d) 4^d * k! / (k - d)!

2c: m >= (1/eps) ( ln |H| + ln(1/delta) ) with eps = 0.1, delta = 0.1

2d: |H| = 2^k, so m >= 10 ( k ln 2 + ln 10 )
    (A numeric check of 2c/2d appears after the answers.)

3a: The centroids found by k-means can get pushed away from each other, so they are
    further apart than the true means that generated the data.
    (A small demo appears after the answers.)

3d: One difference is that GMM components may be long and thin.

4a: 0.8
4b: 0.18
4c: 0.44
4d: 0.2 (since P(yell) is 0.2 no matter what the state)
4e: HAAAA

5a: A and C

5b: P( A ^  B |  C ) = 1/8     P( A ^  B | ~C ) = 2/8
    P(~A ^  B |  C ) = 4/8     P(~A ^  B | ~C ) = 1/8
    P( A ^ ~B |  C ) = 0       P( A ^ ~B | ~C ) = 5/8
    P(~A ^ ~B |  C ) = 3/8     P(~A ^ ~B | ~C ) = 0
    P(C) = 1/2                 P(~C) = 1/2

5c: P(C)   = 1/2    P(~C)   = 1/2
    P(A|C) = 1/8    P(A|~C) = 7/8
    P(B|C) = 5/8    P(B|~C) = 3/8
    [ P(~A|..) and P(~B|..) are defined implicitly ]
    (A sketch deriving these from 5b appears after the answers.)

5d: C = 1

5e: C = 1

6b: w = (0, 2), b = -5 (a quick margin check appears after the answers)

6c: It's possible:

    - - - - - - -   -   + + + +
        x=-1       x=0     x=+1
          |------------------|

    For small C the SVM would choose this margin.

7a: 3
7b: 30/7
7c: 18/4
7d: 18/4
7e: Yes. TRAINING-set error is zero because each point is closest to itself.
    (A minimal check appears after the answers.)
7f: Same answer.

8a: 0
8b: 3/8
8c: 1/4
8d: 1/4
8e: 1/4
8f: 0

9a: No arcs.

9b: A ---> C <--- B

9c: Fully connected.

10, 11: Am still trying to find the figure that goes with these questions!!
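----------------------------------------------------------------------
Worked sketches for selected answers (Python). These are illustrative
sketches, not part of the exam solutions; any data, priors, or names
appearing in them are invented for illustration.

For 1a, a minimal sketch of the MLE/MAP distinction, assuming a
coin-flip model with an (invented) Beta(2,2) prior on the heads
probability h:

    import numpy as np

    # Data: 7 heads out of 10 flips (invented for illustration).
    heads, n = 7, 10
    grid = np.linspace(0.001, 0.999, 999)

    # log P(data | h), up to an additive constant
    log_lik = heads * np.log(grid) + (n - heads) * np.log(1 - grid)

    # log of a Beta(2,2) prior on h, up to an additive constant
    log_prior = np.log(grid) + np.log(1 - grid)

    h_mle = grid[np.argmax(log_lik)]              # argmax_h P(data | h)
    h_map = grid[np.argmax(log_lik + log_prior)]  # argmax_h P(h | data)

    print(h_mle)  # ~0.700 = 7/10
    print(h_map)  # ~0.667 = (7+1)/(10+2), pulled toward the prior mean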
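For 2c/2d, a numeric check of the sample-complexity bound
m >= (1/eps)(ln|H| + ln(1/delta)) with |H| = 2^k and eps = delta = 0.1,
exactly as in the answers above:

    import math

    def pac_sample_size(k, eps=0.1, delta=0.1):
        # m >= (1/eps) * (ln|H| + ln(1/delta)), with |H| = 2^k
        return math.ceil((1 / eps) * (k * math.log(2) + math.log(1 / delta)))

    for k in (5, 10, 20):
        print(k, pac_sample_size(k))
    # k=10 gives 10 * (10*ln 2 + ln 10) ~= 10 * (6.93 + 2.30) -> 93 examples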
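For 3a, a small demo of the "pushed apart" effect: two heavily
overlapping 1-D Gaussians (true means at -1 and +1, with an invented
spread of 2.0) whose 2-means centroids converge noticeably further
apart than the true means:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-1.0, 2.0, 5000),
                        rng.normal(+1.0, 2.0, 5000)])

    # Plain 2-means in 1-D: assign each point to the nearest centroid,
    # recompute each centroid as its cluster mean, repeat until settled.
    c = np.array([-0.5, 0.5])
    for _ in range(50):
        assign = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        c = np.array([x[assign == 0].mean(), x[assign == 1].mean()])

    print(np.sort(c))  # roughly [-1.8, +1.8]: wider than the true +/-1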
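For 5b/5c, a check that the naive-Bayes conditionals in 5c follow from
the joint table in 5b by marginalizing out the other variable (all
numbers copied from the table above):

    from fractions import Fraction as F

    # P(A, B | C) and P(A, B | ~C); keys are (A, B) with 1 = true.
    given_C  = {(1, 1): F(1, 8), (0, 1): F(4, 8),
                (1, 0): F(0, 8), (0, 0): F(3, 8)}
    given_nC = {(1, 1): F(2, 8), (0, 1): F(1, 8),
                (1, 0): F(5, 8), (0, 0): F(0, 8)}

    def marginal(table, var):           # var 0 -> A, var 1 -> B
        return sum(p for k, p in table.items() if k[var] == 1)

    print(marginal(given_C, 0), marginal(given_nC, 0))  # 1/8  7/8
    print(marginal(given_C, 1), marginal(given_nC, 1))  # 5/8  3/8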
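For 6b, a quick geometric check of the stated solution: with w = (0, 2)
and b = -5, the decision boundary w.x + b = 0 is the horizontal line
x2 = 5/2; the supporting hyperplanes w.x + b = +/-1 are x2 = 3 and
x2 = 2, so the margin width is 2/||w|| = 2/2 = 1.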
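For 7e, a minimal sketch (with invented, distinct 1-D points) of why
1-nearest-neighbor training-set error is zero: each training point's
nearest neighbor in the training set is itself, at distance 0.

    import numpy as np

    X = np.array([[0.0], [1.0], [2.5], [4.0]])  # invented points
    y = np.array([0, 0, 1, 1])

    d = np.abs(X - X.T)           # pairwise distances, zero diagonal
    pred = y[d.argmin(axis=1)]    # nearest neighbor includes the point itself
    print((pred != y).mean())     # 0.0 training error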