15-859(B) Machine Learning Theory                               02/12/02

* VC-dimension: proving the main result

[Note: ignore question 1 on hwk3]
========================================================================
Where we are right now: See Slides

Proof of Theorem 2:

(The book has a very nice proof, much simpler than the usual one. We
will follow it, but simplify further by not introducing notation. We
will use the term "bad concept" to mean one with error > epsilon.)

Given m examples, define A = the event that there exists a bad
consistent hypothesis in C. We want to show Pr[A] is low.

Define event B as follows:
B = the event that in 2m examples, there exists a concept in C that is
consistent on the first m but has error at least epsilon/2 on the
second m.

Claim: Pr[A] <= 2*Pr[B]. So, if Pr[B] is low then Pr[A] is low.

Proof: Say we pick 2m examples, and let A = the event that there is a
bad consistent hypothesis on the first m. Then Pr[B|A] >= 1/2, by a
Chernoff bound or a standard-deviation argument: the bad h has true
error > epsilon, so it is unlikely to have observed error < epsilon/2
on the second m. This means that Pr[B] >= Pr[A]/2. QED

(We could actually have defined B with epsilon rather than epsilon/2,
since for epsilon < 1/2 the median of the binomial is at most its
mean.)

Now we show Pr[B] is low. (Note: we no longer need to talk about "bad"
hypotheses. Reason: a really good hypothesis is unlikely to satisfy
the second part of B anyway.)

Re-write event B: draw a set S of 2m examples, and then split it
randomly into S1, S2 of equal size.
B = the event that there exists a hypothesis in C consistent on S1 but
erring on at least epsilon*m/2 points in S2.

How to bound this? There are LOTS of concepts in C. But we will use
the fact that there are not so many splittings of S induced by
concepts in C. So let's fix some set S; it can be arbitrary. There
are at most C[2m] splittings.

Now, fix some splitting h that disagrees with the target on at least
epsilon*m/2 points in S. (This is a candidate for causing event B to
occur.) We want to show that Pr[h is consistent on S1] is small.
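As a sanity check on this step, here is a small Monte Carlo sketch (illustrative only; the function name and parameters are ours, not from the notes). It fixes a set of 2m points, marks L of them as points where some hypothetical h disagrees with the target, splits the set uniformly at random into equal halves, and estimates the probability that h looks consistent on S1, i.e., that all L disagreement points land in S2. The estimate stays below 1/2^L.

```python
import random

def prob_consistent_on_S1(m, L, trials=100_000, seed=0):
    """Estimate Pr[all L disagreement points land in S2] when a fixed
    set of 2m points is split uniformly at random into equal halves."""
    rng = random.Random(seed)
    points = list(range(2 * m))   # pretend h disagrees with the target on points 0..L-1
    hits = 0
    for _ in range(trials):
        S1 = set(rng.sample(points, m))        # a uniformly random equal split
        if all(p not in S1 for p in range(L)):
            hits += 1             # no disagreement point is in S1: h looks consistent
    return hits / trials

m, L = 20, 6
print(prob_consistent_on_S1(m, L), "vs bound", 0.5 ** L)
```

For m = 20, L = 6 the estimate comes out around 0.010, comfortably below 1/2^6 = 0.0156.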
(The probability here is over the random splitting of S into S1 and
S2.) We will then apply the union bound over the at most
C[2m] <= (2m choose <=d) such splittings.

Lemma: Suppose h disagrees with the target on at least L points in S.
Then Pr[h is consistent on S1] <= 1/2^L.

Intuition: suppose we split S by putting each point independently at
random into S1 or S2 (i.e., without requiring equal sizes). Then
Pr[all L points fall into S2] = 1/2^L. It turns out that the
experiment we actually care about is only more helpful to us. E.g.,
if L = m+1 then we get Pr[...] = 0.

Proof: Pr[all L points fall into S2] = (m choose L)/(2m choose L).
Reason: instead of fixing the L points and randomly splitting S into
equal-sized S1, S2, we can equivalently think of it as fixing the
split and then picking the L points at random. Now just expand:

   (m choose L)/(2m choose L)
     = [m(m-1)...(m-L+1)] / [(2m)(2m-1)...(2m-L+1)]  <=  1/2^L,

since each factor (m-i)/(2m-i) is at most 1/2.

Now we apply the union bound:

   Pr[B] <= C[2m] * (1/2)^{epsilon*m/2},

and we want this to be < delta/2. That's it: this gives us Theorem 2.
==========================================================================
How about Theorem 2'? We just need to redo the proof using the
Hoeffding bound. Draw 2m examples.

A = the event that on the 1st m, there exists a hypothesis in C whose
empirical and true error differ by at least epsilon. (This is what we
want to bound for the theorem.)
B = the event that there exists a concept in C whose empirical error
on the 1st half differs from its empirical error on the 2nd half by
at least epsilon/2.

Pr[B|A] >= 1/2, so Pr[A] <= 2*Pr[B].

Now we show Pr[B] is low. As before, let's first pick S, then randomly
decide which examples form the 1st half and which the 2nd half. Once
S is determined, there are only C[2m] h's we need to worry about.

Now, fix some h making L mistakes overall on S. The bad event is that
at least L/2 + epsilon*m/4 of those mistakes fall into S1 (or into
S2), i.e., that the two halves' mistake counts differ by at least
epsilon*m/2. Let's pretend we were putting each point iid into S1 or
S2 (rather than splitting them exactly 50/50); this can only increase
the chance of the bad event (handwaving, but true). So we can think
of it as: we flip L fair coins, and we want to bound the chance of
getting epsilon*m/2 more heads than tails, or vice versa.
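This coin-flipping step can be checked numerically. Here is a hedged sketch (names and parameter choices are ours), assuming Hoeffding in the standard form Pr[|#heads - L/2| >= t] <= 2*e^{-2t^2/L}: a heads-minus-tails gap of epsilon*m/2 corresponds to t = epsilon*m/4, and we compare the exact binomial tail against the bound.

```python
from math import comb, exp

def two_sided_tail(L, t):
    """Exact Pr[|#heads - L/2| >= t] for L fair coin flips."""
    return sum(comb(L, k) for k in range(L + 1) if abs(k - L / 2) >= t) * 0.5 ** L

m, eps = 200, 0.2
L = m                     # handwaved worst case from the notes: L = m
t = eps * m / 4           # heads-tails gap of eps*m/2  <=>  |#heads - L/2| >= eps*m/4
exact = two_sided_tail(L, t)
bound = 2 * exp(-2 * t * t / L)   # Hoeffding; equals 2*exp(-eps^2 * m / 8) at L = m
print(exact, "<=", bound)
```

For m = 200 and epsilon = 0.2 the exact tail is well below the Hoeffding bound, as it should be.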
Let's handwave that the worst case is when L = m. Using Hoeffding
(with t = epsilon*m/4 as above), this is at most 2e^{-epsilon^2 m/8}.
Set this to delta/C[2m] and we're done.
==========================================================================
Talk about projects and presentations.
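To see what sample size Theorem 2's union bound actually demands, here is a small illustrative sketch (function names are ours, not from the notes). It bounds the number of splittings by Sauer's lemma, C[2m] <= sum_{i<=d} (2m choose i), and scans for the smallest m with C[2m] * (1/2)^{epsilon*m/2} <= delta/2.

```python
from math import comb

def growth_bound(n, d):
    """Sauer's lemma: a class of VC-dimension d induces at most
    sum_{i=0}^{d} C(n, i) distinct labelings of any n points."""
    return sum(comb(n, i) for i in range(d + 1))

def sample_size(d, eps, delta):
    """Smallest m with C[2m] * (1/2)^{eps*m/2} <= delta/2 (Theorem 2's
    union bound), found by a simple linear scan."""
    m = 1
    while growth_bound(2 * m, d) * 0.5 ** (eps * m / 2) > delta / 2:
        m += 1
    return m

print(sample_size(d=10, eps=0.1, delta=0.05))
```

Note the m that comes out is roughly d/epsilon up to log factors, matching the familiar O((d/epsilon) log(1/epsilon) + (1/epsilon) log(1/delta)) form of the bound.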