The lower upper bound 4.4 does not apply to shell bounds, because a shell bound uses more information than just the empirical error rate of the learned hypothesis: it uses the empirical error rates of all the hypotheses. Is there a lower upper bound that applies to the information used by the shell bound?

The same independent hypothesis technique allows us to lower bound the full knowledge bound, Theorem 8.1.1. Assume that we are given a set of independent hypotheses $h$, each with some true error $e(h)$, and write $\hat{e}_S(h)$ for the empirical error of $h$ on a sample $S$ of $m$ examples. What is a lower bound on the probability that at least one of these hypotheses has empirical error $\hat{e}_S(h) = k/m$?
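Under the usual assumption that the $m$ examples in $S$ are drawn i.i.d., the number of mistakes made by a hypothesis with true error $e$ is Binomial, so its probability of having empirical error exactly $k/m$ is $\binom{m}{k} e^k (1-e)^{m-k}$. The following is a minimal sketch of that per-hypothesis probability; the function name `prob_empirical_error` is illustrative, not taken from the text.

```python
from math import comb

def prob_empirical_error(m: int, k: int, e: float) -> float:
    """Probability that a hypothesis with true error e makes exactly k
    mistakes on m i.i.d. examples, i.e. has empirical error k/m.
    (Hypothetical helper; assumes i.i.d. examples.)"""
    if not 0 <= k <= m:
        raise ValueError("k must lie in [0, m]")
    # Binomial probability mass at k.
    return comb(m, k) * e**k * (1.0 - e) ** (m - k)

# A hypothesis with true error 0.3 evaluated on 20 examples:
# probability of observing exactly 2 errors (empirical error 0.1).
print(prob_empirical_error(20, 2, 0.3))
```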
If $A$ and $B$ are independent events, then:

$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A)\Pr(B) = \Pr(A) + \bigl(1 - \Pr(A)\bigr)\Pr(B).$$

This implies that we can "add" independent probabilities together as long as we rescale the new term by $1 - \Pr(A)$. In particular, $A$ might be the event that some "bad" hypothesis in a set of hypotheses has a small empirical error, and $B$ might be the event that a new hypothesis with a large true error rate has a small empirical error.
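The rescaling identity can be applied repeatedly: starting from $q = 0$ and folding in one independent event at a time with $q \leftarrow q + (1 - q)\,p$ reproduces the exact probability of the union, $1 - \prod_i (1 - p_i)$. A minimal sketch, with illustrative function names:

```python
from math import prod

def union_by_rescaling(probs):
    """Fold independent event probabilities together one at a time using
    Pr(A or B) = Pr(A) + (1 - Pr(A)) * Pr(B)."""
    q = 0.0
    for p in probs:
        q = q + (1.0 - q) * p  # add the new event, rescaled by 1 - Pr(A)
    return q

def union_direct(probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

probs = [0.1, 0.25, 0.05]
print(union_by_rescaling(probs), union_direct(probs), sum(probs))
```

The first two values agree up to rounding (about 0.359), while the plain sum 0.4 overcounts because it ignores the overlap between events.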
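Using this fact, we get the following theorem.

THEOREM. Let $H'$ be a set of independent hypotheses with large true error rates, so that the events $\{\hat{e}_S(h) = k/m\}$ for $h \in H'$ are mutually independent. For each $h \in H'$ let $p_h = \Pr_S\bigl(\hat{e}_S(h) = k/m\bigr) = \binom{m}{k} e(h)^k \bigl(1 - e(h)\bigr)^{m-k}$, let $P_{H'} = \sum_{h \in H'} p_h$, and let $q_{H'} = \Pr_S\bigl(\exists h \in H' : \hat{e}_S(h) = k/m\bigr)$. Then

$$q_{H'} \ge \bigl(1 - q_{H'}\bigr) P_{H'}, \qquad \text{equivalently} \qquad q_{H'} \ge \frac{P_{H'}}{1 + P_{H'}}.$$

PROOF. The proof is by finite induction on the set of hypotheses with a large true error rate. The claim holds for the base case $H' = \emptyset$, where $q_{H'} = P_{H'} = 0$. Assuming that it holds for $H'$, that is, $q_{H'} \ge (1 - q_{H'}) P_{H'}$, we need to prove it for $H'' = H' \cup \{h\}$, where $h \notin H'$. By mutual independence, the event $\hat{e}_S(h) = k/m$ is independent of the event that some hypothesis in $H'$ has empirical error $k/m$, so the independence principle above gives

$$q_{H''} = q_{H'} + (1 - q_{H'})\, p_h \ge (1 - q_{H'}) P_{H'} + (1 - q_{H'})\, p_h \ge (1 - q_{H''})\bigl(P_{H'} + p_h\bigr) = (1 - q_{H''}) P_{H''},$$

where the first inequality is the induction hypothesis and the second uses $q_{H''} \ge q_{H'}$. By induction, this property therefore holds for the set of all hypotheses with a large true error rate. ▫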
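The induction can be checked numerically. The sketch below adds independent hypotheses one at a time, exactly as in the proof, and asserts after each step that $q \ge (1 - q)P$, where $q$ is the probability that some hypothesis added so far has empirical error $k/m$ and $P$ is the sum of the individual probabilities. The true errors, $m$, $k$, and function names are hypothetical choices for illustration, and the hypotheses are assumed independent, as the theorem requires.

```python
from math import comb

def binom_pmf(m, k, e):
    """Probability of exactly k mistakes on m i.i.d. examples at true error e."""
    return comb(m, k) * e**k * (1.0 - e) ** (m - k)

def check_invariant(true_errors, m, k):
    """Induct over H' one hypothesis at a time, asserting q >= (1 - q) * P."""
    q, P = 0.0, 0.0
    for e in true_errors:
        p = binom_pmf(m, k, e)
        q = q + (1.0 - q) * p  # independence: Pr(A or B) = Pr(A) + (1 - Pr(A)) Pr(B)
        P = P + p              # running sum of individual probabilities
        assert q >= (1.0 - q) * P - 1e-12, (q, P)
    return q, P

# Hypothetical "bad" hypotheses with true errors 0.30, 0.31, ..., 0.40; m = 50, k = 8.
errors = [0.3 + 0.01 * i for i in range(11)]
q, P = check_invariant(errors, m=50, k=8)
print(f"q = {q:.4g}, P/(1+P) = {P / (1 + P):.4g}, union bound P = {P:.4g}")
```

The printed values should satisfy $P/(1+P) \le q \le P$: the lower end is the theorem and the upper end is the union bound.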
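Write $P$ for the sum, over the hypotheses with a large true error rate, of the probabilities that each has empirical error $k/m$, and $q$ for the probability that at least one of them does. The union bound gives $q \le P$, which is the quantity controlled by the full knowledge bound 8.1.1, while the theorem gives $q \ge P/(1+P)$, so the two differ by at most a factor of $1 + P$. Assuming that $P \le 1$, the lower upper shell bound is therefore tight to within a factor (in $\delta$) of 2 of the full knowledge bound 8.1.1. Because Binomial tails decay exponentially, a constant factor in $\delta$ usually (but not always) has only a small effect on the resulting true error bound. One important question remains: how does this bound compare to the observable shell bound 8.1.2? In the observable shell bound, the distribution of true errors is replaced with a pessimistic distribution based on the observed empirical errors. The "size" of this pessimism, measured by its effect on the true error bound, is in general on the order of the width of the per-hypothesis confidence interval used to construct that pessimistic distribution. Thus, the gap between the lower upper shell bound and the upper shell bound is typically of that same size.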
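To see the factor numerically, the sketch below compares the union bound $P$ with the exact union probability $q$ for a few hypothetical sets of independent event probabilities. Since $q \ge P/(1+P)$, the ratio $P/q$ can never exceed $1 + P$, which is at most 2 when $P \le 1$ and close to 1 when $P$ (playing the role of $\delta$) is small. The probabilities and function names are illustrative assumptions, not values from the text.

```python
from math import prod

def union_prob(probs):
    """Exact probability that at least one independent event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

def tightness_factor(probs):
    """Return (P / q, 1 + P): the observed gap between the union bound P
    and the exact probability q, and the cap implied by q >= P / (1 + P)."""
    P = sum(probs)
    q = union_prob(probs)
    if q == 0.0:
        raise ValueError("need at least one event with positive probability")
    return P / q, 1.0 + P

for probs in ([0.01] * 5, [0.05] * 10, [0.2] * 5):
    factor, cap = tightness_factor(probs)
    print(f"P = {sum(probs):.2f}: P/q = {factor:.3f} <= 1 + P = {cap:.2f}")
```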