It is difficult to answer the question “which bound is tighter?” theoretically, because every bound has a worst case. For example, the Occam’s Razor bound is worse than the Simple bound when the hypothesis chosen happens to be one of the last. Our results show there is no total ordering amongst the bounds, although there is a noticeable rough ordering:
This ordering is approximately as expected from theoretical considerations. The Simple bound can never be much better than the Occam bound, while the Occam bound can be arbitrarily tighter than the Simple bound. A similar statement holds for the Microchoice bound and the Shell bound. The Occam bound is only significantly looser than the Microchoice bound because we used the Hoeffding approximation to the Binomial tail. The Shell bound is not always the best, but it behaves well in comparison to the more standard holdout approach.
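The slack from the Hoeffding approximation can be seen directly. The sketch below (our own stdlib-Python illustration, not code from the experiments; function names are ours) inverts the exact Binomial tail by bisection and compares it with the Hoeffding relaxation of the same tail:

```python
import math

def binom_tail(m, k, eps):
    """P(at most k errors in m trials) when the true error rate is eps."""
    return sum(math.comb(m, i) * eps**i * (1 - eps)**(m - i)
               for i in range(k + 1))

def binomial_inversion(m, k, delta):
    """Largest true error rate eps with binom_tail(m, k, eps) >= delta.
    The tail is monotone decreasing in eps, so bisection applies."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_tail(m, k, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return lo

def hoeffding_bound(m, k, delta):
    """Hoeffding relaxation of the same tail: k/m + sqrt(ln(1/delta)/(2m))."""
    return k / m + math.sqrt(math.log(1 / delta) / (2 * m))

# Illustrative numbers: 10 observed errors on 100 examples, 95% confidence.
m, k, delta = 100, 10, 0.05
exact = binomial_inversion(m, k, delta)
loose = hoeffding_bound(m, k, delta)
# The exact inversion is never larger than the Hoeffding relaxation,
# since Hoeffding upper bounds the Binomial tail.
```

Since the Hoeffding inequality upper bounds the Binomial tail, the exact inversion is always at least as tight, and the gap is exactly the kind of slack referred to above.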
It is interesting to note that the Sampling Shell bound is not better than the Microchoice bound on these learning problems, even with fast sampling techniques. Apparently, the looseness introduced by bounding the sampling error is not offset by the improvement in tightness.
Empirically, we observe a very noticeable pattern. For problems with fewer than examples, the sample complexity bounds are superior to the holdout bound. Between and examples, the behavior changes, with the holdout bound generally winning, although not necessarily by much. Above examples, the holdout bound is significantly and consistently tighter than the sample complexity bounds. This behavior strongly suggests that the sample complexity bounds are loose. Each of these bounds is “tight” in one sense or another, but there may exist some as-yet-undiscovered observable property, prevalent in practical machine learning algorithms, which would allow us to construct a tighter bound. In particular, the problem of correlated hypotheses has yet to be solved in a convincing manner.
Also note that the holdout bound is not the tightest bound we report. In general, we have the following ordering:
The combined bounds seem to have the best behavior in practice.
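One natural way to combine a train-set bound with a holdout bound is to evaluate each at confidence parameter δ/2 and take the minimum; by the union bound, the minimum holds with probability at least 1 − δ. The sketch below is a minimal illustration of this idea under assumptions of ours: Hoeffding forms of the Occam and holdout bounds, and hypothetical error rates and sample sizes (the exact combination used in the experiments may differ):

```python
import math

def occam_bound(train_err, m, num_hyp, delta):
    """Occam's Razor bound with a uniform prior over num_hyp hypotheses,
    in Hoeffding form: train error + sqrt(ln(|H|/delta)/(2m))."""
    return train_err + math.sqrt(math.log(num_hyp / delta) / (2 * m))

def holdout_bound(holdout_err, m_holdout, delta):
    """Holdout bound in Hoeffding form. No union bound over hypotheses is
    needed: the held-out examples are independent of the chosen hypothesis."""
    return holdout_err + math.sqrt(math.log(1 / delta) / (2 * m_holdout))

def combined_bound(train_err, m_train, num_hyp,
                   holdout_err, m_holdout, delta):
    """Evaluate each bound at delta/2 and take the minimum; a union bound
    over the two failure events gives overall confidence 1 - delta."""
    return min(occam_bound(train_err, m_train, num_hyp, delta / 2),
               holdout_bound(holdout_err, m_holdout, delta / 2))

# Hypothetical numbers for illustration only: 900 training examples,
# 100 held-out examples, a million hypotheses, 95% confidence.
b = combined_bound(train_err=0.05, m_train=900, num_hyp=10**6,
                   holdout_err=0.08, m_holdout=100, delta=0.05)
```

The combined bound can never be much worse than either constituent (the cost is only the halving of δ), which is consistent with its good behavior in practice.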
There are several directions of future investigation which could further strengthen any of these approaches. For the sample complexity approach, it would be useful to address the non-independence of samples in the fast sampling method used for the Sampling Shell bound. For the holdout approach, we tested only the simplest holdout technique, so a natural extension is to test other holdout techniques; this was not done here because the theory of those techniques is lacking.
(Open) Address the looseness introduced by hypotheses with strong correlations. For example, two decision trees which differ in only one leaf probably do not have significantly different error rates, so applying the union bound over both trees introduces unnecessary slack. Note that VC dimension and covering number analyses address this, but (unfortunately) the resulting formulas either cannot be evaluated or introduce so much slack that the quantitative results are worse rather than better.