Traditional PAC model bounds provide an elegant theory which bounds the number of examples required in order to guarantee that the emperical error is near to the expected error of new examples drawn from some (unknown) distribution over examples. Unfortunately, the PAC model bound is so loose that it is useless in practice.
Deriving tighter bounds for learning algorithms can have a significant impact on the way in which machine learning is done. Ideally, we'd like to calculate a PAC model type bound which was tight enough to use inside the program so as to causing halting before overfitting.
The state of the art in PAC model bounds is relatively weak. The PAC bounds are too weak to be quantitatively useful. Current approaches involve using a hold out set to estimate the amount of overfit. This approach is unsatisfactory because data which could be used for training is not. Another common approach is ``cross validation'' which is essentially the holdout process run many times with a different set held out each time. From these many runs, an estimate of when overfitting occurs is made, and then the learning algorithm is applied once more to the full dataset stopping at the point where overfitting would occur. Cross validation is unsatisfactory because it is compute intensive and there is not yet a strong theoretical justification for the process.
There are two techniques which I employ in reducing the bound. The
first is called Microchoice bounds which starte with the observation
that many learning algorithms proceed with a sequence of small choices
to arrive at a final choice. A chernoff bound on the probability that
the emprical loss of a hypothesis differs from the expected loss under
the distribution producing examples applies for all
By tuning (a. priori but not explicitly) these
hypothesis, we can arrive at a bound which is calculatable
a. posteriori by looking at the trace of this running program.
The second approach involves the observation that more than just the empirical error of the lowest error hypothesis is available. Instead we also can easily discover the empirical error of other hypotheses and perhaps even sample uniformly from the space of all uniform hypotheses. Given the knowledge or partial knowledge of the distribution of empirical errors in the hypothesis space it is possible to state a bound which takes into account the fact that the pattern of observed empirical errors indicates that the distribution of true errors is not pathological. This can radically improve the tightness of the PAC bound.
This document was generated using the LaTeX2HTML translator Version 98.1p1 release (March 2nd, 1998)
Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
The command line arguments were:
latex2html -debug -white -no_fork -no_navigation -address -split 0 -dir html -external_file john_1 john_1.tex.
The translation was initiated by Daniel Nikovski on 2000-04-28