Before discussing the main theorem, it is important to notice that the averaging classifier, $b(x) = \mathrm{sign}\left(\sum_{h \in H} w_h h(x)\right)$, implies a distribution over the base hypothesis space $H$. This implied distribution is $Q(h) = \frac{w_h}{\sum_{h' \in H} w_{h'}}$, where $w_h \geq 0$ is the voting weight placed on hypothesis $h$. The distribution $Q$ is used in the following theorem.
PROOF. Given in the next section. ▫
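For concreteness, here is a small numeric illustration of this normalization (the hypothesis names and weights are hypothetical, chosen only for the example):

```python
# A voting/averaging classifier with non-negative weights w_h over three
# base hypotheses; names and values are purely illustrative.
weights = {"h1": 2.0, "h2": 1.0, "h3": 1.0}

# The implied distribution Q(h) = w_h / sum_{h'} w_{h'}.
total = sum(weights.values())
Q = {h: w / total for h, w in weights.items()}

print(Q)  # {'h1': 0.5, 'h2': 0.25, 'h3': 0.25}
```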
The main theorem uses a KL-divergence based pseudodistance, which is a bit hard to understand intuitively. In order to gain intuition, we can relax the tightness of the bound with an inequality.
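The specific inequality is not reproduced here; a standard relaxation at this step (an assumption on my part, not taken from the text) is the quadratic lower bound on the KL pseudodistance between the empirical error rate $\hat{e}$ and the true error rate $e$:
\[
\mathrm{KL}(\hat{e} \,\|\, e) \;\geq\; 2\,(\hat{e} - e)^2,
\]
which converts an implicit bound of the form $\mathrm{KL}(\hat{e} \,\|\, e) \leq B$ into an explicit deviation bound $|\hat{e} - e| \leq \sqrt{B/2}$.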
This relaxation gives us an immediate corollary.
This theorem improves upon theorem 7.1.1 because $\mathrm{KL}(Q \,\|\, P)$ is used instead of $\ln \frac{1}{P(h)}$. For the case of a uniform distribution on $N$ different base classifiers, these results will agree when the average is over just one classifier, since a point mass $Q$ on a single hypothesis $h$ gives $\mathrm{KL}(Q \,\|\, P) = \ln \frac{1}{P(h)} = \ln N$. As the average becomes “broader” the results will improve. In the limit when the average is over nearly all classifiers, the $\mathrm{KL}(Q \,\|\, P)$ term will be nearly $0$.
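As a quick numeric check of these two limits (the value of $N$ is hypothetical):

```python
import math

def kl(Q, P):
    """KL(Q || P) = sum_h Q(h) ln(Q(h)/P(h)), with the convention 0 ln 0 = 0."""
    return sum(q * math.log(q / p) for q, p in zip(Q, P) if q > 0)

N = 1000
P = [1.0 / N] * N                  # uniform prior over N base classifiers

narrow = [1.0] + [0.0] * (N - 1)   # average over just one classifier
broad = [1.0 / N] * N              # average over all N classifiers

print(kl(narrow, P), math.log(N))  # both ln(1000) ~ 6.9078: matches ln(1/P(h))
print(kl(broad, P))                # 0.0: the KL term vanishes
```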
The theorems are stated in an asymptotic fashion, which is not very useful in practical applications. Section 7.4 gives some ideas of how to tighten the result, and the non-asymptotic form (7.3.15) given at the end of the proof can be used directly in practice.
The improved averaging bound applies to averages over continuous hypothesis spaces. In this setting, the average needs to be an integral over an uncountably-infinite set of hypotheses or the KL-divergence will not converge to a finite value. It is exactly because of this limitation that the improvements of this bound are most applicable to Bayes Optimal and Maximum Entropy classifiers.
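To see why the divergence fails to converge (a standard measure-theoretic observation, stated here in generic notation rather than the text's): if the prior $P$ has a density over a continuous hypothesis space $H$ while $Q$ places nonzero mass on a finite set of hypotheses, then $Q$ is not absolutely continuous with respect to $P$, and by definition
\[
\mathrm{KL}(Q \,\|\, P) = \mathbb{E}_{h \sim Q}\!\left[\ln \frac{dQ}{dP}(h)\right] = +\infty,
\]
so the bound is vacuous unless $Q$ is itself given by a density over $H$.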
In practice, the limitation may not be a significant problem because machine learning algorithms over large hypothesis spaces typically have some parameter stability. In other words, a small shift in the parameters of the learned model produces a small change in the prediction of the hypothesis. With this parameter stability, we can convert any average over a finite set of hypotheses into an average over an infinite set of hypotheses without significantly altering the predictions of the average. This technique is explored in Chapter 13 with positive results.
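A minimal sketch of this conversion, assuming a linear-threshold base classifier and Gaussian parameter smoothing (both are illustrative choices on my part; the construction in Chapter 13 may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# A single learned linear-threshold classifier (illustrative parameters).
w_learned = np.array([1.0, -2.0, 0.5])

def predict(w, x):
    """Prediction of the base hypothesis with parameter vector w."""
    return np.sign(w @ x)

# Replace the point mass on w_learned with a narrow Gaussian Q over
# parameter space, turning a finite average into an average over an
# uncountably-infinite set of hypotheses.
sigma = 0.01
x = np.array([0.3, -0.1, 0.8])

samples = w_learned + sigma * rng.standard_normal((10_000, 3))
avg_vote = np.sign(samples @ x).mean()  # Monte Carlo estimate of E_{w~Q}[sign(w @ x)]

# Parameter stability: for small sigma the smoothed average predicts
# the same label as the original classifier.
print(predict(w_learned, x), np.sign(avg_vote))
```

For small $\sigma$ the smoothed prediction agrees with the original classifier, while $Q$ is now a density over parameter space, so $\mathrm{KL}(Q \,\|\, P)$ can remain finite against a continuous prior.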