We have two forms of bound, one which uses training set errors and one which uses holdout set errors. The obvious question to ask is: can we combine the information from both bounds? Presumably, if we use both the training error and the test error, we should be able to construct a better confidence interval for the location of the true error rate.
Viewed as an interactive proof of learning (see figure 11.1.1), the train-and-test approach simply adds an extra testing phase to every training-set-based bound.
Given a fixed hypothesis and learning problem, we know that the test error will be Binomially distributed. Given a fixed learning algorithm and learning problem, the training error has a considerably more complicated distribution. We can nonetheless regard the training error as a random variable whose cumulative distribution is parameterized by many quantities, one of which is the true error rate of the output hypothesis (itself a random variable).
How can we construct a confidence interval based upon information from both the training and testing sets? There are several possibilities.
Technique (1) can be seen visually by graphing training error vs test error and marking the regions that are bounded away.
The essential problem with technique (1) is that the resulting true error bound takes the maximum (minus a small amount) of the bounds based upon the test set and the training set. Given that we do not trust either bound to always return tight information, we should not expect the maximum to behave well.
Technique (2) can be seen visually in a similar way:
Technique (2) works moderately well. Mathematically, we take the minimum of the two error bounds, paying a small penalty for apportioning confidence between them; this is equivalent to taking a union bound. While this lets us combine the bounds, it does not achieve the improvement over either individual bound that is intuitively possible. Certainly, if we use two test sets, we expect to construct improved confidence intervals.
A better approach may be possible. We would like to construct a rejection region of the following form:
Such a rejection region has two important properties:
Showing that technique (2) works is just an application of the union bound. Given any two bounds on the true error rate, we can apportion confidence δ/2 to each bound. Then both bounds hold simultaneously with probability at least 1 − δ, which implies that the smaller of the two true error bounds holds.