The motivation behind progressive validation is that it allows one to train on more examples than the hold-out estimate. With the extra examples, training algorithms should be able to choose a better hypothesis. Many learning problems exhibit thresholding, where a small increase in the number of examples dramatically improves the accuracy of the hypothesis. Consider an $n$-dimensional feature space in the boolean setting where it is known that one feature is an exact predictor. Consider the learning algorithm: cross off features inconsistent with the training data and output the hypothesis that takes a majority vote over all remaining features. If the example distribution is uniform over $\{0,1\}^n$, then this example exhibits a thresholding behavior because the accuracy of the current hypothesis is almost 50% until the number of consistent features is reduced to a small constant, at which point it quickly increases to 100%. Since each incorrect feature disagrees with the label on a uniformly random example with probability $1/2$, in expectation half of the remaining features will be eliminated with each example, leading us to expect a threshold near $\log_2 n$ examples.
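For concreteness, here is a minimal sketch of this learner in Python (the class name and the train/predict interface are illustrative only, not taken from the original experiments):

```python
import numpy as np

class EliminationMajorityLearner:
    """Cross off features inconsistent with the data seen so far and
    predict with a majority vote over the surviving features."""

    def __init__(self, n_features):
        # Before any data, every feature is still a candidate predictor.
        self.consistent = np.ones(n_features, dtype=bool)

    def train(self, x, y):
        # Eliminate every feature whose value disagrees with the label.
        self.consistent &= (x == y)

    def predict(self, x):
        # Majority vote over the remaining features (ties broken toward 1).
        votes = x[self.consistent]
        return int(2 * votes.sum() >= votes.size)
```

With no training data, every feature survives and the vote on a uniform random example is essentially a coin flip, which is the near-50% regime described above; once only the exact predictor remains, the vote is always correct.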
In our experiments, we built a synthetic data generator which picks a feature uniformly at random and then produces some number of correctly-labeled examples, each consisting of $n$ boolean features. The output of this generator was given to the learning algorithm.
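A generator of this kind might look as follows (a sketch; the feature and example counts are left as parameters because the exact values used in the experiments are not reproduced here):

```python
import numpy as np

def generate_data(n_features, n_examples, rng):
    # Pick the hidden exact predictor uniformly at random.
    target = rng.integers(n_features)
    # Examples are drawn uniformly from {0,1}^n ...
    X = rng.integers(0, 2, size=(n_examples, n_features))
    # ... and labeled correctly by the target feature.
    y = X[:, target]
    return X, y, target
```

For example, `generate_data(100, 15, np.random.default_rng(0))` produces a stream of 15 correctly-labeled examples over 100 boolean features.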
In the first test, we trained on five examples and tested on a hold-out set of ten examples. In the second test, we trained on five examples and applied progressive validation to the next ten examples. We repeated this experiment 1000 times and averaged the results in order to get an empirical estimate of the true error of all hypotheses produced, shown in Figure 10.4.1.
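The progressive validation procedure itself is simple: each validation example is used first to test the current hypothesis and then to train on it, so the reported error is an average over hypotheses of increasing training-set size. A sketch, assuming the learner interface above:

```python
def progressive_validation(learner, X_train, y_train, X_val, y_val):
    # Ordinary training on the initial examples.
    for x, y in zip(X_train, y_train):
        learner.train(x, y)
    # Each validation example is tested on before being trained on,
    # so the final hypothesis has seen all of the data.
    mistakes = 0
    for x, y in zip(X_val, y_val):
        mistakes += int(learner.predict(x) != y)
        learner.train(x, y)
    return mistakes / len(X_val)  # the progressive validation error estimate
```

Note that the estimate averages the performance of hypotheses trained on 5 through 14 examples, while the learner returned at the end has been trained on all 15.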
As expected, the hold-out’s performance was much worse than that of progressive validation. In general, the degree of improvement in empirical error due to progressive validation depends on the learning algorithm. The improvement can be large when the data set is small or when the learning problem exhibits thresholding behavior at some point past the number of training examples.
In order to compare the quality of error estimation, we did another set of runs calculating the error discrepancy $|\text{true error} - \text{estimated error}|$. Five training examples were used, followed by either progressive validation on ten examples or evaluation on a hold-out set of size ten. The “true error” was calculated empirically by evaluating the resulting hypothesis in each case on another hold-out set of examples. The hold-out estimate on five training examples has larger variance than the progressive validation estimate. One might suspect that this is not due to a better estimation procedure but due to the fact that it is easier to estimate a lower error. To investigate this further, we performed a hold-out test trained on nine examples, because the true error of the progressive validation hypothesis (five training examples and ten progressive validation examples) was close to the true error of a hypothesis trained on nine examples, as shown in the following table:
|                         | true error | error discrepancy |
|-------------------------|------------|-------------------|
| Prog. Val.              |            |                   |
| Hold-out (trained on 5) |            |                   |
| Hold-out (trained on 9) |            |                   |
Both the average true error and the average estimate accuracy favor progressive validation in this experiment with a hold-out set of size ten. In fact, the progressive validation estimate and hypothesis, built from a data set of size 15, were better than the hold-out estimate and hypothesis built from a data set of size 19.
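The discrepancy comparison can be reproduced along the following lines, combining the sketches above (the feature count $n = 100$ and the Monte Carlo sample size are assumptions, not values from the text):

```python
import numpy as np

def true_error(learner, target, n_features, rng, trials=10_000):
    # Monte Carlo estimate of the true error on fresh uniform examples.
    X = rng.integers(0, 2, size=(trials, n_features))
    return np.mean([learner.predict(x) != x[target] for x in X])

def holdout(learner, X_train, y_train, X_test, y_test):
    # Train on the first block, estimate error on the held-out block.
    for x, y in zip(X_train, y_train):
        learner.train(x, y)
    return np.mean([learner.predict(x) != y for x, y in zip(X_test, y_test)])

rng = np.random.default_rng(0)
n, runs = 100, 1000  # assumed settings
gaps = {"prog_val": [], "holdout_5": [], "holdout_9": []}
for _ in range(runs):
    X, y, target = generate_data(n, 19, rng)
    for name, fit in (
        ("prog_val", lambda l: progressive_validation(l, X[:5], y[:5], X[5:15], y[5:15])),
        ("holdout_5", lambda l: holdout(l, X[:5], y[:5], X[5:15], y[5:15])),
        ("holdout_9", lambda l: holdout(l, X[:9], y[:9], X[9:19], y[9:19])),
    ):
        learner = EliminationMajorityLearner(n)
        estimate = fit(learner)
        gaps[name].append(abs(true_error(learner, target, n, rng) - estimate))

for name, g in gaps.items():
    print(f"{name}: mean |true error - estimate| = {np.mean(g):.3f}")
```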