next up previous
Next: Minimum Encoding Length Induction Up: Discussion Previous: Other Related Research

Bias Versus Variance

Breiman, Friedman, Olshen, and Stone [1984] provide an analysis of complexity and induction in terms of a trade-off between bias and variance. A classifier partitions the instance space into regions. When these regions are too large, the degree of fit to an accurate partitioning of the instance space will be poor, increasing error rates. This effect is called bias. When the regions are too small, the probability that individual regions are labeled with the wrong class is increased. This effect, called variance, also increases error rates. According to this analysis, due to variance, too fine a partitioning of the instance space tends to increase the error rate while, due to bias, too coarse a partitioning also tends to increase the error rate.

Increasing the complexity of a decision tree creates finer partitionings of the instance space. This analysis can be used to argue against the addition of undue complexity to decision trees on the ground that it will increase variance and hence the error rate.

However, the success of C4.5X in decreasing the error rate demonstrates that it is successfully managing the bias/variance trade-off when it introduces complexity to the decision tree. By using evidence from neighboring regions of the instance space, C4.5X is successful in increasing the error rate resulting from variance at a lower rate than it decreases the error rate resulting from bias. The success of C4.5X demonstrates that it is not adding undue complexity to C4.5's decision trees.

Geoff Webb
Mon Sep 9 12:13:30 EST 1996