Breiman, Friedman, Olshen, and Stone [1984] provide an analysis of complexity and
induction in terms of a trade-off between *bias* and *
variance*. A classifier partitions the instance space
into regions. When these regions are too large, the partitioning cannot
closely fit the true class boundaries, and error rates increase. This
effect is called *bias*. When the regions are too small, each region
contains few training examples, so the probability that an individual
region is labeled with the wrong class increases. This effect, called
*variance*, also increases error rates. According to this analysis, too
fine a partitioning of the instance space tends to increase the error
rate due to variance, while too coarse a partitioning tends to increase
the error rate due to bias.

Increasing the complexity of a decision tree creates finer partitionings of the instance space. This analysis can be used to argue against the addition of undue complexity to decision trees on the ground that it will increase variance and hence the error rate.

However, the success of C4.5X in decreasing the error rate
demonstrates that it is successfully managing the bias/variance
trade-off when it introduces complexity to the decision tree.
By using evidence from neighboring regions of the instance space, C4.5X
increases the error due to variance by less than it decreases the error
due to bias, so the total error rate falls.
The success of C4.5X demonstrates that it is not adding *undue*
complexity to C4.5's decision trees.
