next up previous
Next: Bias Versus Variance Up: Discussion Previous: Directions for Future Research

Other Related Research

A number of researchers have developed learning systems that can be viewed as considering evidence from neighboring regions of the instance space in order to derive classifications within regions of the instance space that are not occupied by examples from the training set. Ting [1994] does this explicitly, by examining the training set to directly explore the neighborhood of the object to be classified. This system uses instance based learning for classification within nodes of a decision tree with low empirical support (small disjuncts).

A number of other systems can also be viewed as considering evidence from neighboring regions for classification. These systems learn and then apply multiple classifiers [Ali, Brunk, and Pazzani, 1994, Nock and Gascuel, 1995, Oliver and Hand, 1995]. In such a context, any point within a region of the instance space that is occupied by no training objects is likely to be covered by multiple leaves or rules. Of these, the leaf or rule with the greatest empirical support will be used for classification.

C4.5X uses two distinct criteria for evaluating potential splits. The standard C4.5 stage of tree induction employs an information measure to select splits. The post-processor uses a Laplace accuracy estimate. Similar uses of dual criteria have been investigated elsewhere. Quinlan [1991] employs a Laplace accuracy estimate considering neighboring regions of the instance space to estimate the accuracy of small disjuncts. Lubinsky [1995] and Brodley [1995] employ resubstitution accuracy to select splits near the leaves during induction of decision trees.

By adding a split to a leaf, C4.5X is specializing with respect to the class at that leaf (and generalizing with respect to the class of the new leaf). Holte [1993] explored a number of techniques for specializing small disjuncts. C4.5X differs in that all leaves are candidates for specialization, not just those with low empirical support. It further differs in the manner in which it selects the specialization to perform by considering the evidence in support of alternative splits rather than just the strength of the evidence in support of individual potential conditions for the current disjunct.


next up previous
Next: Bias Versus Variance Up: Discussion Previous: Directions for Future Research

Geoff Webb
Mon Sep 9 12:13:30 EST 1996