On the empirical front, a number of recent experimental results appear to conflict with the Occam thesis. Murphy and Pazzani demonstrated that for a number of artificial classification learning tasks, the simplest consistent decision trees had lower predictive accuracy than slightly more complex consistent trees. Further experimentation, however, showed that these results depended upon the complexity of the target concept: a bias toward simplicity performed well when the target concept was best described by a simple classifier, and a bias toward complexity performed well when the target concept was best described by a complex classifier [Murphy, 1995]. In addition, the simplest classifiers obtained better-than-average (over all consistent classifiers) predictive accuracy when the data were augmented with irrelevant attributes, or with attributes strongly correlated with the target concept but not required for classification.
Webb presented results suggesting that, for a wide range of learning tasks from the UCI repository [Murphy and Aha, 1993], the relative generality of classifiers is a better predictor of classification performance than their relative surface syntactic complexity. However, it could be argued that while these results demonstrate that selecting the simpler of any pair of theories will not maximize predictive accuracy, they do not demonstrate that selecting the simplest of all available theories would fail to maximize predictive accuracy.
Schaffer has shown that pruning techniques that reduce both the complexity and the resubstitution accuracy of inferred decision trees sometimes increase and sometimes decrease their predictive accuracy. However, a proponent of the Occam thesis could explain these results in terms of a positive effect from the application of Occam's razor (the reduction of complexity) being counterbalanced by a negative effect from a reduction of empirical support (resubstitution accuracy).
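The trade-off between resubstitution accuracy and predictive accuracy can be illustrated with a toy sketch (a hypothetical construction, not drawn from Schaffer's experiments): an extreme "unpruned" classifier that memorizes the training set attains maximal resubstitution accuracy, yet on noisy data it can have lower predictive accuracy than an extreme "pruned" one-attribute rule.

```python
import random

random.seed(0)

ATTRS = 5
ALL_X = [tuple((i >> b) & 1 for b in range(ATTRS)) for i in range(2 ** ATTRS)]

def noisy_label(x):
    # Hypothetical target concept: the first attribute, with 10% label noise.
    return x[0] ^ (random.random() < 0.1)

train_x = random.sample(ALL_X, 20)                       # distinct training instances
train = [(x, noisy_label(x)) for x in train_x]
test = [(x, noisy_label(x)) for x in random.choices(ALL_X, k=5000)]

# "Unpruned" extreme: memorize the training set (perfect resubstitution accuracy).
table = dict(train)
def complex_clf(x):
    return table.get(x, 0)

# "Pruned" extreme: the bare rule y = x[0] (imperfect resubstitution accuracy).
def simple_clf(x):
    return x[0]

def accuracy(clf, data):
    return sum(clf(x) == y for x, y in data) / len(data)
```

Here `accuracy(complex_clf, train)` is 1.0 by construction, while on the large test sample the simple rule is more accurate, because the memorized labels reproduce the training noise and unseen instances fall back to a default.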
Holte, Acker, and Porter have shown that specializing small disjuncts (rules with low empirical support) to exclude areas of the instance space occupied by no training objects frequently decreases the error rate on unseen objects covered by those disjuncts. As this specialization increases complexity, it might be viewed as contrary to the Occam thesis. However, the same research shows that the total error rate of the classifiers in which the disjuncts are embedded increases when those disjuncts are specialized. A proponent of the Occam thesis could thus dismiss the relevance of the former result by arguing that the thesis applies only to complete classifiers and not to elements of those classifiers.
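The specialization operation can be sketched abstractly (a hypothetical illustration, not Holte, Acker, and Porter's implementation): a disjunct, represented as a conjunction of attribute tests, is narrowed by adding every test shared by all the training objects it covers, so that it excludes unpopulated regions of the instance space while still covering the same training objects.

```python
from itertools import product

ATTRS = 4
SPACE = list(product([0, 1], repeat=ATTRS))   # the whole boolean instance space

def covers(rule, x):
    # A rule is a conjunction of (attribute, value) tests.
    return all(x[a] == v for a, v in rule)

# Hypothetical small disjunct: one test, covering only two training objects.
rule = [(0, 1)]
train_pos = [(1, 0, 1, 0), (1, 0, 1, 1)]

def specialize(rule, covered):
    # Add each test shared by all covered objects and absent from the rule,
    # shrinking the rule's extension toward the region the data occupies.
    new = list(rule)
    fixed = dict(rule)
    for a in range(ATTRS):
        values = {x[a] for x in covered}
        if a not in fixed and len(values) == 1:
            new.append((a, covered[0][a]))
    return new

spec = specialize(rule, train_pos)
# rule covers 8 of the 16 instances; spec covers only 2, yet both still
# cover every object in train_pos.
```

The specialized rule is syntactically more complex (three tests instead of one) while its empirical support is unchanged, which is exactly the direction of change the paragraph above describes.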