Most machine learning systems explicitly or implicitly employ Occam's razor. In addition to its almost universal use in machine learning, the principle of Occam's razor is widely accepted in general scientific practice. That this acceptance has persisted, despite Occam's razor being subjected to extensive philosophical, theoretical and empirical attack, suggests that these attacks have not been found persuasive.
On the philosophical front, to summarize Bunge, the complexity of a theory (classifier) depends entirely upon the language in which it is encoded. To claim that the acceptability of a theory depends upon the language in which it happens to be expressed appears indefensible. Further, there is no obvious theoretical relationship between syntactic complexity and the quality of a theory, other than the possibility that the world is intrinsically simple and that the use of Occam's razor enables the discovery of that intrinsic simplicity. However, even if the world is intrinsically simple, there is no reason why that simplicity should correspond to syntactic simplicity in an arbitrary language.
To merely state that a less complex explanation is preferable does not specify by what criterion it is preferable. The implicit assumption underlying much machine learning research appears to be that, all other things being equal, less complex classifiers will be, in general, more accurate [Blumer, Ehrenfeucht, Haussler, and Warmuth, 1987; Quinlan, 1986]. It is this Occam thesis that this paper seeks to discredit.
On a straightforward interpretation, using a syntactic measure to predict expected accuracy appears absurd. If two classifiers have identical meaning (such as IF 20<=AGE<=40 THEN POS and IF 20<=AGE<=30 OR 30<=AGE<=40 THEN POS), it is not possible for their accuracies to differ, no matter how greatly their complexities differ. This simple example highlights the apparent dominance of semantics over syntax in the determination of predictive accuracy.
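The two rule forms above can be sketched as predicates to make the point concrete; this is an illustrative translation into Python (not code from the paper), assuming integer-valued AGE. Because the rules have identical extensions, their predictions, and hence their accuracies on any test set, must coincide.

```python
def rule_simple(age):
    # IF 20 <= AGE <= 40 THEN POS
    return 20 <= age <= 40

def rule_complex(age):
    # IF 20 <= AGE <= 30 OR 30 <= AGE <= 40 THEN POS
    return 20 <= age <= 30 or 30 <= age <= 40

# The two classifiers agree on every possible input, despite the
# second being syntactically more complex than the first.
assert all(rule_simple(a) == rule_complex(a) for a in range(0, 121))
```

Since no input can distinguish the two classifiers, any syntactic complexity measure that scores them differently cannot be tracking predictive accuracy.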