Bayesian Averaging of Classifiers and the Overfitting Problem

Pedro Domingos
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195

Abstract

Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternatives. Our experiments show that, surprisingly, Bayesian model averaging's error rates are consistently higher than those of the other methods. Further investigation shows this to be due to a marked tendency to overfit on the part of Bayesian model averaging, contradicting previous beliefs that it solves (or avoids) the overfitting problem.
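
The contrast between posterior-weighted and uniform voting can be made concrete with a minimal sketch. The abstract does not say which likelihood is used, so the uniform class-noise likelihood below is an assumption, and the names bma_weights and weighted_vote are hypothetical. The uniform vote stands in only for the combination step of bagging, not for its bootstrap resampling.

```python
import math


def bma_weights(n_correct, n_examples, priors=None):
    """Posterior weight of each model (illustrative, not the paper's code).

    Assumption: a uniform class-noise likelihood, where a model that gets
    s of n training examples right has likelihood (1 - eps)^s * eps^(n - s),
    with eps set to the model's own training error rate.
    """
    if priors is None:
        priors = [1.0 / len(n_correct)] * len(n_correct)
    log_posts = []
    for s, prior in zip(n_correct, priors):
        # Clamp eps away from 0 and 1 so the logarithms stay finite.
        eps = min(max((n_examples - s) / n_examples, 1e-9), 1 - 1e-9)
        log_lik = s * math.log(1 - eps) + (n_examples - s) * math.log(eps)
        log_posts.append(math.log(prior) + log_lik)
    # Normalize in log space to avoid underflow.
    top = max(log_posts)
    unnorm = [math.exp(lp - top) for lp in log_posts]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


# Three models: training accuracy 90, 85 and 85 out of 100 examples.
weights = bma_weights([90, 85, 85], n_examples=100)
predictions = ["a", "b", "b"]  # each model's label for one test example

bma_label = weighted_vote(predictions, weights)
uniform_label = weighted_vote(predictions, [1 / 3] * 3)
print(weights, bma_label, uniform_label)
```

Under this assumed likelihood, a five-example difference in training accuracy puts almost all of the posterior weight on one model, so the weighted vote follows that model while the uniform vote follows the majority. This is one plausible way the overfitting reported above could arise; the abstract itself does not describe the mechanism.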

A Unified Bias-Variance Decomposition and its Applications

Pedro Domingos
Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195

Abstract

This paper presents a unified bias-variance decomposition that is applicable to squared loss, zero-one loss, variable misclassification costs, and other loss functions. The unified decomposition sheds light on a number of significant issues: the relation between some of the previously proposed decompositions for zero-one loss and the original one for squared loss, the relation between bias, variance and Schapire et al.'s (1997) notion of margin, and the nature of the trade-off between bias and variance in classification. While the bias-variance behavior of zero-one loss and variable misclassification costs is quite different from that of squared loss, this difference derives directly from the different definitions of loss. We have applied the proposed decomposition to decision tree learning, instance-based learning, and boosting on a large suite of benchmark datasets, and made several significant observations.
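
As a rough illustration of how such a decomposition behaves under zero-one loss, the sketch below handles a single test example in a noise-free, two-class setting. The definitions it uses are assumptions, since the abstract does not state them: the main prediction is the most frequent prediction across training sets, bias is the loss of the main prediction, and variance is the average loss of individual predictions relative to the main prediction. The function name zero_one_decomposition is hypothetical.

```python
from collections import Counter


def zero_one_decomposition(predictions, true_label):
    """Bias, variance and expected zero-one loss for one test example.

    predictions: labels predicted for the same example by models trained
    on different training sets. Assumes no label noise.
    """
    n = len(predictions)
    # Main prediction: the most frequent label across training sets.
    main_prediction = Counter(predictions).most_common(1)[0][0]
    # Bias: zero-one loss of the main prediction (0 or 1).
    bias = int(main_prediction != true_label)
    # Variance: how often a prediction departs from the main prediction.
    variance = sum(p != main_prediction for p in predictions) / n
    # Expected loss: how often a prediction is wrong.
    expected_loss = sum(p != true_label for p in predictions) / n
    return main_prediction, bias, variance, expected_loss


# Unbiased example: variance adds to the loss.
_, b1, v1, loss1 = zero_one_decomposition([1, 1, 1, 0], true_label=1)

# Biased example: with two classes, variance subtracts from the loss.
_, b2, v2, loss2 = zero_one_decomposition([0, 0, 0, 1], true_label=1)

print(b1, v1, loss1)
print(b2, v2, loss2)
```

With two classes and no noise, these definitions give an expected loss of bias plus variance on unbiased examples and bias minus variance on biased ones. Squared loss has no counterpart to that sign change, which is one way to read the abstract's remark that the difference in behavior follows from the definition of the loss.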

Charles Rosenberg

Last modified: Fri Nov 10 15:04:43 EST 2000