# Bayesian Averaging of Classifiers and the Overfitting Problem

Pedro Domingos

Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195

### Abstract

Although Bayesian model averaging is theoretically the optimal method
for combining learned models, it has seen very little use in machine
learning. In this paper we study its application to combining rule
sets, and compare it with bagging and partitioning, two popular but
more ad hoc alternatives. Our experiments show that, surprisingly,
Bayesian model averaging's error rates are consistently higher than
the other methods'. Further investigation shows this to be due to a
marked tendency to overfit on the part of Bayesian model averaging,
contradicting previous beliefs that it solves (or avoids) the
overfitting problem.

# A Unified Bias-Variance Decomposition and its Applications

Pedro Domingos

Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195

### Abstract

This paper presents a unified bias-variance decomposition that is
applicable to squared loss, zero-one loss, variable misclassification
costs, and other loss functions. The unified decomposition sheds light
on a number of significant issues: the relation between some of the
previously proposed decompositions for zero-one loss and the original
one for squared loss, the relation between bias, variance and Schapire
et al.'s (1997) notion of margin, and the nature of the trade-off
between bias and variance in classification. While the bias-variance
behavior of zero-one loss and variable misclassification costs is
quite different from that of squared loss, this difference derives
directly from the different definitions of loss. We have applied the
proposed decomposition to decision tree learning, instance-based
learning, and boosting on a large suite of benchmark datasets, and made
several significant observations.
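For zero-one loss, the decomposition at a single example can be computed directly from the predictions of models trained on different training sets: the main prediction is the majority vote, bias is the zero-one loss of that main prediction against the true label, and variance is the fraction of predictions that deviate from the main prediction. The sketch below assumes the noise-free, two-class case, where variance adds to the loss on unbiased examples and subtracts from it on biased ones; it is a simplified reading of the decomposition, not the paper's full treatment (which also covers noise and variable misclassification costs).

```python
from collections import Counter

def unified_decomposition(true_label, predictions):
    """Bias-variance decomposition of zero-one loss at one example x
    (noise-free, two-class case). `predictions` holds the labels produced
    by models trained on different training sets."""
    counts = Counter(predictions)
    main = counts.most_common(1)[0][0]            # main (majority) prediction
    bias = 0 if main == true_label else 1         # B(x) = L(y*, y_m)
    variance = sum(p != main for p in predictions) / len(predictions)
    c2 = 1 if bias == 0 else -1                   # variance hurts when unbiased,
                                                  # helps when biased
    expected_loss = bias + c2 * variance
    return bias, variance, expected_loss

# Unbiased example: the average zero-one loss equals the variance.
print(unified_decomposition("pos", ["pos", "pos", "neg", "pos"]))
# -> (0, 0.25, 0.25)
```

The sign flip in the variance term is the key structural difference from squared loss: on examples where the learner is biased, extra variance can only move predictions toward the correct label, so it reduces expected loss rather than increasing it.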

Charles Rosenberg
Last modified: Fri Nov 10 15:04:43 EST 2000