Mixtures of Gaussians and locally weighted regression are two statistical models that offer elegant representations and efficient learning algorithms. In this paper we have shown that they also offer the opportunity to perform active learning in an efficient and statistically correct manner. The criteria derived here can be computed cheaply and, for the problems tested, demonstrate good predictive power. In industrial settings, where gathering a single data point may take days and cost thousands of dollars, the techniques described here have the potential for enormous savings.
In this paper, we have considered only function approximation problems. Problems requiring classification could be handled analogously with the appropriate models. For learning classification with a mixture model, one would select examples so as to maximize discriminability between the Gaussians; for locally weighted regression, one would replace the linear regression considered here with a locally weighted logistic regression [Weisberg 1985].
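To make the classification remark concrete, the following is a minimal sketch of locally weighted logistic regression: examples are weighted by a Gaussian kernel centred on the query point, and a logistic model is fit to the weighted data by gradient ascent on the weighted log-likelihood. The function name, the fixed bandwidth, and the plain gradient-ascent fit are our own illustrative choices, not a procedure specified in the paper.

```python
import numpy as np

def locally_weighted_logistic(X, y, x_query, bandwidth=0.5, lr=0.5, steps=200):
    """Estimate P(y=1 | x_query) by fitting a logistic model under
    Gaussian distance weights centred on the query point.
    (Illustrative sketch; names and parameters are assumptions.)"""
    # Gaussian kernel weights: nearby training points dominate the local fit.
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))

    # Append a bias column, then maximize the weighted log-likelihood
    # by gradient ascent (IRLS would also work; this keeps the sketch short).
    Xb = np.hstack([X, np.ones((len(X), 1))])
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w * (y - p))   # gradient of the weighted log-likelihood
        beta += lr * grad / w.sum()

    q = np.append(x_query, 1.0)
    return 1.0 / (1.0 + np.exp(-q @ beta))
```

As with locally weighted linear regression, a fresh model is fit per query, so the bandwidth controls the same locality/variance trade-off discussed in the paper.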
Our future work will proceed in several directions. The most important is active bias minimization. As noted in Section 2, the learner's error is composed of both bias and variance. The variance-minimizing strategy examined here ignores the bias component, which can lead to significant errors when the learner's bias is non-negligible. Work in progress examines effective ways of measuring and optimally eliminating bias [Cohn 1995]; future work will examine how to jointly minimize both bias and variance to produce a criterion that truly minimizes the learner's expected error.
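The decomposition alluded to above can be written out explicitly; in one standard form (our notation here, which may differ in detail from that of Section 2), the learner's expected squared error at an input $x$ splits into noise, squared bias, and variance:

```latex
\begin{align*}
E\big[(\hat{y}(x) - y)^2\big]
  &= E\big[(y - E[y \mid x])^2\big]                       && \text{(noise)} \\
  &\quad + \big(E_{\mathcal{D}}[\hat{y}(x)] - E[y \mid x]\big)^2 && \text{(squared bias)} \\
  &\quad + E_{\mathcal{D}}\big[(\hat{y}(x) - E_{\mathcal{D}}[\hat{y}(x)])^2\big] && \text{(variance)}
\end{align*}
```

Here $E_{\mathcal{D}}$ denotes expectation over training sets. The variance-minimizing strategy targets only the third term, which is why it can fail when the second term is non-negligible.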
Another direction for future research is the derivation of variance- (and bias-) minimizing techniques for other statistical learning models. Of particular interest is the class of models known as ``belief networks'' or ``Bayesian networks'' [Pearl 1988, Heckerman et al. 1994]. These models have the advantage of allowing inclusion of domain knowledge and prior constraints while still adhering to a statistically sound framework. Current research in belief networks focuses on algorithms for efficient inference and learning; it would be an important step to derive the proper criteria for learning actively with these models.