In this paper we consider algorithms for active learning which select
queries $\tilde{\mathbf{x}}$ in an attempt to minimize the value of
Equation 4, integrated over **X**. Intuitively, the
minimization proceeds as follows: we assume that we have an estimate
of $\sigma^2_{\hat{y}}$, the variance of the learner at **x**. If, for
some new input $\tilde{\mathbf{x}}$, we knew the conditional distribution
$P(\tilde{y}\,|\,\tilde{\mathbf{x}})$, we could compute an estimate of the
learner's new variance at **x** given an additional example at
$\tilde{\mathbf{x}}$. While the true distribution of $\tilde{y}$ is
unknown, many learning architectures let us approximate it by giving
us estimates of its mean and variance. Using the estimated
distribution of $\tilde{y}$, we can estimate
$\langle\tilde{\sigma}^2_{\hat{y}}\rangle$, the expected variance of
the learner after querying at $\tilde{\mathbf{x}}$.
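As a concrete sketch of this step (not from the paper), assume a linear least-squares learner whose output variance at **x** is proportional to $\mathbf{x}^\top(X^\top X)^{-1}\mathbf{x}$. Adding an example at a candidate query updates the Gram matrix by a rank-one term; for this hypothetical learner the updated variance depends only on the query *input*, so the expectation over the unknown output $\tilde{y}$ is exact:

```python
import numpy as np

def expected_variance_after_query(X, x_ref, x_query, noise_var=1.0):
    """Variance of a linear least-squares learner at x_ref after adding a
    hypothetical example at x_query.

    The learner's output variance at x is noise_var * x^T (X^T X)^{-1} x.
    Adding an example at x_query updates the Gram matrix to
    X^T X + x_query x_query^T; only the query input enters, so the
    expectation over the unknown output is exact for this learner.
    """
    gram_new = X.T @ X + np.outer(x_query, x_query)
    return noise_var * x_ref @ np.linalg.solve(gram_new, x_ref)
```

Because the rank-one update is positive semidefinite, the estimated variance at any reference point can only shrink when an example is added, whichever candidate query is chosen.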

Given the estimate of $\langle\tilde{\sigma}^2_{\hat{y}}\rangle$, which
applies to a given **x** and a given query $\tilde{\mathbf{x}}$, we must
integrate **x** over the input distribution to compute the integrated
average variance of the learner. In practice, we will compute a Monte
Carlo approximation of this integral, evaluating
$\langle\tilde{\sigma}^2_{\hat{y}}\rangle$ at a number of *reference
points* drawn according to $P(\mathbf{x})$. By querying an
$\tilde{\mathbf{x}}$ that minimizes the average expected variance over the
reference points, we have a solid statistical basis for choosing new
examples.
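The selection procedure can be sketched as follows, again under the hypothetical linear-learner assumption used only for illustration: for each candidate query, average the expected post-query variance over the reference points, and keep the candidate with the smallest average.

```python
import numpy as np

def select_query(X, candidates, references, noise_var=1.0):
    """Pick the candidate query minimizing a Monte Carlo estimate of the
    learner's integrated variance: the average expected variance over
    reference points drawn according to the input distribution P(x).

    Assumes a linear least-squares learner, so the post-query variance at
    a reference point x_r is noise_var * x_r^T (X^T X + xq xq^T)^{-1} x_r.
    """
    best_x, best_avg = None, np.inf
    for xq in candidates:
        gram_new = X.T @ X + np.outer(xq, xq)
        avg = np.mean([noise_var * xr @ np.linalg.solve(gram_new, xr)
                       for xr in references])
        if avg < best_avg:
            best_x, best_avg = xq, avg
    return best_x, best_avg
```

In a full active-learning loop one would re-fit the learner after each chosen query and repeat; the reference points here play exactly the role described above, turning the integral over **X** into a finite average.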