A Generalization of AIS-BN: The Problem of Estimating Pr(|)

A typical focus of systems based on Bayesian networks is the
posterior probability of various outcomes of individual variables
given evidence,
Pr(*a*|). This can be generalized to the
computation of the posterior probability of a particular
instantiation of a set of variables given evidence, i.e.,
Pr( = |). There are two methods that are
capable of performing this computation. The first method is very
efficient at the expense of precision. The second method is less
efficient, but offers in general better convergence rates. Both
methods are based on Equation 7.

The first method reuses the samples generated to estimate Pr() in estimating Pr(,). Estimation of Pr(,) amounts to counting the scored sum under the condition = . The main advantage of this method is its efficiency -- we can use the same set of samples to estimate the posterior probability of any state of a subset of the network given evidence. Its main disadvantage is that the variance of the estimated Pr(,) can be large, especially when the numerical value of Pr(|) is extreme. This method is the most widely used approach in the existing stochastic sampling algorithms.

The second method, used much more rarely (e.g.,
[Cano *et al.*1996,Pradhan and Dagum1996,Dagum and Luby1997]), calls for estimating
Pr() and
Pr(,) separately. After
estimating
Pr(), an additional call to the algorithm
is made for each instantiation of the set of variables
of interest .
Pr(,) is estimated by
sampling the network with the set of observations
extended by
= . The main advantage of this method
is that it is much better at reducing variance than the first
method. Its main disadvantage is the computational cost associated
with sampling for possibly many combinations of states of nodes of
interest.

Cano et al. [1996] suggested a modified version of the
second method. Suppose that we are interested in the posterior
distribution
Pr(|) for all possible values
of , *i* = 1, 2, ..., *k*. We can estimate
Pr(,) for each *i* = 1, ..., *k* separately,
and use the value
Pr(,) as an
estimate for
Pr(). The assumption behind this approach
is that the estimate of
Pr() will be very accurate
because of the large sample from which it is drawn. However, even
if we can guarantee small variance in every
Pr(,), we cannot guarantee that their sum will also have a small
variance. So, in the AIS-BN algorithm we only use the pure form of
each of the methods. The algorithm listed in Figure 2
is based on the first method when the optional computations in
Steps 12 and 13 are performed. An algorithm corresponding to the
second method skips the optional steps and calls the basic AIS-BN
algorithm twice to estimate
Pr() and
Pr(,) separately.

The first method is very attractive because of its simplicity and possible computational efficiency. However, as we have shown in Section 2.2, the performance of a sampling algorithm that uses just one set of samples (as in the first method above) to estimate Pr(|) will deteriorate if the difference between the optimal importance functions for Pr(,) and Pr() is large. If the main focus of the computation is high accuracy of the posterior probability distribution of a small number of nodes, we strongly recommend to use the algorithm based on the second method. Also, this algorithm can be easily used to estimate confidence intervals of the solution.