A typical focus of systems based on Bayesian networks is the posterior probability of various outcomes of individual variables given evidence, Pr(a|e). This can be generalized to the computation of the posterior probability of a particular instantiation of a set of variables A given the evidence, i.e., Pr(A = a|E = e). There are two methods capable of performing this computation. The first method is very efficient at the expense of precision; the second is less efficient but in general offers better convergence rates. Both methods are based on Equation 7, i.e., on computing the posterior as the ratio Pr(a|e) = Pr(a, e)/Pr(e).
The first method reuses the samples generated to estimate Pr(e) in estimating Pr(a, e). Estimating Pr(a, e) amounts to accumulating the sample scores only over those samples in which A = a. The main advantage of this method is its efficiency -- we can use the same set of samples to estimate the posterior probability of any state of any subset of the network variables given the evidence. Its main disadvantage is that the variance of the estimated Pr(a, e) can be large, especially when the numerical value of Pr(a|e) is extreme. This method is the most widely used approach in existing stochastic sampling algorithms.
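The first method can be sketched as follows. This is an illustrative toy, not the paper's implementation: a hypothetical three-node network with independent roots X and Y, child E, made-up probability tables, evidence E = 1, and the prior as importance function. A single pass over the samples accumulates both the total score (estimating Pr(e)) and the score restricted to X = 1 (estimating Pr(X = 1, e)):

```python
import random

# Hypothetical toy network: independent roots X and Y, child E.
# All numerical values are made up for illustration.
P_X1 = 0.3                                    # Pr(X = 1)
P_Y1 = 0.6                                    # Pr(Y = 1)
P_E1 = {(0, 0): 0.05, (0, 1): 0.2,
        (1, 0): 0.5,  (1, 1): 0.9}            # Pr(E = 1 | X, Y)

def posterior_method1(n_samples, seed=0):
    """First method: one set of weighted samples estimates both
    Pr(e) (sum of all scores) and Pr(X = 1, e) (scores summed only
    over samples in which X = 1); their ratio estimates Pr(X = 1 | e)."""
    rng = random.Random(seed)
    total_score = 0.0          # accumulates the estimate of Pr(E = 1)
    score_x1 = 0.0             # accumulates the estimate of Pr(X = 1, E = 1)
    for _ in range(n_samples):
        x = 1 if rng.random() < P_X1 else 0
        y = 1 if rng.random() < P_Y1 else 0
        score = P_E1[(x, y)]   # likelihood weight of the evidence E = 1
        total_score += score
        if x == 1:
            score_x1 += score
    return score_x1 / total_score
```

The same pass could serve any other query over the unobserved variables (e.g., Y = 0, or the joint state X = 1, Y = 1), which is the efficiency advantage described above. For this toy network the exact posterior is 0.222/0.32 ≈ 0.694.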
The second method, used much more rarely (e.g., [Cano et al. 1996, Pradhan and Dagum 1996, Dagum and Luby 1997]), calls for estimating Pr(e) and Pr(a, e) separately. After estimating Pr(e), an additional call to the algorithm is made for each instantiation a of the set of variables of interest A: Pr(a, e) is estimated by sampling the network with the set of observations extended by A = a. The main advantage of this method is that it is much better at reducing variance than the first method. Its main disadvantage is the computational cost of sampling for possibly many combinations of states of the nodes of interest.
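The second method can be sketched on the same kind of hypothetical toy network (made-up numbers, not the paper's code). The same sampling routine is called twice: once with the original evidence E = 1 to estimate Pr(e), and once with the evidence extended by X = 1 to estimate the joint Pr(X = 1, e):

```python
import random

# Hypothetical toy network (made-up numbers): roots X, Y, child E.
P_X1 = 0.3
P_Y1 = 0.6
P_E1 = {(0, 0): 0.05, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.9}

def estimate(n_samples, clamp_x=None, seed=0):
    """One sampling run. With clamp_x=None it estimates Pr(E = 1);
    with clamp_x set, the evidence is extended by X = clamp_x and the
    run estimates the joint Pr(X = clamp_x, E = 1) instead."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        if clamp_x is None:
            x = 1 if rng.random() < P_X1 else 0
            weight = 1.0
        else:
            x = clamp_x                        # X joins the evidence set
            weight = P_X1 if x == 1 else 1.0 - P_X1
        y = 1 if rng.random() < P_Y1 else 0    # Y is still sampled freely
        total += weight * P_E1[(x, y)]
    return total / n_samples

# Second method: two separate runs, then the ratio from Equation 7.
p_e = estimate(100000, seed=1)                 # estimate of Pr(E = 1)
p_x1_e = estimate(100000, clamp_x=1, seed=2)   # estimate of Pr(X = 1, E = 1)
posterior = p_x1_e / p_e                       # estimate of Pr(X = 1 | E = 1)
```

Note that answering a query about a different state or a different variable requires a fresh sampling run, which is the computational cost mentioned above.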
Cano et al. (1996) suggested a modified version of the second method. Suppose that we are interested in the posterior distribution Pr(a_i|e) for all possible values a_i of A, i = 1, 2, ..., k. We can estimate Pr(a_i, e) for each i = 1, ..., k separately, and use the sum of these k estimates as an estimate of Pr(e). The assumption behind this approach is that the estimate of Pr(e) will be very accurate because of the large combined sample from which it is drawn. However, even if we can guarantee small variance in every Pr(a_i, e), we cannot guarantee that their sum will also have a small variance. So, in the AIS-BN algorithm we use only the pure form of each of the two methods. The algorithm listed in Figure 2 is based on the first method when the optional computations in Steps 12 and 13 are performed. An algorithm corresponding to the second method skips the optional steps and calls the basic AIS-BN algorithm twice to estimate Pr(e) and Pr(a, e) separately.
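The Cano et al. variant amounts to normalizing the per-state joint estimates by their own sum. A sketch on the same hypothetical toy network (made-up numbers), where A = {X} has the two states 0 and 1:

```python
import random

# Hypothetical toy network (made-up numbers): roots X, Y, child E; evidence E = 1.
P_X1 = 0.3
P_Y1 = 0.6
P_E1 = {(0, 0): 0.05, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.9}

def estimate_joint(x_value, n_samples, seed=0):
    """Estimate Pr(X = x_value, E = 1) with X added to the evidence set."""
    rng = random.Random(seed)
    prior_x = P_X1 if x_value == 1 else 1.0 - P_X1
    total = 0.0
    for _ in range(n_samples):
        y = 1 if rng.random() < P_Y1 else 0
        total += prior_x * P_E1[(x_value, y)]
    return total / n_samples

# Cano et al. variant: estimate the joint for every state of X and
# normalize by the sum, which doubles as the estimate of Pr(E = 1).
joint = {x: estimate_joint(x, 100000, seed=x + 1) for x in (0, 1)}
p_e_hat = sum(joint.values())                  # estimate of Pr(E = 1)
posterior = {x: joint[x] / p_e_hat for x in joint}
```

The normalization guarantees that the estimated posterior sums to one, but, as noted above, the variance of the denominator p_e_hat is inherited from all k joint estimates at once.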
The first method is very attractive because of its simplicity and potential computational efficiency. However, as we have shown in Section 2.2, the performance of a sampling algorithm that uses just one set of samples (as in the first method above) to estimate Pr(a|e) will deteriorate if the difference between the optimal importance functions for Pr(a, e) and Pr(e) is large. If the main focus of the computation is high accuracy of the posterior probability distribution of a small number of nodes, we strongly recommend using the algorithm based on the second method. This algorithm can also be easily used to estimate confidence intervals for the solution.
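One simple way to attach a confidence interval to the second method -- a batch-means sketch under a normal approximation, not the paper's own procedure -- is to repeat the two-call estimate several times independently and use the spread of the resulting posterior estimates (toy network and all numbers are hypothetical, as before):

```python
import random
import statistics

# Hypothetical toy network (made-up numbers): roots X, Y, child E; evidence E = 1.
P_X1 = 0.3
P_Y1 = 0.6
P_E1 = {(0, 0): 0.05, (0, 1): 0.2, (1, 0): 0.5, (1, 1): 0.9}

def one_posterior_estimate(n_samples, rng):
    """One full second-method estimate of Pr(X = 1 | E = 1):
    separate runs for Pr(e) and Pr(X = 1, e), then their ratio."""
    p_e = sum(P_E1[(1 if rng.random() < P_X1 else 0,
                    1 if rng.random() < P_Y1 else 0)]
              for _ in range(n_samples)) / n_samples
    p_x1_e = sum(P_X1 * P_E1[(1, 1 if rng.random() < P_Y1 else 0)]
                 for _ in range(n_samples)) / n_samples
    return p_x1_e / p_e

rng = random.Random(0)
batches = [one_posterior_estimate(20000, rng) for _ in range(10)]
mean = statistics.mean(batches)
half_width = 1.96 * statistics.stdev(batches) / len(batches) ** 0.5
# Approximate 95% confidence interval: (mean - half_width, mean + half_width)
```

Such an interval is easy to obtain here precisely because the second method produces independent estimates of the numerator and denominator, whereas the first method reuses one sample set for both.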