Next: Appendix A. Duality Up: Variational Probabilistic Inference and Previous: Interval Bounds on the

# Discussion

Let us summarize the variational inference method and evaluate the results that we have obtained.

The variational method begins with parameterized upper and lower bounds on the individual conditional probabilities at the nodes of the model. For the QMR-DT, these bounds are exponentials of linear functions, and introducing them into the model corresponds to delinking nodes from the graph. Sums of products of these bounds yield bounds, and thus we readily obtain parameterized bounds on marginal probabilities, in particular upper and lower bounds on the likelihood.

We exploited the likelihood bounds in evaluating the output of the likelihood-weighted sampling algorithm. Although the sampling algorithm did not yield reliable results across the corpus of CPC cases, when we utilized the variational upper and lower bounds to select among the samples we were able to obtain sampling results that were consistent between runs. This suggests a general procedure in which variational bounds are used to assess the convergence of a sampling algorithm. (One can also imagine a more intimate relationship between these algorithms in which the variational bounds are used to adjust the on-line course of the sampler).

The fact that we have bounds on the likelihood (or other marginal probabilities) is critical--the bounding property allows us to find optimizing values of the variational parameters by minimizing the upper-bounding variational distribution and maximizing the lower-bounding variational distribution. In the case of the QMR-DT network (a bipartite noisy-OR graph), the minimization problem is a convex optimization problem and the maximization problem is solved via the EM algorithm.

Once the variational parameters are optimized, the resulting variational distribution can be exploited as an inference engine for calculating approximations to posterior probabilities. This technique has been our focus in the paper. Graphically, the variationally transformed model can be viewed as a sub-graph of the original model in which some of the finding nodes have been delinked. If a sufficient number of findings are delinked variationally then it is possible to run an exact algorithm on the resulting graph. This approach yields approximations to the posterior marginals of the disease nodes.

We found empirically that these approximations appeared to provide good approximations to the true posterior marginals. This was the case for the tractable set of CPC cases (cf. Figure 7) and--subject to our assumption that we have obtained a good surrogate for the gold standard via the selected output of the sampler--also the case for the full CPC corpus (cf. Figure 11).

We also compared the variational algorithm to a state-of-the-art algorithm for the QMR-DT, the likelihood-weighted sampler of Shwe and Cooper (1991). We found that the variational algorithm outperformed the likelihood-weighted sampler both for the tractable cases and for the full corpus. In particular, for a fixed accuracy requirement the variational algorithm was significantly faster (cf. Figure 5), and for a fixed time allotment the variational algorithm was significantly more accurate (cf. Figure 8 and Figure 11).

Our results were less satisfactory for the interval bounds on the posterior marginals. Across the full CPC corpus we found that for approximately one third of the disease the bounds were tight but for half of the diseases the bounds were vacuous. A major impediment to obtaining tighter bounds appears to lie not in the variational approximation per se but rather in the exact subroutine, and we are investigating exact methods with improved numerical properties.

Although we have focused in detail on the QMR-DT model in this paper, it is worth noting that the variational probabilistic inference methodology is considerably more general. Specifically, the methods that we have described here are not limited to the bi-partite graphical structure of the QMR-DT model, nor is it necessary to employ noisy-OR nodes (Jaakkola & Jordan, 1996). It is also the case that the type of transformations that we have exploited in the QMR-DT setting extend to a larger class of dependence relations based on generalized linear models (Jaakkola, 1997). Finally, for a review of applications of variational methods to a variety of other graphical model architectures, see Jordan, et al. (1998).

A promising direction for future research appears to be in the integration of various kinds of approximate and exact methods (see, e.g., Dagum & Horvitz, 1992; Jensen, Kong, & Kjærulff, 1995). In particular, search-based methods (Cooper, 1985; Peng & Reggia, 1987, Henrion, 1991) and variational methods both yield bounds on probabilities, and, as we have indicated in the introduction, they seem to exploit different aspects of the structure of complex probability distributions. It may be possible to combine the bounds from these algorithm--the variational bounds might be used to guide the search, or the search-based bounds might be used to aid the variational approximation. Similar comments can be made with respect to localized partial evaluation methods and bounded conditioning methods (Draper & Hanks, 1994; Horvitz, et al., 1989). Also, we have seen that variational bounds can be used for assessing whether estimates from Monte Carlo sampling algorithms have converged. A further interesting hybrid would be a scheme in which variational approximations are refined by treating them as initial conditions for a sampler.

Even without extensions our results in this paper appear quite promising. We have presented an algorithm which runs in real time on a large-scale graphical model for which exact algorithms are in general infeasible. The results that we have obtained appear to be reasonably accurate across a corpus of difficult diagnostic cases. While further work is needed, we believe that our results indicate a promising role for variational inference in developing, critiquing and exploiting large-scale probabilistic models such as the QMR-DT.

# Acknowledgements

We would like to thank the University of Pittsburgh and Randy Miller for the use of the QMR-DT database. We also want to thank David Heckerman for suggesting that we attack QMR-DT with variational methods, and for providing helpful counsel along the way.

Next: Appendix A. Duality Up: Variational Probabilistic Inference and Previous: Interval Bounds on the

Michael Jordan
Sun May 9 16:22:01 PDT 1999