Exact inference algorithms perform many millions of arithmetic operations when applied to complex graphical models such as the QMR-DT. While this proliferation of terms expresses the symbolic structure of the model, it does not necessarily express the numeric structure of the model. In particular, many of the sums in the QMR-DT inference problem are sums over large numbers of random variables. Laws of large numbers suggest that these sums may yield predictable numerical results over the ensemble of their summands, and this fact might enable us to avoid performing the sums explicitly.

To exploit the possibility of numerical regularity in dense graphical models we develop a variational approach to approximate probabilistic inference. Variational methods are a general class of approximation techniques with wide application throughout applied mathematics. Variational methods are particularly useful when applied to highly-coupled systems. By introducing additional parameters, known as ``variational parameters''--which essentially serve as low-dimensional surrogates for the high-dimensional couplings of the system--these methods achieve a decoupling of the system. The mathematical machinery of the variational approach provides algorithms for finding values of the variational parameters such that the decoupled system is a good approximation to the original coupled system.

In the case of probabilistic graphical models variational methods allow us to simplify a complicated joint distribution such as the one in Eq. (7). This is achieved via parameterized transformations of the individual node probabilities. As we will see later, these node transformations can be interpreted graphically as delinking the nodes from the graph.

How do we find appropriate transformations? The variational methods that we consider here come from convex analysis (see Appendix 6). Let us begin by considering methods for obtaining upper bounds on probabilities. A well-known fact from convex analysis is that any concave function can be represented as the solution to a minimization problem:

$$ f(x) \;=\; \min_{\lambda} \left\{ \lambda^T x - f^*(\lambda) \right\} \qquad (8) $$

where $f^*(\lambda)$ is the *conjugate function* of $f(x)$. The function $f^*(\lambda)$
is itself obtained as the solution to a minimization problem:

$$ f^*(\lambda) \;=\; \min_{x} \left\{ \lambda^T x - f(x) \right\} $$

The formal identity of this pair of minimization problems expresses the
``duality'' of $f$ and its conjugate $f^*$.
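As a concrete illustration (our addition, not part of the original text), consider the logarithm, a concave function for which the conjugate pair can be computed in closed form:

$$ f(x) = \ln x, \qquad f^*(\lambda) = \min_x \left\{ \lambda x - \ln x \right\} = 1 + \ln \lambda , $$

since setting the derivative $\lambda - 1/x$ to zero gives the minimizer $x = 1/\lambda$. Substituting back, $\min_\lambda \{ \lambda x - 1 - \ln \lambda \}$ is attained at $\lambda = 1/x$ and recovers $\ln x$ exactly, exhibiting the duality of the two minimization problems.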

The representation of *f* in Eq. (8) is known as a *variational
transformation*. The parameter $\lambda$ is known as a *variational parameter*.
If we relax the minimization and fix the variational parameter to an
arbitrary value, we obtain an upper bound:

$$ f(x) \;\leq\; \lambda^T x - f^*(\lambda) $$

The bound is better for some values of the variational parameter than for others, and for a particular value of $\lambda$ the bound is exact.
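The upper bound can be checked numerically. The sketch below (ours, not from the text) uses the concave function $f(x) = \ln x$, whose conjugate under the minimization convention above is $f^*(\lambda) = 1 + \ln \lambda$; the bound holds for every positive value of the variational parameter and is exact at the minimizing value $\lambda = 1/x$:

```python
import math

def f(x):
    """Concave function to bound: f(x) = log(x)."""
    return math.log(x)

def f_conjugate(lam):
    """Conjugate of log under the minimization convention used here:
    f*(lam) = min_x { lam*x - log(x) } = 1 + log(lam)."""
    return 1.0 + math.log(lam)

def upper_bound(x, lam):
    """Variational upper bound: f(x) <= lam*x - f*(lam)."""
    return lam * x - f_conjugate(lam)

x = 3.0
# The bound holds for every positive variational parameter...
for lam in [0.1, 1.0 / 3.0, 1.0, 2.0]:
    assert upper_bound(x, lam) >= f(x) - 1e-12
# ...and becomes exact at the minimizing value lam = 1/x.
assert abs(upper_bound(x, 1.0 / x) - f(x)) < 1e-12
```

Relaxing the minimization in this way trades exactness for tractability: a linear function of *x* replaces the nonlinear *f*, at the cost of having to choose $\lambda$ well.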

We also want to obtain lower bounds on conditional probabilities.
A straightforward way to obtain lower bounds is to again appeal
to conjugate duality and to express functions in terms of a
maximization principle. This representation, however, applies
to *convex* functions--in the current paper we require lower bounds
for *concave* functions. Our concave functions, however, have a
special form that allows us to exploit conjugate duality in a different
way. In particular, we require bounds for functions of the form
$f\left(\sum_i x_i + a\right)$, where *f* is a concave function, where the $x_i$, for
$i = 1, \ldots, n$, are non-negative variables, and where $a$ is a
constant. The variables in this expression are effectively
coupled--the impact of changing one variable is contingent on the settings
of the remaining variables. We can use Jensen's inequality, however, to
obtain a lower bound in which the variables are decoupled.^{3}
In particular:

$$ f\left(\sum_i x_i + a\right) \;=\; f\left(\sum_i q_i \frac{x_i}{q_i} + a\right) \;\geq\; \sum_i q_i \, f\left(\frac{x_i}{q_i} + a\right) \qquad (12) $$

where the $q_i$ can be viewed as defining a probability distribution over the
variables $x_i$. The variational parameter in this case is the probability
distribution *q*. The optimal setting of this parameter is given by
$q_i = x_i / \sum_j x_j$. This is easily verified by substitution into
Eq. (12), and demonstrates that the lower bound is tight.
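Both the validity and the tightness of the Jensen bound can be verified numerically. The sketch below (ours; the choice f = log and the particular values of the $x_i$ and $a$ are illustrative) shows that an arbitrary distribution *q* yields a valid lower bound, while $q_i = x_i / \sum_j x_j$ recovers the exact value:

```python
import math

def f(s):
    """Concave function; the logarithm is used as an illustration."""
    return math.log(s)

def jensen_lower_bound(x, a, q):
    """Jensen lower bound: f(sum_i x_i + a) >= sum_i q_i * f(x_i/q_i + a),
    valid for any probability distribution q with q_i > 0."""
    return sum(qi * f(xi / qi + a) for xi, qi in zip(x, q))

x = [0.5, 1.5, 2.0]
a = 0.25
exact = f(sum(x) + a)

# Any distribution q gives a valid lower bound...
uniform = [1.0 / 3.0] * 3
assert jensen_lower_bound(x, a, uniform) <= exact + 1e-12

# ...and the optimal choice q_i = x_i / sum_j x_j makes the bound tight:
# each ratio x_i/q_i then equals sum_j x_j, so every summand is exact.
total = sum(x)
q_opt = [xi / total for xi in x]
assert abs(jensen_lower_bound(x, a, q_opt) - exact) < 1e-12
```

Note how the right-hand side of Eq. (12) is a sum of terms each involving a single $x_i$: the variables have been decoupled, at the price of introducing the distribution *q* as a variational parameter.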
