The variational method that we have described involves replacing selected local conditional probabilities with either upper-bounding or lower-bounding variational transformations. Because a product of bounds on nonnegative factors is itself a bound, the variationally transformed joint probability distribution is a bound (upper or lower) on the true joint probability distribution. Moreover, because a sum of bounds is a bound on the sum, we can obtain bounds on marginal probabilities by marginalizing the variationally transformed joint probability distribution. In particular, this provides a method for obtaining bounds on the likelihood (the marginal probability of the evidence).
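The argument can be written out as a short chain of inequalities for the upper-bound case. The notation here is assumed for illustration and is not fixed by this appendix: $x_i$ denotes a node, $\pi(i)$ its parents, $P^U$ an upper-bounding transformation with variational parameter $\lambda_i$, $H$ the hidden nodes, and $E$ the evidence nodes. For simplicity every node is shown as transformed; an untransformed node simply keeps its exact conditional probability.

```latex
\begin{align*}
P(x) &= \prod_i P(x_i \mid x_{\pi(i)})
      \;\le\; \prod_i P^U(x_i \mid x_{\pi(i)}, \lambda_i)
      && \text{(product of nonnegative bounds)} \\
P(E) &= \sum_{H} P(H, E)
      \;\le\; \sum_{H} \prod_i P^U(x_i \mid x_{\pi(i)}, \lambda_i)
      && \text{(sum of bounds)}
\end{align*}
```

The lower-bound case is identical with the inequalities reversed.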
Note that the variationally transformed distributions are bounds for arbitrary values of the variational parameters (because each individually transformed node conditional probability is a bound for arbitrary values of its variational parameter). To obtain optimizing values of the variational parameters, we take advantage of the fact that our transformed distribution is a bound, and either minimize (in the case of upper bounds) or maximize (in the case of lower bounds) the transformed distribution with respect to the variational parameters. It is this optimization process that tightens the bound on the marginal probability of interest (e.g., the likelihood) and thereby picks out a particular variational distribution that can subsequently be used for approximate inference.
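As a minimal numerical sketch of these two points, consider a single noisy-OR "on" probability $1 - e^{-z}$ with $z > 0$. The sketch assumes the standard conjugate-duality upper bound for the concave function $f(z) = \ln(1 - e^{-z})$, namely $f(z) \le \lambda z - f^*(\lambda)$ with $f^*(\lambda) = -\lambda \ln \lambda + (\lambda + 1)\ln(\lambda + 1)$. The function names and the value of `z` are hypothetical and chosen only for illustration.

```python
import math

def conjugate(lam):
    """Conjugate f*(lam) of f(z) = ln(1 - exp(-z)), assumed form, lam > 0."""
    return -lam * math.log(lam) + (lam + 1.0) * math.log(lam + 1.0)

def upper_bound(z, lam):
    """Variational upper bound exp(lam*z - f*(lam)) on 1 - exp(-z)."""
    return math.exp(lam * z - conjugate(lam))

def optimal_lambda(z):
    """Minimizer of the bound over lam: lam = 1 / (exp(z) - 1)."""
    return 1.0 / math.expm1(z)

z = 0.7                               # hypothetical value of the node's argument
exact = 1.0 - math.exp(-z)            # true noisy-OR "on" probability

# The bound holds for arbitrary values of the variational parameter.
trial_lambdas = (0.1, 0.5, 1.0, 3.0)
trial_bounds = [upper_bound(z, lam) for lam in trial_lambdas]

# Minimizing over lam makes the bound tight for this single factor.
lam_star = optimal_lambda(z)
tight = upper_bound(z, lam_star)

print("exact:", exact)
print("bounds at trial lambdas:", trial_bounds)
print("bound at optimal lambda:", tight)
```

For a single factor the optimized bound coincides with the exact value. In a network, the variational parameters are optimized against the marginal rather than against each factor separately, so the resulting bound on the likelihood is in general not exact.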
In this appendix we discuss the optimization problems that we must solve in the case of noisy-OR networks. We consider the upper and lower bounds separately, beginning with the upper bound.