
Approximate inferences through Lavine's bracketing algorithm

Here we borrow a technique from robust Bayesian statistics, Lavine's algorithm [Lavine1991, Wasserman1992b], and adapt it to Bayesian networks with finitely generated credal sets.

The bracketing algorithm

In the most general case, we seek a posterior upper bound for the expected value of a function u(x), conditional on evidence e. For now we place no restriction on u(x); later we specialize it to the case of posterior probabilities. The quantity we seek is the upper bound on the expected value of u(x):

\[ E[u] = \max \frac{\sum_{x \in e} u(x) \prod_i p_i^e}{\sum_{x \in e} \prod_i p_i^e}, \]

where p_i^e indicates that the evidence has been fixed in the distribution p_i. The maximization is with respect to all distributions in the convex hull of the credal sets for the p_i.

Instead of attacking the maximization of E[u] directly, Lavine's algorithm settles for deciding whether or not E[u] is larger than a given value k. With such a decision procedure, we can construct a bracketing algorithm for E[u]: pick a real number k, check whether E[u] is larger than, smaller than or equal to k, and respectively increase k, decrease k or stop. Repeat this until the solution is found or the bracketing interval is small enough. This algorithm is convergent and improves monotonically.
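The bracketing loop can be sketched as a bisection. The sketch below assumes a decision oracle `exceeds(k)` that reports whether E[u] > k, and assumes the true value lies in [lo, hi]; all names are illustrative, not part of the original formulation.

```python
def bracket(exceeds, lo=0.0, hi=1.0, tol=1e-6):
    """Bisection bracketing of the upper expectation E[u].

    `exceeds(k)` must report whether E[u] > k; the true value
    is assumed to lie in the interval [lo, hi].
    """
    while hi - lo > tol:
        k = (lo + hi) / 2.0
        if exceeds(k):
            lo = k   # E[u] > k: raise the lower end of the bracket
        else:
            hi = k   # E[u] <= k: lower the upper end
    return (lo + hi) / 2.0

# Illustration only: an oracle for a known value 0.3.
estimate = bracket(lambda k: 0.3 > k)
```

Each iteration halves the bracketing interval, which is the sense in which the algorithm converges and improves monotonically.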

To decide whether or not E[u] is larger than k, some transformations are necessary. First, notice that E[u] > k if and only if:

\[ \max \frac{\sum_{x \in e} (u(x) - k) \prod_i p_i^e}{\sum_{x \in e} \prod_i p_i^e} > 0. \]

Second, since the denominator is always positive, this expression is larger than zero if and only if the maximum value of the numerator is larger than zero. Hence Lavine's algorithm depends on the solution of a maximization problem for a given k:

\[ E_k[u] = \max \sum_{x \in e} (u(x) - k) \prod_i p_i^e. \]
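For a toy network whose variables have independent credal sets given by vertex lists, E_k[u] can be evaluated by brute force over all vertex combinations. This is only a sketch: the data layout and names are illustrative, and the sum runs over all joint states rather than restricting to the evidence.

```python
from itertools import product

def E_k(vertex_sets, u, k):
    """Brute-force E_k[u]: maximize sum_x (u(x) - k) * prod_i p_i(x_i)
    over all choices of one vertex per local credal set.

    vertex_sets[i] lists the vertices (candidate distributions) of
    variable i's credal set, each a list of state probabilities.
    u maps a joint state (a tuple of state indices) to a real number.
    """
    n_states = [len(vs[0]) for vs in vertex_sets]
    best = float("-inf")
    for choice in product(*vertex_sets):        # one vertex per variable
        total = 0.0
        for x in product(*(range(n) for n in n_states)):
            p = 1.0
            for xi, dist in zip(x, choice):
                p *= dist[xi]
            total += (u(x) - k) * p
        best = max(best, total)
    return best
```

Because the denominator is positive, E[u] > k exactly when E_k[u] > 0, which is the decision the bracketing loop needs at each step.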

Probability bounds

To obtain the posterior probability for an event {x_q = a} (x_q is the queried variable), we take:

• u(x) = delta_a(x_q), where delta_a(x_q) is 1 if x_q = a and 0 otherwise. We have E[delta_a] = p(x_q = a | e), so Lavine's algorithm generates the upper bound of p(x_q = a | e).
• u(x) = -delta_a(x_q). We have E[-delta_a] = -p(x_q = a | e), so Lavine's algorithm generates the lower bound of p(x_q = a | e) (up to a change of sign).

To obtain the bounds on the posterior distribution of x_q, we must run Lavine's bracketing algorithm twice for each value of x_q: once for delta_a(x_q) and once for -delta_a(x_q).
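The two runs can share one routine. In this minimal sketch for a single binary variable with a two-vertex credal set, a brute-force `upper_expectation` stands in for the full bracketing machinery; all names and numbers are illustrative.

```python
# Illustrative credal set for one binary variable: two vertices.
vertices = [[0.2, 0.8], [0.5, 0.5]]

def upper_expectation(u):
    """Upper expectation of u over the credal set's vertices."""
    return max(sum(u(x) * p[x] for x in range(2)) for p in vertices)

def bounds_for(delta):
    """Two runs: u = delta gives the upper bound on the posterior
    probability, u = -delta gives the lower bound (sign reversed)."""
    upper = upper_expectation(delta)
    lower = -upper_expectation(lambda x: -delta(x))
    return lower, upper

delta_1 = lambda x: 1.0 if x == 1 else 0.0   # indicator of {x_q = 1}
lo, hi = bounds_for(delta_1)                 # probability in [0.5, 0.8]
```

Repeating this pair of runs for each value a of x_q yields the bounds on the whole posterior distribution.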

Lavine's algorithm then reduces to iterations which calculate:

\[ \max \sum_{x \in e} (\delta_a(x_q) - k) \prod_i p_i^e. \]

The maximization is with respect to all distributions in the joint credal set, but only the vertices of the joint credal set need to be examined.

Efficient iterations for Lavine's algorithm

This section presents an algorithm for the calculation of expression (7). Start from a network where some variables are associated with credal sets; for each such variable z_i, suppose its credal set has vertices p_{i,j}. We describe a single iteration of Lavine's algorithm, for which the values of a and k have been fixed, and perform a basic transformation. If x_q is associated with a single distribution p_q, multiply p_q by (delta_a(x_q) - k); if x_q is associated with a credal set with vertices p_{q,j}, multiply each vertex p_{q,j} by (delta_a(x_q) - k).

Now run a MAP algorithm as defined by expression (2) to determine the values of the decision variables z'_i. For the transformed network, the MAP solution yields an expression identical to expression (7). We now have all the elements to apply Lavine's algorithm.

Summary of the algorithm

Repeat the following for upper and lower bounds.

Algorithm: Transform the original network as explained in the previous section. For each value of the queried variable x_q, initialize k. To bracket k, iterate the following step.

Run MAP in the enlarged network; if the result is zero, stop. Otherwise, narrow the bracketing interval of k values, stopping when this interval is small enough.

When to use Lavine's bracketing algorithm

If we can solve the MAP problem in expression (8) efficiently, Lavine's algorithm becomes an efficient and provably convergent way of generating robust inferences. In general, solving the MAP problem involves constructing a cluster containing all the artificial variables, because in the worst case we cannot interchange any of the maximizations and summations in expression (8). This fact apparently limits the appeal of Lavine's bracketing algorithm, since in the worst case every one of its iterations may be as expensive as applying the exact algorithms in Section 6. However, there are three general situations where Lavine's algorithm can provide substantial savings:

• The network is partitioned by evidence.
• The transformed network can be cast as an influence diagram.
• The credal sets can be expressed as a set of linear inequalities.

First, suppose the evidence in a network is such that the joint distribution p(x) can be written as the product of two blocks, p(x) = p1(x1) p2(x2). In this case the artificial variables in one of the blocks can be maximized independently of the variables in the other block. Note that, even though such decompositions always reduce the complexity of usual Bayesian inference, the exact algorithms in Section 6 cannot benefit from them, because the denominator in Bayes rule mixes all distributions together.
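The independence of the two maximizations can be checked on toy numbers: when both blocks contribute nonnegative factors, the maximum of the product over all joint vertex choices equals the product of the per-block maxima. The values below are purely illustrative.

```python
from itertools import product

# Illustrative nonnegative vertex values of each block's factor.
block1 = [0.3, 0.7]
block2 = [0.2, 0.9]

# Joint maximization over all vertex combinations...
joint_max = max(a * b for a, b in product(block1, block2))
# ...equals independent maximization block by block.
factored_max = max(block1) * max(block2)
assert joint_max == factored_max
```

This is why the artificial variables of one block can be maximized without ever enumerating the vertex choices of the other block.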

Second, suppose the transformed network can be cast as an influence diagram; this requires a specific topology for the network. We must be able to divide the variables into m groups I_k, each group with the artificial variables Z_k associated with it, such that

\[ p(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_m) = p(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_k); \]

if this condition is true, then efficient algorithms can be found to solve the MAP problem [Jensen & Jensen1994].

Finally, suppose the joint distribution can be written as:

\[ p(x) = \Bigl( \prod_{i>s} p(x_i \mid \mathrm{pa}(x_i)) \Bigr) \, p(x_1, \ldots, x_s), \]

where p(x_1, ..., x_s) ranges over a convex set of multivariate distributions defined by a set of linear inequalities (p(x_1, ..., x_s) can even be a conditional distribution). In this case we can drop the artificial variables and consider the direct maximization of expression (7) in each iteration of Lavine's algorithm. Expression (8) then becomes a linear programming problem which can be solved by standard algorithms. The advantage of this approach is that the number of vertices of the credal set may be much larger than the number of its defining inequalities, for example when the credal set is a density bounded class (analyzed in Section 10.5) or a density ratio class (analyzed in Section 10.7). Again, the linear structure of this problem can only be exploited through Lavine's algorithm, as the direct application of Bayes rule produces a non-linear problem.
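As an illustration of why inequalities can beat vertices, the linear program for a density bounded class (lower and upper bounds on each probability value) admits a simple greedy solution. The sketch below solves that special case directly rather than calling a general-purpose LP solver; the function name and numbers are illustrative.

```python
def max_linear_density_bounded(c, lo, hi):
    """Maximize sum_x c[x] * p[x] subject to lo[x] <= p[x] <= hi[x]
    and sum_x p[x] = 1 (a density bounded class).

    Greedy LP solution: start every p[x] at its lower bound, then
    spend the remaining probability mass on the states with the
    largest coefficients first.
    """
    p = list(lo)
    mass = 1.0 - sum(lo)
    for x in sorted(range(len(c)), key=lambda i: -c[i]):
        add = min(hi[x] - p[x], mass)
        p[x] += add
        mass -= add
    assert abs(mass) < 1e-12, "bounds must admit a distribution"
    return p, sum(ci * pi for ci, pi in zip(c, p))

# The coefficients c[x] play the role of (u(x) - k) in one iteration.
p_star, value = max_linear_density_bounded(
    [1.0, 0.5, 0.0], lo=[0.1, 0.1, 0.1], hi=[0.6, 0.6, 0.6])
```

The class here has many vertices but only 2n + 1 defining constraints, which is exactly the situation where working with the inequalities directly pays off.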
