Here we borrow a technique from robust Bayesian statistics, called Lavine's algorithm [Lavine1991, Wasserman1992b], and adapt it to Bayesian networks with finitely generated credal sets.
In the most general case, we seek a posterior bound for the expected value of a function u(x), conditional on evidence e. For now we place no restriction on u(x); later we specialize it to the case of posterior probabilities. The quantity we seek is the upper bound on the expected value of u(x):
\overline{E}[u] = \max_{p} \sum_{x} u(x) \, p(x \mid e),

where the maximum is taken over the distributions in the joint credal set.
Instead of attacking the maximization of \overline{E}[u] directly, Lavine's algorithm settles for deciding whether or not \overline{E}[u] is larger than a given value k. Once this decision can be made, we can construct a bracketing algorithm for \overline{E}[u]. The bracketing algorithm is the following: pick a real number k and check whether \overline{E}[u] is larger than, smaller than, or equal to k, and respectively increase k, decrease k, or stop. Repeat this until the solution is found or the bracketing interval is small enough. This algorithm is convergent and improves monotonically.
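The bracketing loop can be sketched as a bisection over k, assuming we have an oracle that decides whether \overline{E}[u] is larger than, smaller than, or equal to k (the oracle and bounds below are hypothetical placeholders):

```python
def bracket(compare, lo, hi, tol=1e-6):
    """Bracketing loop: compare(k) returns +1, -1, or 0 according to
    whether the upper expectation is larger than, smaller than, or
    equal to k. Assumes the answer lies in [lo, hi]."""
    while hi - lo > tol:
        k = 0.5 * (lo + hi)
        s = compare(k)
        if s > 0:          # upper expectation > k: raise the lower end
            lo = k
        elif s < 0:        # upper expectation < k: lower the upper end
            hi = k
        else:
            return k       # exact hit
    return 0.5 * (lo + hi)

# Usage with a hypothetical oracle whose hidden answer is 0.3:
answer = bracket(lambda k: (0.3 > k) - (0.3 < k), 0.0, 1.0)
```

The interval halves at every iteration, which is the sense in which the algorithm is convergent and improves monotonically.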
To decide whether or not \overline{E}[u] is larger than k, some transformations are necessary. First, notice that \overline{E}[u] > k if and only if:

E_k[u] = \max_{p} \sum_{x} \left( u(x) - k \right) p(x, e) > 0.
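The equivalence can be sanity-checked numerically. Below, a hypothetical pair of vertex joints p(x, e) over a binary x (with the evidence e fixed) is swept over a grid of k values:

```python
# Hypothetical vertices of the joint credal set: each maps x -> p(x, e).
vertices = [{0: 0.30, 1: 0.10},
            {0: 0.15, 1: 0.25}]
u = {0: 0.0, 1: 1.0}    # hypothetical u(x) (here an indicator function)

def upper_expectation(vs):
    # max over vertices of the conditional expectation E[u | e]
    return max(sum(u[x] * p[x] for x in p) / sum(p.values()) for p in vs)

def E_k(vs, k):
    # max over vertices of sum_x (u(x) - k) p(x, e)
    return max(sum((u[x] - k) * p[x] for x in p) for p in vs)

ub = upper_expectation(vertices)
# The sign of E_k[u] decides whether the upper expectation exceeds k:
checks = all((ub > k) == (E_k(vertices, k) > 0)
             for k in [0.05 * i for i in range(21)])
```

Dividing each term by p(e) > 0 does not change the sign, which is why the unnormalized sum over p(x, e) suffices.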
To obtain the posterior probability of an event \{x_q = a\} (x_q is the queried variable), we take:

u(x) = \delta_a(x_q),

where \delta_a(x_q) is one if x_q = a and zero otherwise, so that E[u \mid e] = p(x_q = a \mid e).
To obtain the bounds on the posterior distribution of x_q, we must run Lavine's bracketing algorithm twice for each value a of x_q: once for \delta_a(x_q), which yields the upper bound, and once for -\delta_a(x_q), which yields the lower bound.
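The two runs exploit the standard conjugacy between lower and upper expectations, \underline{E}[u] = -\overline{E}[-u]. A brute-force sketch over hypothetical per-vertex posterior values:

```python
# Hypothetical posterior probabilities p(x_q = a | e), one per vertex of
# the joint credal set (in practice these values come from the two runs
# of the bracketing algorithm, not from direct enumeration).
posteriors = [0.20, 0.25, 0.22]

upper = max(posteriors)               # run with u = delta_a(x_q)
lower = -max(-p for p in posteriors)  # run with u = -delta_a(x_q), negated
```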
Lavine's algorithm is reduced to iterations which calculate:

E_k[\delta_a] = \max_{j_1, \ldots, j_n} \sum_{x} \left( \delta_a(x_q) - k \right) \prod_{i} p_{i, j_i}(x_i \mid \mathrm{pa}(x_i)), \qquad (7)

where the maximization selects a vertex p_{i, j_i} for every variable with a credal set, and the sum runs over configurations consistent with the evidence e.
This section presents an algorithm for the calculation of expression (7). Start from a network where some variables are associated with credal sets, and perform a basic transformation. For each variable z_i, suppose its credal set has vertices p_{i,j}. We describe a single iteration of Lavine's algorithm, for which the values of a and k have been fixed. If x_q is associated with a single distribution p_q, multiply p_q by (\delta_a(x_q) - k). If x_q is associated with a credal set with vertices p_{q,j}, multiply each vertex p_{q,j} by (\delta_a(x_q) - k).
Now run a MAP algorithm, as defined by expression (2), to determine the values of the decision variables z'_i. For the transformed network, the MAP solution yields a value identical to expression (7). We now have all the elements to apply Lavine's algorithm.
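To make one iteration concrete, here is a brute-force sketch on a hypothetical two-variable network x_1 \rightarrow x_2 where only x_1 carries a credal set; the MAP step is replaced by enumeration over vertex choices, and all numbers are illustrative:

```python
# Hypothetical network x1 -> x2, both binary; query x_q = x2, value a = 1.
p1_vertices = [{0: 0.8, 1: 0.2},    # vertices of x1's credal set
               {0: 0.7, 1: 0.3}]
p2 = {0: {0: 0.9, 1: 0.1},          # precise p(x2 | x1 = 0)
      1: {0: 0.4, 1: 0.6}}          # precise p(x2 | x1 = 1)

def E_k(k, a=1):
    """max over vertex choices of sum_x (delta_a(x2) - k) p(x1) p(x2|x1);
    the transformation multiplies the factor of x2 by (delta_a(x2) - k)."""
    best = float('-inf')
    for p1 in p1_vertices:          # 'MAP' step by brute-force enumeration
        total = sum(((1.0 if x2 == a else 0.0) - k) * p1[x1] * p2[x1][x2]
                    for x1 in (0, 1) for x2 in (0, 1))
        best = max(best, total)
    return best

# E_k changes sign exactly at the upper posterior p(x2 = 1), here 0.25.
```

Bracketing over k with this E_k as the sign oracle converges to the upper posterior probability.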
Repeat the following for the upper and lower bounds.
Run MAP in the transformed network; if the result is zero, stop, since then \overline{E}[u] = k. Otherwise increase k (positive result) or decrease k (negative result), updating the bracketing interval, and stop when this interval is small enough.
If we can solve the MAP problem in expression (8) efficiently, Lavine's algorithm becomes an efficient and provably convergent way of generating robust inferences. In general, solving the MAP problem requires constructing a cluster containing all the artificial variables, because in the worst case none of the maximizations and summations in expression (8) can be interchanged. This fact apparently limits the appeal of Lavine's bracketing algorithm, since in the worst case every one of its iterations may be as expensive as applying the exact algorithms in Section 6. However, there are three general situations where Lavine's algorithm can provide substantial savings:
First, suppose the evidence in a network is such that the joint distribution p(x) can be written as the product of two blocks, p(x) = p_1(x_1) p_2(x_2). In this case the artificial variables in one of the blocks can be maximized independently of the variables in the other block. Note that, even though such decompositions always reduce the complexity of usual Bayesian inference, the exact algorithms in Section 6 cannot benefit from them, because the denominator in Bayes rule mixes all distributions together.
Second, suppose the transformed network can be cast as an influence diagram; this requires a specific topology for the network. We must be able to divide the variables into m groups I_k, each group with the artificial variables Z_k associated with it (the Z_k acting as the decisions D_k), such that
p(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_m) = p(I_k \mid I_0, \ldots, I_{k-1}, D_1, \ldots, D_k);
if this condition is true, then efficient algorithms can be found to solve the MAP problem [Jensen & Jensen1994].

Finally, suppose the joint distribution can be written as:
p(x) = {(
Tue Jan 21 15:59:56 EST 1997