Here we borrow a technique from robust Bayesian statistics, Lavine's algorithm [Lavine1991, Wasserman1992b], and adapt it to Bayesian networks with finitely generated credal sets.

In the most general case, we seek a posterior bound for the expected value of a function u(x), conditional on evidence e. For now we place no restriction on u(x); later we specialize it to the case of posterior probabilities. The quantity we seek is the upper bound for the expected value of u(x):

\overline{E}[u] = \max_{p} \frac{ \sum_{x \in e} u(x) \prod_{i} p^{e}_{i} }{ \sum_{x \in e} \prod_{i} p^{e}_{i} },

Instead of attacking the maximization of E[u] directly, Lavine's
algorithm settles for deciding whether or not \overline{E}[u] is
larger than a given value k. Once we can make this decision, we can
construct a bracketing algorithm for \overline{E}[u]. The bracketing
algorithm is the following. Pick a real number k and check whether
\overline{E}[u] is larger than, smaller than or equal to k, and
respectively increase k, decrease k or stop. Repeat this until
the solution is found or the bracketing interval is small enough. This
algorithm is convergent and improves monotonically.
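A minimal sketch of this bracketing loop, assuming a hypothetical oracle sign_Ek(k) that reports whether \overline{E}[u] is above, below or equal to k (in the paper's setting this sign comes from a MAP computation):

```python
def bracket_bound(sign_Ek, lo, hi, tol=1e-6):
    """Bracket an upper expectation by bisection on k.

    sign_Ek(k) is a hypothetical oracle: positive if the bound exceeds
    k, negative if it is below k, zero on an exact hit. [lo, hi] must
    initially bracket the bound (e.g. the range of u).
    """
    while hi - lo > tol:
        k = (lo + hi) / 2.0
        s = sign_Ek(k)
        if s > 0:          # bound is above k: raise the lower end
            lo = k
        elif s < 0:        # bound is below k: lower the upper end
            hi = k
        else:              # exact hit
            return k
    return (lo + hi) / 2.0
```

Each call to sign_Ek is one iteration; the interval halves at every step, which is the monotone convergence claimed above.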

To decide whether or not \overline{E}[u] is larger than k, some
transformations are necessary.
First, notice that \overline{E}[u] > k if and only if:

\max_{p} \frac{ \sum_{x \in e} (u(x) - k) \prod_{i} p^{e}_{i} }{ \sum_{x \in e} \prod_{i} p^{e}_{i} } > 0.

Since the denominator of this ratio is positive, only the sign of the numerator matters, so we define:

\overline{E}_{k}[u] = \max_{p} \sum_{x \in e} (u(x) - k) \prod_{i} p^{e}_{i},

and the decision reduces to checking whether \overline{E}_{k}[u] > 0.

To obtain the posterior probability for an event {x_{q} = a} (x_{q} is
the queried variable), we take:

- u(x) = \delta_{a}(x_{q}), where \delta_{a}(x_{q}) is 1 if x_{q} = a and 0 otherwise. We have E[\delta_{a}] = p(x_{q} = a | e), so Lavine's algorithm generates \overline{p}(x_{q} = a | e).
- u(x) = -\delta_{a}(x_{q}). We have E[-\delta_{a}] = -p(x_{q} = a | e), so Lavine's algorithm generates -\underline{p}(x_{q} = a | e).

To obtain the bounds on the posterior distribution of x_{q},
we must run Lavine's bracketing algorithm twice for each value a of
x_{q}: once with \delta_{a}(x_{q}) and once with -\delta_{a}(x_{q}).
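As a concrete check of the two choices of u(x), consider a toy single-distribution case (expectation and the posterior dict below are illustrative helpers, not part of the algorithm):

```python
def delta(a):
    """Indicator of the event {x_q = a}."""
    return lambda xq: 1.0 if xq == a else 0.0

def expectation(u, posterior):
    """E[u] under one posterior distribution for x_q."""
    return sum(u(v) * p for v, p in posterior.items())

posterior = {'a': 0.7, 'b': 0.3}          # hypothetical p(x_q | e)
u_plus = delta('a')                       # E[u_plus]  = p(x_q = a | e)
u_minus = lambda xq: -delta('a')(xq)      # E[u_minus] = -p(x_q = a | e)
```

With credal sets, maximizing E[u_plus] over the set yields the upper posterior probability, and maximizing E[u_minus] yields the negative of the lower one.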

Lavine's algorithm reduces to iterations which calculate:

\overline{E}_{k}[\delta_{a}] = \max_{p} \sum_{x \in e} (\delta_{a}(x_{q}) - k) \prod_{i} p^{e}_{i}.

This section presents an algorithm for the calculation of expression
(7).
Start from a network where some
variables are associated with credal sets and perform a basic
transformation.
For each variable z_{i}, suppose its credal set has vertices p_{i,j}.
We describe a single iteration of Lavine's algorithm, for
which the values of a and k have been fixed.
If x_{q} is associated with a single distribution p_{q}, multiply
p_{q} by (\delta_{a}(x_{q}) - k); if x_{q} is associated with a credal
set with vertices p_{q,j}, multiply each vertex p_{q,j} by
(\delta_{a}(x_{q}) - k).
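The multiplication step can be sketched as follows (a hypothetical table representation: each vertex is a dict from values of x_{q} to probabilities; a single distribution is just the one-vertex case):

```python
def transform_query_vertices(vertices, a, k):
    """Multiply every vertex of x_q's credal set by (delta_a(x_q) - k).

    The resulting tables are no longer probability distributions; they
    carry the (u(x) - k) factor into the subsequent MAP computation.
    """
    return [
        {v: p * ((1.0 if v == a else 0.0) - k) for v, p in vertex.items()}
        for vertex in vertices
    ]
```

For example, transform_query_vertices([{'a': 0.6, 'b': 0.4}], 'a', 0.5) scales the entry for 'a' by (1 - 0.5) and the entry for 'b' by (0 - 0.5).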

Now run a MAP algorithm as defined by
expression (2), to determine
the values of the decision variables z'_{i}. For the transformed network,
the MAP solution will give an expression
which is identical to expression (7).
We now have all the elements to apply Lavine's algorithm.

Repeat the following for upper and lower bounds.

Run MAP in the enlarged network; if the result is zero, stop. Otherwise, update the bracketing interval of k values and repeat, stopping when the interval is small enough.
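Putting the pieces together, the whole procedure can be sketched around a hypothetical oracle map_sign(u, k), standing in for the sign returned by the MAP computation on the network transformed with (u - k):

```python
def posterior_bounds(map_sign, tol=1e-6):
    """Upper and lower posterior bounds for {x_q = a}: one bracketing
    run with u = delta_a ('delta') and one with u = -delta_a ('neg').

    map_sign(u, k) is assumed positive while the upper expectation of
    u exceeds k, so each run halves its interval until it is below tol.
    """
    def bracket(u, lo, hi):
        while hi - lo > tol:
            k = (lo + hi) / 2.0
            if map_sign(u, k) > 0:
                lo = k
            else:
                hi = k
        return (lo + hi) / 2.0

    upper = bracket('delta', 0.0, 1.0)    # upper bound on p(x_q = a | e)
    lower = -bracket('neg', -1.0, 0.0)    # lower bound, via u = -delta_a
    return lower, upper
```

Simulating a credal set whose posterior probability for the event ranges over [0.2, 0.6] recovers exactly those two endpoints.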

If we can solve the MAP problem in expression (8) efficiently, Lavine's algorithm becomes an efficient and provably convergent way of generating robust inferences. In general, solving the MAP problem involves constructing a cluster containing all artificial variables, because in the worst case we cannot interchange any of the maximizations and summations in expression (8). This fact apparently limits the appeal of Lavine's bracketing algorithm, since in the worst case every one of its iterations may be as expensive as applying the exact algorithms in Section 6. However, there are three general situations where Lavine's algorithm can provide substantial savings:

- The network is partitioned by evidence.
- The transformed network can be cast as an influence diagram.
- The credal sets can be expressed as a set of linear inequalities.

First, suppose the evidence in a network is such that the joint
distribution p(x) can be written as the product of two
blocks, p(x) = p_{1}(x_{1}) p_{2}(x_{2}).
In this case the artificial variables in one of the blocks can be maximized
independently of the variables in the other block.
Note that, even though it is always true that such decompositions will
reduce the complexity of usual Bayesian inference, the exact
algorithms in Section 6 cannot benefit from it,
because the denominator in Bayes rule mixes all distributions together.
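For instance, if x_{q} falls in the first block, the decision quantity factorizes across the blocks; a sketch under that assumption, maximizing over each block's credal sets independently:

```latex
\operatorname{sign} \overline{E}_{k}[\delta_{a}]
  = \operatorname{sign} \max_{p_{1}}
    \sum_{x_{1} \in e} (\delta_{a}(x_{q}) - k) \prod_{i \in 1} p^{e}_{i},
```

because the second-block factor \sum_{x_{2} \in e} \prod_{i \in 2} p^{e}_{i} is strictly positive for every choice of distributions in that block, so only the first-block maximization determines the sign.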

Second, suppose the transformed network can be cast as an influence diagram;
this requires a specific topology for the network.
We must be able to divide the variables into m groups I_{k}, each
group with the artificial variables D_{k} associated with it, such that

p(I_{k} | I_{0} \ldots I_{k-1}, D_{1} \ldots D_{m}) =
p(I_{k} | I_{0} \ldots I_{k-1}, D_{1} \ldots D_{k});

Finally, suppose the joint distribution can be written as:

p(x) = \left( \prod_{i>s} p(x_{i} \mid \pi_{i}) \right) p(x_{1}, \ldots, x_{s}),
