In this section we recast the robust inference problem as a parameter
estimation problem.
Consider a Quasi-Bayesian network and the transformed Bayesian network
with artificial variables { z'_{i} }. Each artificial variable
has values { 1, 2, ..., |z'_{i}| }. Assume z'_{i} is a random
variable with distribution theta_{ij} = p(z'_{i} = j).
Call Theta the vector of all theta_{ij}.

Suppose x_{q} is a queried variable. The objective is to find:

pbar(x_{q} = a | e) = max_{Theta}
{ sum_{x is in { e, x_{q} = a }} prod_{i} p_{i}^{{ e, x_{q} = a }} }/
{ sum_{x is in e} prod_{i} p_{i}^{e} } .
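For concreteness, the ratio above can be evaluated by brute-force enumeration. The sketch below uses an assumed toy transformed network (one artificial binary variable z' with p(z' = j) = theta_j, and two observed binary variables x_q and x_e with made-up conditional tables); it maximizes the posterior over a grid on the simplex, which is only an illustration of the objective, not the method developed here.

```python
# Toy sketch of the objective: upper bound of p(x_q = a | e) over Theta.
# The network and all tables below are assumptions for illustration.

# Hypothetical tables p(x_q = v | z' = j) and p(x_e = v | z' = j).
p_xq_given_z = {0: [0.9, 0.1], 1: [0.2, 0.8]}
p_xe_given_z = {0: [0.7, 0.3], 1: [0.4, 0.6]}

def joint(xq, xe, z, theta):
    """Product of local probabilities: p(z') p(x_q | z') p(x_e | z')."""
    return theta[z] * p_xq_given_z[z][xq] * p_xe_given_z[z][xe]

def posterior(a, e, theta):
    """p(x_q = a | x_e = e): ratio of sums of products over configurations."""
    num = sum(joint(a, e, z, theta) for z in (0, 1))
    den = sum(joint(xq, e, z, theta) for xq in (0, 1) for z in (0, 1))
    return num / den

# Crude upper bound: maximize over a grid on the simplex { (t, 1 - t) }.
upper = max(posterior(1, 1, (t, 1 - t)) for t in (i / 100 for i in range(101)))
```

Since the posterior is a fractional-linear function of theta, the maximum is attained at a vertex of the simplex; the grid search above finds it at a vertex for these tables.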

Gradient-based optimization techniques, such as the conjugate
gradient algorithm, have been used for learning in
Bayesian networks. These algorithms benefit from the
fact that the necessary gradient calculations
can be performed with standard Bayesian network algorithms
[Russell *et al.* 1995].

To solve the robust inference problem, we must maximize the posterior log-likelihood for Theta (minimization is accomplished by maximizing the negative log-likelihood):

L(Theta) = log p(x_{q} = a | e) = log { p(x_{q} = a, e) }/{ p(e) } ,

L(Theta) = log p(x_{q} = a, e) - log p(e) .

The gradient of L(Theta) is obtained by computing, for each theta_{ij}:

{ partial L(Theta) }/{ partial theta_{ij} } =
{ partial log p(x_{q} = a, e) }/{ partial theta_{ij} } -
{ partial log p(e) }/{ partial theta_{ij} } .

For e' standing for either { x_{q} = a, e } or e, we have:

{ partial log p(e') }/{ partial theta_{ij} } =
{ partial sum_{j} p(e'|z'_{i} = j) p(z'_{i} = j) }/{ partial theta_{ij} }
{ 1 }/{ p(e') } .

{ partial log p(e') }/{ partial theta_{ij} } =
{ p(e'|z'_{i} = j) }/{ p(e') } .

By Bayes' rule:

{ partial log p(e') }/{ partial theta_{ij} } =
{ p(z'_{i} = j|e') p(e') }/{ p(z'_{i} = j) }
{ 1 }/{ p(e') } = { p(z'_{i} = j|e') }/{ p(z'_{i} = j) } .

Since theta_{ij} = p(z'_{i} = j), we obtain:

{ partial L(Theta) }/{ partial theta_{ij} } =
{ p(z'_{i} = j|x_{q} = a, e) }/{ theta_{ij} } -
{ p(z'_{i} = j|e) }/{ theta_{ij} } ,

a quantity that can be computed with standard Bayesian network algorithms.
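The gradient formula can be checked numerically. Below is a minimal sketch, assuming a toy transformed network (one artificial binary variable z' with free parameters theta_0, theta_1 and made-up tables for observed x_q and x_e); it compares the expression p(z' = j | x_q = a, e)/theta_j - p(z' = j | e)/theta_j against a central finite difference of L(Theta) = log p(x_q = a, e) - log p(e).

```python
import math

# All tables here are hypothetical, chosen only to exercise the formula.
p_xq = {0: [0.9, 0.1], 1: [0.2, 0.8]}   # p(x_q = v | z' = j)
p_xe = {0: [0.7, 0.3], 1: [0.4, 0.6]}   # p(x_e = v | z' = j)
a, e = 1, 1

def p_joint(theta):
    """p(x_q = a, x_e = e) = sum_j p(x_q=a|z'=j) p(x_e=e|z'=j) theta_j."""
    return sum(p_xq[j][a] * p_xe[j][e] * theta[j] for j in (0, 1))

def p_ev(theta):
    """p(x_e = e) = sum_j p(x_e=e|z'=j) theta_j."""
    return sum(p_xe[j][e] * theta[j] for j in (0, 1))

def grad(theta, j):
    """p(z'=j | x_q=a, e)/theta_j - p(z'=j | e)/theta_j."""
    post_full = p_xq[j][a] * p_xe[j][e] * theta[j] / p_joint(theta)
    post_ev = p_xe[j][e] * theta[j] / p_ev(theta)
    return (post_full - post_ev) / theta[j]

def L(theta):
    return math.log(p_joint(theta)) - math.log(p_ev(theta))

# Central finite difference in theta_0 (theta treated as free coordinates).
eps = 1e-6
numeric = (L((0.3 + eps, 0.7)) - L((0.3 - eps, 0.7))) / (2 * eps)
analytic = grad((0.3, 0.7), 0)
```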

The EM algorithm produces a maximum likelihood estimate by maximizing the log-likelihood expression [Dempster, Laird, & Rubin 1977]. To solve the robust inference problem, we must maximize the posterior log-likelihood L(Theta). We show how the original EM algorithm can be extended to a Quasi-Bayesian Expectation Maximization (QEM) algorithm, and we prove its convergence properties.

The algorithm begins by assuming that the artificial variables
are actual random quantities with distributions specified by Theta,
and that we could even observe those variables as evidence.
An initial estimate Theta^{0} is assumed for Theta.

Suppose we had l sets of complete data for the transformed network, i.e., we had observed l trials for all variables in the network, including the artificial variables. The log-likelihood for this complete data would be:

L(Theta) = sum_{ijk} l_{i}(j,k) log theta_{ijk} ,

where l_{i}(j,k) is the number of trials in which x_{i} assumes its jth value while its parents assume their kth configuration, and theta_{ijk} = p(x_{i} = j | pa(x_{i}) = k).
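As a quick illustration of this expression, the sketch below evaluates the complete-data log-likelihood from counts for an assumed single binary variable with no parents (so the parent configuration index k takes one value); the counts and parameters are made up.

```python
import math

# Hypothetical complete data: l(j, k) counts for one binary variable,
# with a single (empty) parent configuration k = 0.
counts = {(0, 0): 3, (1, 0): 7}

def complete_loglik(theta):
    """L(Theta) = sum_{jk} l(j, k) log theta_{jk}."""
    return sum(n * math.log(theta[jk]) for jk, n in counts.items())

# The relative-frequency estimate (3/10, 7/10) maximizes this expression.
mle = {(0, 0): 0.3, (1, 0): 0.7}
```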

The first step of the QEM algorithm is to obtain the expected value of
the log-likelihood given the evidence, assuming the current estimate
Theta^{k} is correct (Theta^{0} at the first iteration):

Q(Theta|Theta^{k}) =
E[ sum_{ijk} l_{i}(j,k) log theta_{ijk} | x_{q} = a, e, Theta^{k} ] -
E[ sum_{ijk} l_{i}(j,k) log theta_{ijk} | e, Theta^{k} ] .

Each expectation reduces to a sum of expected counts; for instance:

E[ sum_{ijk} l_{i}(j,k) log theta_{ijk} | e, Theta^{k} ] =
sum_{ijk} ( sum_{l} p(x_{i} = j, pa(x_{i}) = k | e, Theta^{k}) ) log theta_{ijk} .

Since inference involves a single trial (the observed evidence), this yields:

Q(Theta|Theta^{k}) =
sum_{ijk} p(x_{i} = j, pa(x_{i}) = k | x_{q} = a, e, Theta^{k}) log theta_{ijk} -
sum_{ijk} p(x_{i} = j, pa(x_{i}) = k | e, Theta^{k}) log theta_{ijk} .

The second step of the QEM algorithm is to
maximize Q(Theta|Theta^{k}) for Theta. Only a few terms in the
expression for Q(Theta|Theta^{k}) will be free, since only the theta_{ij}
for z'_{i} are estimated.
Collecting these terms we obtain:

sum_{ij} p(z'_{i} = j | x_{q} = a, e) log theta_{ij}
- sum_{ij} p(z'_{i} = j| e) log theta_{ij} ,

whose derivative with respect to theta_{ij} is:

{ p(z'_{i} = j|x_{q} = a, e) }/{ theta_{ij} } -
{ p(z'_{i} = j|e) }/{ theta_{ij} } ,

the same expression obtained for the gradient of L(Theta).
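The derivative of the collected terms can also be verified by finite differences. The sketch below assumes a toy model (one artificial binary variable z' and made-up tables for observed x_q = a and x_e = e); the posteriors that weight log theta_{ij} are held fixed at Theta^{k} while theta varies.

```python
import math

# Hypothetical tables and current estimate Theta^k.
p_xq = {0: [0.9, 0.1], 1: [0.2, 0.8]}   # p(x_q = v | z' = j)
p_xe = {0: [0.7, 0.3], 1: [0.4, 0.6]}   # p(x_e = v | z' = j)
a, e = 1, 1
th_k = (0.3, 0.7)

def post(z, with_query, th):
    """p(z' = z | evidence): evidence is {x_q = a, x_e = e} or {x_e = e}."""
    def w(j):
        return th[j] * p_xe[j][e] * (p_xq[j][a] if with_query else 1.0)
    return w(z) / (w(0) + w(1))

# Coefficients of log theta_j in the collected expression, fixed at Theta^k.
c = [post(j, True, th_k) - post(j, False, th_k) for j in (0, 1)]

def collected_q(th):
    return sum(c[j] * math.log(th[j]) for j in (0, 1))

# Derivative with respect to theta_0 at theta = (0.5, 0.5):
analytic = c[0] / 0.5
eps = 1e-6
numeric = (collected_q((0.5 + eps, 0.5)) - collected_q((0.5 - eps, 0.5))) / (2 * eps)
```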

**Theorem.** Every iteration of the QEM algorithm satisfies L(Theta^{k+1}) ge L(Theta^{k}).

**Proof.**
We have:

H(Theta'|Theta) =
- sum_{x is in { x_{q}, e }}
log { p(x, x_{q}|e, Theta') }/{ p(x_{q} | e, Theta') } p(x | x_{q}, e, Theta) ,

Q(Theta'|Theta) =
sum_{x is in { x_{q}, e }}
log p(x, x_{q}|e, Theta') p(x | x_{q}, e, Theta) .

L(Theta') = Q(Theta'|Theta) + H(Theta'|Theta)

and, by Lemma 1 of [Dempster, Laird, & Rubin 1977], H(Theta^{k+1}|Theta^{k}) ge H(Theta^{k}|Theta^{k}).

Since Theta^{k+1} maximizes Q(Theta|Theta^{k}), we obtain:

L(Theta^{k+1}) = Q(Theta^{k+1}|Theta^{k}) + H(Theta^{k+1}|Theta^{k}) ge Q(Theta^{k}|Theta^{k}) + H(Theta^{k}|Theta^{k}) = L(Theta^{k}).
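The decomposition L(Theta') = Q(Theta'|Theta) + H(Theta'|Theta) can be checked numerically: summing Q and H over the hidden configurations telescopes to log p(x_q = a | e, Theta'). The sketch below assumes a toy model (one hidden z' and made-up tables), not taken from the text.

```python
import math

# Hypothetical model: hidden z' with p(z' = j) = theta_j, observed x_q, x_e.
p_xq = {0: [0.9, 0.1], 1: [0.2, 0.8]}
p_xe = {0: [0.7, 0.3], 1: [0.4, 0.6]}
a, e = 1, 1

def p_z_xq_given_e(z, xq, th):
    """p(z', x_q | x_e = e, Theta)."""
    num = th[z] * p_xq[z][xq] * p_xe[z][e]
    den = sum(th[j] * p_xe[j][e] for j in (0, 1))
    return num / den

def p_xq_given_e(xq, th):
    return sum(p_z_xq_given_e(z, xq, th) for z in (0, 1))

def p_z_post(z, th):
    """p(z' | x_q = a, x_e = e, Theta)."""
    return p_z_xq_given_e(z, a, th) / p_xq_given_e(a, th)

th_old, th_new = (0.3, 0.7), (0.6, 0.4)
Q = sum(math.log(p_z_xq_given_e(z, a, th_new)) * p_z_post(z, th_old)
        for z in (0, 1))
H = -sum(math.log(p_z_xq_given_e(z, a, th_new) / p_xq_given_e(a, th_new))
         * p_z_post(z, th_old) for z in (0, 1))
L_new = math.log(p_xq_given_e(a, th_new))
```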
