
### The QEM algorithm

In this subsection we show how the original Expectation-Maximization algorithm [13] can be extended to a Quasi-Bayesian Expectation-Maximization (QEM) algorithm with the same convergence properties. We must maximize the posterior log-likelihood $L(\Theta)$ defined previously. The algorithm begins by assuming that the transparent variables are actual random quantities with distributions specified by $\Theta$; an initial estimate $\Theta^0$ is assumed for $\Theta$.

Suppose we had $N$ sets of complete data for the transformed network, i.e., we had observed $N$ trials for all variables in the network, including the transparent variables. The log-likelihood for this complete data would be

$$L(\Theta) = \sum_{ijk} l_i(j,k) \log \theta_{ijk},$$

where $l_i(j,k)$ denotes the number of data points in which the variable $x_i$ is instantiated at its $j$th value with its parents instantiated at their $k$th configuration.
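The complete-data log-likelihood is just a count-weighted sum of log-parameters. A minimal sketch in Python, assuming an illustrative encoding in which each trial maps a variable index $i$ to its observed value $j$ and parent configuration $k$ (this encoding is not from the original text):

```python
import math
from collections import defaultdict

def complete_data_log_likelihood(data, theta):
    """L(Theta) = sum_{ijk} l_i(j,k) * log(theta_{ijk}).

    data:  list of trials; each trial maps variable i -> (j, k), where j is
           the observed value of x_i and k its parents' configuration.
    theta: dict mapping (i, j, k) -> theta_{ijk}.
    """
    counts = defaultdict(int)  # l_i(j,k): occurrences of each (i, j, k)
    for trial in data:
        for i, (j, k) in trial.items():
            counts[(i, j, k)] += 1
    return sum(l * math.log(theta[ijk]) for ijk, l in counts.items())
```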

The first step of the QEM algorithm is to obtain the expected value of the log-likelihood given the evidence, assuming the current estimate $\Theta^k$ is correct [10]:

$$Q(\Theta \mid \Theta^k) = E\left[\log p(x_q = a, e) - \log p(e)\right] = \sum_{ijk} p(x_i, \mathrm{pa}(x_i) \mid x_q = a, e)\, \log \theta_{ijk} \;-\; \sum_{ijk} p(x_i, \mathrm{pa}(x_i) \mid e)\, \log \theta_{ijk}.$$
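Given the family marginals $p(x_i, \mathrm{pa}(x_i) \mid \cdot)$, which would come from standard Bayesian-network inference under $\Theta^k$, evaluating $Q$ reduces to a single weighted sum. A sketch under the assumption that both marginal tables are dicts keyed by $(i, j, k)$ (an illustrative encoding, not from the original):

```python
import math

def q_function(theta, p_family_with_query, p_family):
    """Q(Theta | Theta^k)
       = sum_{ijk} p(x_i, pa(x_i) | x_q=a, e) * log(theta_{ijk})
       - sum_{ijk} p(x_i, pa(x_i) | e)       * log(theta_{ijk}).

    Both probability tables are dicts keyed by (i, j, k); they are assumed
    to be produced by inference in the network under Theta^k.
    """
    return sum((p_family_with_query[ijk] - p_family[ijk]) * math.log(t)
               for ijk, t in theta.items())
```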

The second step of the QEM algorithm is to maximize $Q(\Theta|\Theta^k)$ with respect to $\Theta$. Only a few terms in the expression for $Q(\Theta|\Theta^k)$ are free, since only the $\theta_{ij}$ for the transparent variables $z'_i$ are estimated. Collecting these terms we obtain:

$$\sum_{ij} p(z'_i = j \mid x_q = a, e) \log \theta_{ij} \;-\; \sum_{ij} p(z'_i = j \mid e) \log \theta_{ij}.$$

To perform the maximization, use gradient ascent with $\Theta^k$ as the starting point, ensuring that at the end of the process $Q(\Theta^{k+1}|\Theta^k) > Q(\Theta^k|\Theta^k)$. The gradient has essentially the same expression as in the previous subsection and can be obtained through standard Bayesian network algorithms. Set $\Theta^{k+1}$ to the maximizing value and proceed to the next iteration. The following theorem justifies the QEM algorithm (a proof can be found in [10]):
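The maximization step above can be sketched for a single transparent variable as projected gradient ascent over the probability simplex. This is a minimal illustration, not the paper's implementation: the marginals `a[j]` $\approx p(z'_i = j \mid x_q = a, e)$ and `b[j]` $\approx p(z'_i = j \mid e)$ are assumed to come from inference under $\Theta^k$, and renormalization is used as a simple (inexact) projection; monotone improvement of $Q$ is enforced by only accepting steps that increase it.

```python
import math

def q_value(theta, a, b):
    # Restricted Q: sum_j (a_j - b_j) * log(theta_j)
    return sum((a[j] - b[j]) * math.log(theta[j]) for j in range(len(theta)))

def qem_m_step(theta_k, a, b, step=0.01, iters=500, eps=1e-6):
    """Projected gradient ascent on the restricted Q, starting from theta_k.

    Accepts a candidate only when it strictly increases Q, so the returned
    point satisfies Q(theta) >= Q(theta_k); otherwise the step is halved.
    """
    theta = list(theta_k)
    for _ in range(iters):
        grad = [(a[j] - b[j]) / theta[j] for j in range(len(theta))]
        cand = [max(eps, theta[j] + step * grad[j]) for j in range(len(theta))]
        total = sum(cand)
        cand = [t / total for t in cand]  # renormalize onto the simplex
        if q_value(cand, a, b) > q_value(theta, a, b):
            theta = cand
        else:
            step /= 2  # shrink the step until Q improves
    return theta
```

Because candidates are accepted only when $Q$ strictly increases, the condition $Q(\Theta^{k+1}|\Theta^k) > Q(\Theta^k|\Theta^k)$ holds whenever any step is accepted.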
