The gradient of L(Theta) is obtained by computing, for each thetaij:
pdL(Theta)/pdthetaij =
d
dL(Theta)/dthetaij = p(z'i = j|xq = a, e)/thetaij - p(z'i = j|e)/thetaij,
which can be obtained through standard Bayesian network algorithms using local computations. A conjugate gradient descent can be constructed by selecting an initial value for Theta and, at each step, normalizing the values of Theta to ensure they represent proper distributions [31].
© Fabio Cozman[Send Mail?] Fri May 30 15:55:18 EDT 1997