Footnotes

**(1)**- If there is more than one minimal set, any minimal set can be chosen to be the parents. There is more than one minimal set only when some of the predecessors are deterministic functions of others.
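Footnote 1's claim, that multiple minimal parent sets arise only when some predecessors are deterministic functions of others, can be illustrated with a toy sketch. All names and numbers below are ours, not the paper's: B is a deterministic copy of A, so conditioning D on A alone, on B alone, or on both gives the same distribution, and both {A} and {B} are minimal parent sets for D.

```python
# Toy illustration (hypothetical numbers): B = A deterministically, so
# {A} and {B} are both minimal parent sets for D.
from itertools import product

p_a = {0: 0.4, 1: 0.6}            # P(A)
p_d_given_a = {0: 0.2, 1: 0.9}    # P(D=1 | A)

# Joint over (A, B, D), with B a deterministic function of A.
joint = {}
for a, d in product([0, 1], repeat=2):
    b = a  # deterministic copy
    pd = p_d_given_a[a] if d == 1 else 1 - p_d_given_a[a]
    joint[(a, b, d)] = p_a[a] * pd

def cond(d, given):
    """P(D=d | given), where `given` maps index (0=A, 1=B) to a value."""
    num = sum(p for k, p in joint.items()
              if k[2] == d and all(k[i] == v for i, v in given.items()))
    den = sum(p for k, p in joint.items()
              if all(k[i] == v for i, v in given.items()))
    return num / den

# Conditioning on A, on B, or on both yields the same distribution for D.
print(round(cond(1, {0: 1}), 6),
      round(cond(1, {1: 1}), 6),
      round(cond(1, {0: 1, 1: 1}), 6))  # -> 0.9 0.9 0.9
```

Since either predecessor alone renders the other irrelevant to D, either can be chosen as the parent, matching the footnote.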
**(2)**- Some people like to say the nodes are *labelled* with random variables. In the definition of a graph, the set of nodes can be any set; in particular, it can be a set of random variables. The set of arcs is then a set of ordered pairs of random variables.

**(3)**- In this and subsequent examples, we assume that variables are Boolean (i.e., with domain *{true, false}*). If *X* is a variable, *X=true* is written as *x* and *X=false* is written as *~x*, and similarly for other variables. The theory and the implementations are not restricted to binary variables.

**(4)**- [2] give essentially the same algorithm, but for the optimization problem of finding a minimization of sums. In VE, we use the algorithm for finding the sum of products. VE is so named because of its links to the algorithm of [2]; they refer to their basic algorithm as *the elimination of variables one by one*, which is exactly what we do. [2] also describe good elimination ordering heuristics and refinements, such as eliminating variables in blocks and forms of conditioning, which we do not consider here. The only difference between VE and BEBA is that BEBA requires an a priori elimination ordering (and exploits the prior ordering for efficiency), whereas VE allows for dynamic selection of which variable to eliminate next.
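As a hedged sketch of the sum-of-products step that footnote 4 describes, here is one elimination step in the style of VE. The factor representation and all numbers are our own illustration, not the paper's implementation: to eliminate a variable, multiply the factors that mention it and sum it out.

```python
# One VE-style elimination step (toy factors, our own representation):
# eliminate B from f1(A, B) and f2(B, C), yielding a factor on (A, C).
from itertools import product

f1 = {"vars": ("A", "B"),
      "table": {(a, b): [[0.9, 0.1], [0.4, 0.6]][a][b]
                for a, b in product([0, 1], repeat=2)}}
f2 = {"vars": ("B", "C"),
      "table": {(b, c): [[0.7, 0.3], [0.2, 0.8]][b][c]
                for b, c in product([0, 1], repeat=2)}}

def eliminate(var, factors):
    """Multiply all factors mentioning `var`, then sum `var` out."""
    touching = [f for f in factors if var in f["vars"]]
    rest = [f for f in factors if var not in f["vars"]]
    new_vars = tuple(dict.fromkeys(
        v for f in touching for v in f["vars"] if v != var))
    table = {}
    for vals in product([0, 1], repeat=len(new_vars)):
        env = dict(zip(new_vars, vals))
        total = 0.0
        for bv in [0, 1]:            # the sum ...
            env[var] = bv
            prod = 1.0
            for f in touching:       # ... of products
                prod *= f["table"][tuple(env[v] for v in f["vars"])]
            total += prod
        table[vals] = total
    return rest + [{"vars": new_vars, "table": table}]

result = eliminate("B", [f1, f2])[0]
# result["table"][(a, c)] equals sum over b of f1(a, b) * f2(b, c),
# e.g. (0, 0): 0.9 * 0.7 + 0.1 * 0.2 = 0.65.
print(result["vars"], round(result["table"][(0, 0)], 6))
```

Replacing the sum with a minimization over `bv` would give the min-of-sums variant of [2] that the footnote contrasts with VE.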

**(5)**- This may look like a circular definition, but the left side defines the summing of tables, whereas on the right side we are summing numbers.
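Footnote 5's distinction can be made concrete: the "sum of tables" on the left is defined pointwise by the sum of numbers on the right. A minimal sketch, with toy tables and our own representation:

```python
# Two tables over the same single Boolean variable (toy numbers).
t1 = {(True,): 0.25, (False,): 0.75}
t2 = {(True,): 0.5, (False,): 0.5}

def add_tables(ta, tb):
    # The "sum of tables" on the left-hand side is defined entrywise:
    # at each assignment v, add the *numbers* ta(v) + tb(v).
    return {v: ta[v] + tb[v] for v in ta}

print(add_tables(t1, t2))  # {(True,): 0.75, (False,): 1.25}
```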
**(6)**- To make this a probabilistic problem, and not a decision problem, consider the probability to be that of a third party who is trying to determine the probability distribution over the possible decisions. A similar analysis can be carried out to exploit contextual independence for decisions [24]. The decision maker's decisions can't depend on information she doesn't have.
**(7)**- This second part of the invariant may not be so intuitive, but it is important. For example, in Example *, one may be tempted to reduce confactor (*) to `<~a & ~c & ~d, 0.5>` (i.e., where the table is a function of no variables), as the contribution of the confactor is the same independent of the value of *E* (the table *t*_{4}[*E*] has value 0.5 for each value of *E* in confactor (*)). The first part of the invariant isn't violated. However, if there were no other confactors containing *E* that are applicable when `~a & ~c & ~d` is true, after summing out *E* we want the confactor `<~a & ~c & ~d, 1>`, but before summing out *E* we want the confactor `<~a & ~c & ~d, 0.5>` in order to maintain the first part of the invariant. We would like to maintain the property that we only consider confactors containing *E* when eliminating *E*. The second part of the invariant allows us to do this without treating this as a special case in our algorithm.

**(8)**- `http://www.cs.huji.ac.il/labs/compbio/Repository/`

**(9)**- Note that all of these results are statistically significant to some degree. The least significant is with *10* observations, which, using the matched-sample t-test (also known as the paired t-test), is significant at the 0.2 level for the logarithm of the runtimes. The others are significant well below the 0.001 level. The logarithm is appropriate as the difference in the logarithms corresponds to the multiplicative speedup.

**(10)**- Note that all of these results are statistically significant. The least significant is the *s=10* plot, which, using the matched-sample t-test (also known as the paired t-test), is significant at the 0.05 level for the logarithm of the runtimes. The logarithm is appropriate as the difference in the logarithms corresponds to the multiplicative speedup.

**(11)**- This does not mean that all conditioning algorithms need suffer from this. We conjecture that there is a conditioning algorithm that can extend contextual VE but save space, as in other bucket elimination algorithms [11] or the relevant cutsets for probabilistic inference [9][12].

**(12)**- `http://www.cs.huji.ac.il/labs/compbio/Repository/`
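The bookkeeping in footnote 7 can be sketched in code. Below, a table over *E* with value 0.5 for each value of *E* sums out to the variable-free value 1, since 0.5 + 0.5 = 1; the confactor representation is our own toy one, not the paper's:

```python
# Toy confactor sketch (our representation): a confactor pairs a context
# with a table over some variables. The context ~a & ~c & ~d is carried
# along unchanged when we sum a variable out of the table.
context = {"a": False, "c": False, "d": False}
t4 = {"vars": ("E",), "table": {(True,): 0.5, (False,): 0.5}}

def sum_out(var, factor):
    """Sum `var` out of a table: add the numbers for each value of `var`."""
    i = factor["vars"].index(var)
    new_vars = factor["vars"][:i] + factor["vars"][i + 1:]
    table = {}
    for key, val in factor["table"].items():
        new_key = key[:i] + key[i + 1:]
        table[new_key] = table.get(new_key, 0.0) + val
    return {"vars": new_vars, "table": table}

after = sum_out("E", t4)
print(after["table"][()])  # -> 1.0, i.e. the confactor <~a & ~c & ~d, 1>
# Had t4 been prematurely reduced to the variable-free table 0.5,
# eliminating E would not touch it, leaving 0.5 instead of 1, which is
# why the second part of the invariant forbids that reduction.
```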

David Poole and Nevin Lianwen Zhang, Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.
