Practical Performance of the Existing Sampling Algorithms

The largest network that has been tested using sampling algorithms
is QMR-DT (Quick Medical Reference -- Decision Theoretic)
[Shwe *et al.*1991,Shwe and Cooper1991], which contains 534 adult diseases and
4,040 findings, with 40,740 arcs depicting disease-to-finding
dependencies. The QMR-DT network belongs to a class of special
bipartite networks and its structure is often referred to as BN2O
[Henrion1991], because of its two-layer composition:
disease nodes in the top layer and finding nodes in the bottom
layer. Shwe and colleagues used an algorithm that combines
self-importance and heuristic importance sampling and tested its
convergence properties on the QMR-DT network. However, because the
heuristic method, *iterative tabular Bayes* (ITB), relies on a
version of Bayes' rule designed specifically for BN2O networks,
it cannot be generalized to arbitrary networks. Although Shwe and
colleagues concluded that Markov blanket scoring and
self-importance sampling significantly improve the convergence
rate in their model, we cannot extend this conclusion to general
networks. The computation of Markov blanket scoring is more
complex in a general, multiply connected network than in a BN2O
network. Also, the experiments conducted lacked a gold-standard
posterior probability distribution that could serve to judge the
convergence rate.
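To make the BN2O structure concrete, the following sketch computes the probability of a finding given the diseases that are present, using the noisy-OR gate that characterizes BN2O models. The parameter values are invented for illustration; they are not taken from QMR-DT.

```python
def noisy_or(present_links, leak=0.0):
    """P(finding = present | its present disease parents) under a noisy-OR gate.

    present_links: activation probabilities q_i of the disease->finding arcs
    whose disease parent is present. leak: probability that the finding
    appears with no disease present. All values here are illustrative.
    """
    p_absent = 1.0 - leak
    for q in present_links:
        # Each present disease independently fails to cause the finding
        # with probability (1 - q_i).
        p_absent *= (1.0 - q)
    return 1.0 - p_absent

# Two present diseases with hypothetical link probabilities 0.8 and 0.3,
# and a leak probability of 0.01:
p_finding = noisy_or([0.8, 0.3], leak=0.01)
```

This factored form is what makes BN2O-specific heuristics such as ITB possible: the finding's conditional distribution is fully determined by one parameter per arc plus a leak term, rather than by a full conditional probability table.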

Pradhan and Dagum [1996] tested an efficient version
of the LW algorithm, the bounded-variance algorithm
[Dagum and Luby1997], and the AA algorithm [Dagum *et al.*1995] on a
146-node, multiply connected medical diagnostic Bayesian network. One
limitation in their tests is that the probability of evidence in
the cases selected for testing was rather high. Although over 10%
of the cases had the probability of evidence on the order of
10^{-8} or smaller, a simple calculation based on the reported
mean of 34.5 evidence nodes shows that the average
probability of an observed state of an evidence node conditional
on its direct predecessors was on the order of
(10^{-8})^{1/34.5} ≈ 0.59. Given that their algorithm is
essentially based on the LW algorithm, our tests lead us to
suspect that its performance will deteriorate in cases where the
evidence is very unlikely. Furthermore, both algorithms focus on
the marginal probability of a single hypothesis node; when there
are many queried nodes, their efficiency may deteriorate.
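The back-of-the-envelope calculation above can be checked directly: spreading a joint evidence probability on the order of 10^{-8} over a mean of 34.5 evidence nodes implies a fairly high average per-node conditional probability.

```python
# Average per-node conditional probability implied by
# P(evidence) ~ 1e-8 spread over a mean of 34.5 evidence nodes.
p_avg = 1e-8 ** (1 / 34.5)
print(round(p_avg, 2))  # prints 0.59
```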

We have tested the algorithms discussed in Section 2.3 on several large networks. Our experimental results show that in cases with very unlikely evidence, none of these algorithms converges to reasonable estimates of the posterior probabilities within a reasonable amount of time. The convergence becomes worse as the number of evidence nodes increases. Thus, when using these algorithms in very large networks, we simply cannot trust the results. We will present results of tests of the LW and SIS algorithms in more detail in Section 4.
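A minimal sketch of likelihood weighting on a toy two-node chain (with invented probabilities) shows the mechanism behind this deterioration: evidence nodes are clamped to their observed values, and each sample is weighted by the probability of the observed evidence given its sampled parents, so when the evidence is very unlikely, nearly all of the weight mass is carried by a small fraction of the samples.

```python
import random

def likelihood_weighting(n_samples, p_a=0.5, p_e_given_a=1e-4,
                         p_e_given_not_a=1e-6, seed=0):
    """Estimate P(A = true | E = true) in the chain A -> E by likelihood
    weighting. A is sampled from its prior; E is clamped to its observed
    value and contributes P(E = true | a) as the sample weight.
    All probabilities are hypothetical, chosen to make E unlikely.
    """
    rng = random.Random(seed)
    w_true = w_total = 0.0
    for _ in range(n_samples):
        a = rng.random() < p_a                      # sample A from its prior
        w = p_e_given_a if a else p_e_given_not_a   # weight = P(E=true | a)
        w_total += w
        if a:
            w_true += w
    return w_true / w_total

est = likelihood_weighting(100_000)
# Exact posterior: 0.5 * 1e-4 / (0.5 * 1e-4 + 0.5 * 1e-6) ≈ 0.990
```

In this toy model the weights are tiny (at most 10^{-4}) and highly uneven, which in larger networks with many evidence nodes translates into the slow, unreliable convergence described above.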