The main network used in our tests is a subset of the CPCS (Computer-based Patient Case Study) model [Pradhan et al., 1994], a large multiply connected multi-layer network consisting of 422 multi-valued nodes and covering a subset of the domain of internal medicine. Among the 422 nodes, 14 nodes describe diseases, 33 nodes describe history and risk factors, and the remaining 375 nodes describe various findings related to the diseases. The CPCS network is among the largest real networks available to the research community at the present time. The CPCS network contains many extreme probabilities, typically on the order of 10^{-4}. Our analysis is based on a subset of 179 nodes of the CPCS network, created by Max Henrion and Malcolm Pradhan. We used this smaller version in order to be able to compute the exact solution for the purpose of measuring approximation error in the sampling algorithms.
The AIS-BN algorithm has some learning overhead. The following comparison of execution time vs. number of samples may give the reader an idea of this overhead. When updating the CPCS network with 20 evidence nodes on our system, the AIS-BN algorithm spends a total of 8.4 seconds on learning. It subsequently generates 3,640 samples per second, while the SIS algorithm generates 2,631 samples per second and the LW algorithm generates 4,167 samples per second. In order to remain conservative towards the AIS-BN algorithm, in all our experiments we fixed the execution time of the algorithms (our limit was 60 seconds) rather than the number of samples. In the CPCS network with 20 evidence nodes, in 60 seconds, AIS-BN generates about 188,000 samples, SIS generates about 158,000 samples, and LW generates about 250,000 samples.
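The sample counts quoted above follow directly from the per-second rates and the fixed 60-second budget, with the learning overhead charged against the AIS-BN budget; a quick arithmetic check:

```python
# Samples generated under a fixed 60-second budget. The AIS-BN
# learning overhead (8.4 s) counts against its sampling time;
# rates are samples per second, as reported in the text.
budget = 60.0
ais_bn = (budget - 8.4) * 3640   # learning overhead subtracted
sis = budget * 2631
lw = budget * 4167
print(round(ais_bn), round(sis), round(lw))  # 187824 157860 250020
```

These round to the "about 188,000 / 158,000 / 250,000" figures cited above.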
We generated a total of 75 test cases consisting of five sequences of 15 test cases each. We ran each test case 10 times, each time with a different setting of the random number seed. Each sequence had a progressively higher number of evidence nodes: 15, 20, 25, 30, and 35 evidence nodes, respectively. The evidence nodes were chosen randomly (equiprobable sampling without replacement) from those nodes that described various plausible medical findings. Almost all of these nodes were leaf nodes in the network. We believe that this constituted very realistic test cases for the algorithms. The distribution of the prior probability of evidence, Pr(E = e), across all test runs of our experiments is shown in Figure 4. The least likely evidence was 5.54×10^{-42}, the most likely evidence was 1.37×10^{-9}, and the median was 7×10^{-24}.
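The evidence-selection procedure described above can be sketched as follows; this is a minimal illustration, and the pool of finding-node names is hypothetical rather than the actual CPCS node labels:

```python
import random

def pick_evidence_nodes(finding_nodes, n_evidence, seed=None):
    """Choose n_evidence nodes equiprobably, without replacement,
    from the pool of plausible finding (mostly leaf) nodes."""
    rng = random.Random(seed)
    return rng.sample(finding_nodes, n_evidence)

# One test sequence: 15 cases at a fixed evidence count of 20.
pool = [f"finding_{i}" for i in range(375)]  # hypothetical node names
cases = [pick_evidence_nodes(pool, 20, seed=s) for s in range(15)]
```

`random.sample` guarantees sampling without replacement, so no node appears twice within a single test case.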


Figures 5 and 6 show a typical plot of the convergence of the tested sampling algorithms in our experiments. The case illustrated involves updating the CPCS network with 20 evidence nodes. We plot the MSE after the initial 15 seconds, during which the algorithms start converging. In particular, the learning step of the AIS-BN algorithm is usually completed within the first 9 seconds. In this case we ran the three algorithms for 150 seconds rather than the 60 seconds used in the actual experiment, in order to observe a wider range of convergence. The plot of the MSE for the AIS-BN algorithm almost touches the X axis in Figure 5. Figure 6 shows the same plot at a finer scale in order to show more detail in the AIS-BN convergence curve. It is clear that the AIS-BN algorithm dramatically improves the convergence rate. We can also see that the results of AIS-BN converge to the exact results very fast as the sampling time increases. In the case captured in Figures 5 and 6, a ten-fold increase in the sampling time (after subtracting the learning overhead of the AIS-BN algorithm, this corresponds to a 21.5-fold increase in the number of samples) results in a 4.55-fold decrease of the MSE (to MSE ≈ 0.00048). The observed convergence of both the SIS and LW algorithms was poor: a ten-fold increase in sampling time had practically no effect on their accuracy. Please note that this is a very typical case observed in our experiments.
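For reference, the MSE reported throughout can be computed as a root mean squared deviation between the exact and approximate posterior marginals, averaged over all states of all non-evidence nodes. The sketch below assumes this definition (the exact normalization used in the experiments may differ) and uses made-up marginals:

```python
import math

def mse(exact, approx):
    """Root mean squared error between exact and approximate
    posterior marginals, averaged over every state of every
    non-evidence node. Both arguments map a node name to a list
    of state probabilities."""
    sq, n = 0.0, 0
    for node, p_exact in exact.items():
        for p_e, p_a in zip(p_exact, approx[node]):
            sq += (p_e - p_a) ** 2
            n += 1
    return math.sqrt(sq / n)

# Hypothetical marginals for a single four-state node.
exact = {"gasAcute": [0.70, 0.10, 0.10, 0.10]}
approx = {"gasAcute": [0.68, 0.11, 0.11, 0.10]}
print(round(mse(exact, approx), 4))  # 0.0122
```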

Figure 7 illustrates the ICPT learning process of the AIS-BN algorithm for the sample case shown in Figure 6. The displayed conditional probabilities belong to the node gasAcute, which is a parent of two evidence nodes, difInfGasMuc and abdPaiExaMea. The node gasAcute has four states: ``absent,'' ``mild,'' ``moderate,'' and ``severe,'' and two parents. We randomly chose a combination of its parents' states as our displayed configuration. The original CPT for this configuration without evidence, the exact ICPT with evidence, and the learned ICPT with evidence are summarized numerically in Table 1. Figure 7 shows that the learned importance conditional probabilities begin to converge stably to the exact values after three updating steps. The learned probabilities at Step 10 are close to the exact results. In this example, the difference between Pr(x_{i}|pa(X_{i}), e) and Pr(x_{i}|pa(X_{i})) is very large. Sampling from Pr(x_{i}|pa(X_{i})) instead of Pr(x_{i}|pa(X_{i}), e) would introduce large variance into our results.
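The updating steps can be pictured as a stochastic-approximation style update that moves each ICPT row toward the frequency estimate obtained from the current batch of weighted samples. The sketch below is illustrative only: the row values, the target estimate, and the learning-rate constants a = 0.4, b = 0.14 in eta(k) = a(b/a)^{k/k_max} are assumptions for the example, not the actual CPCS numbers:

```python
def update_icpt(current, estimate, eta):
    """Move one ICPT row a fraction eta toward the estimate
    obtained from the weighted samples, then renormalize."""
    new = [c + eta * (e - c) for c, e in zip(current, estimate)]
    total = sum(new)
    return [p / total for p in new]

icpt = [0.25, 0.25, 0.25, 0.25]    # initial row: CPT without evidence
target = [0.05, 0.10, 0.25, 0.60]  # hypothetical estimate under evidence
for k in range(10):                # 10 updating steps, as in the experiments
    eta = 0.4 * (0.14 / 0.4) ** (k / 10)  # decaying learning rate eta(k)
    icpt = update_icpt(icpt, target, eta)
```

After ten steps the row sits close to the target while remaining a proper probability distribution, mirroring the convergence behavior visible in Figure 7.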

Figure 8 shows the MSE for all 75 test cases in our experiments, with the summary statistics in Table 2. A paired one-tailed t-test resulted in statistically highly significant differences between the AIS-BN and SIS algorithms (p < 3.1×10^{-20}), and also between the SIS and LW algorithms (p < 1.7×10^{-8}). As far as the magnitude of the difference is concerned, AIS-BN was substantially better than SIS. SIS was better than LW, but the difference was small. The mean MSEs of the SIS and LW algorithms were both greater than 0.1, which suggests that neither of these algorithms is suitable for large Bayesian networks.
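The paired one-tailed t-test used for this comparison can be sketched as follows; the MSE pairs below are hypothetical, not values from Table 2:

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic for H0: mean(x - y) = 0, tested
    one-tailed against the alternative mean(x - y) < 0,
    i.e., x has systematically lower MSE than y."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical MSE pairs (AIS-BN vs. SIS) on the same test cases.
ais = [0.0006, 0.0008, 0.0011, 0.0007, 0.0009]
sis = [0.11, 0.13, 0.09, 0.15, 0.12]
t = paired_t(ais, sis)  # strongly negative: AIS-BN has lower MSE
```

Pairing by test case removes the between-case variation in evidence likelihood, which is why the test is so sensitive despite the wide spread of MSE values across cases.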
The graph in Figure 9 shows the MSE ratio between the AIS-BN and SIS algorithms. We can see that the percentage of cases whose ratio was greater than 100 (two orders of magnitude improvement!) is 60%. In other words, we obtained two orders of magnitude improvement in MSE in more than half of the cases. In 80% of the cases, the ratio was greater than 50. The smallest ratio in our experiments was 2.67, which occurred when the posterior probabilities were dominated by the prior probabilities. In that case, even though the LW and SIS algorithms converged very fast, their MSE was still far larger than that of AIS-BN.
Our next experiment aimed at showing how closely the AIS-BN algorithm can approach the best possible sampling results. If we knew the optimal importance sampling function, the convergence of the AIS-BN algorithm should be the same as that of forward sampling without evidence. In other words, the results of the probabilistic logic sampling algorithm without evidence approach the limit of how well stochastic sampling can perform. We ran the logic sampling algorithm on the CPCS network without evidence, mimicking the test runs of the AIS-BN algorithm, i.e., 5 blocks of 15 runs, each repeated 10 times with a different random number seed. The number of samples generated was equal to the average number of samples generated by the AIS-BN algorithm for each series of 15 test runs. We obtained an average MSE = 0.00057, with σ = 0.000025, min = 0.00052, and max = 0.00065. The best achievable results should lie around this range. From Table 2, we can see that the minimum MSE for the AIS-BN algorithm was 0.00049, within the range of the optimal result. The mean MSE of AIS-BN is 0.00082, not far from the optimal results. The standard deviation, σ, is significantly larger for the AIS-BN algorithm, but this is understandable given that the process of learning the optimal importance function is heuristic in nature. It is not difficult to understand why there exists a difference between the AIS-BN results and the optimal results. First, the AIS-BN algorithm in our tests updated the sampling distribution only 10 times, which may be too few updates to let it converge to the optimal importance distribution. Second, even if the algorithm has converged to the optimal importance distribution, the stochastic nature of the learning will make the parameters oscillate around this distribution, so there will always be small differences between the two distributions.
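Probabilistic logic sampling (forward sampling without evidence) can be sketched on a toy two-node network; the network structure and CPTs below are made up for illustration:

```python
import random

def logic_sample(nodes, cpts, rng):
    """Forward sampling: visit nodes in topological order and draw
    each node's state from its CPT row given the sampled parents."""
    state = {}
    for node, parents in nodes:
        key = tuple(state[p] for p in parents)
        r, acc = rng.random(), 0.0
        for value, p in enumerate(cpts[(node, key)]):
            acc += p
            if r < acc:
                state[node] = value
                break
        else:
            state[node] = value  # guard against float rounding
    return state

# Toy network A -> B with binary nodes.
nodes = [("A", ()), ("B", ("A",))]
cpts = {("A", ()): [0.3, 0.7],
        ("B", (0,)): [0.9, 0.1],
        ("B", (1,)): [0.2, 0.8]}
rng = random.Random(0)
samples = [logic_sample(nodes, cpts, rng) for _ in range(10000)]
frac_a1 = sum(s["A"] for s in samples) / len(samples)  # about 0.7
```

Because no evidence is present, every sample carries equal weight, which is why this algorithm's accuracy bounds what any importance sampling scheme on the same network can achieve.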
Figure 10 shows the convergence rate for all tested cases for a four-fold increase in sampling time (between 15 and 60 seconds).
