
Results for the CPCS Network

The main network used in our tests is a subset of the CPCS (Computer-based Patient Case Study) model [Pradhan et al.1994], a large multiply-connected, multi-layer network consisting of 422 multi-valued nodes and covering a subset of the domain of internal medicine. Among the 422 nodes, 14 describe diseases, 33 describe history and risk factors, and the remaining 375 describe various findings related to the diseases. The CPCS network is among the largest real networks available to the research community at the present time. It contains many extreme probabilities, typically on the order of $10^{-4}$. Our analysis is based on a subset of 179 nodes of the CPCS network, created by Max Henrion and Malcolm Pradhan. We used this smaller version in order to be able to compute the exact solution for the purpose of measuring approximation error in the sampling algorithms.

The AIS-BN algorithm has some learning overhead, and the following comparison of execution time versus number of samples may give the reader an idea of its magnitude. When updating the CPCS network with 20 evidence nodes on our system, the AIS-BN algorithm spends a total of 8.4 seconds learning. It subsequently generates 3,640 samples per second, while the SIS algorithm generates 2,631 samples per second and the LW algorithm generates 4,167 samples per second. In order to remain conservative towards the AIS-BN algorithm, in all our experiments we fixed the execution time of the algorithms (our limit was 60 seconds) rather than the number of samples. In the CPCS network with 20 evidence nodes, in 60 seconds, AIS-BN generates about 188,000 samples, SIS about 158,000 samples, and LW about 250,000 samples.
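As a quick consistency check, the sample counts above follow directly from the quoted timings; the sketch below simply restates that arithmetic (all constants are the figures reported in this section):

```python
# Effective sample counts under the fixed 60-second budget, using the
# learning overhead and sampling rates reported above.
BUDGET = 60.0          # seconds of total execution time
LEARN_OVERHEAD = 8.4   # seconds AIS-BN spends learning (20 evidence nodes)

RATES = {"AIS-BN": 3640, "SIS": 2631, "LW": 4167}  # samples per second

def samples_in_budget(algorithm: str) -> int:
    """Number of samples an algorithm generates within the time budget."""
    sampling_time = BUDGET - (LEARN_OVERHEAD if algorithm == "AIS-BN" else 0.0)
    return round(RATES[algorithm] * sampling_time)

for alg in RATES:
    print(alg, samples_in_budget(alg))  # roughly 188,000 / 158,000 / 250,000
```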

Figure 4: The probability distribution of evidence Pr($ \bf E$ = $ \bf e$) in our experiments.

We generated a total of 75 test cases consisting of five sequences of 15 test cases each. We ran each test case 10 times, each time with a different setting of the random number seed. Each sequence had a progressively higher number of evidence nodes: 15, 20, 25, 30, and 35 evidence nodes, respectively. The evidence nodes were chosen randomly (equiprobable sampling without replacement) from those nodes that described various plausible medical findings. Almost all of these nodes were leaf nodes in the network. We believe that these constituted very realistic test cases for the algorithms. The distribution of the prior probability of evidence, Pr($ \bf E$ = $ \bf e$), across all test runs of our experiments is shown in Figure 4. The least likely evidence was 5.54×$10^{-42}$, the most likely evidence was 1.37×$10^{-9}$, and the median was 7×$10^{-24}$.
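The test-case generation procedure can be sketched as follows; the node names and seeds here are placeholders for illustration, not the actual identifiers used in the experiments:

```python
import random

# Placeholder pool standing in for the network's plausible-finding nodes.
finding_nodes = [f"finding_{i}" for i in range(375)]

def make_test_sequence(n_evidence: int, n_cases: int, seed: int) -> list:
    """One sequence of test cases, each an equiprobable draw of evidence
    nodes without replacement from the pool of finding nodes."""
    rng = random.Random(seed)
    return [rng.sample(finding_nodes, n_evidence) for _ in range(n_cases)]

# Five sequences of 15 cases each, with 15, 20, 25, 30, 35 evidence nodes.
sequences = [make_test_sequence(n, 15, seed=n) for n in (15, 20, 25, 30, 35)]
```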

Figure 5: A typical plot of convergence of the tested sampling algorithms in our experiments: Mean Square Error as a function of the execution time for a subset of the CPCS network with 20 evidence nodes chosen randomly among plausible medical observations (Pr($ \bf E$ = $ \bf e$) = 3.33×$10^{-26}$ in this particular case) for the AIS-BN, SIS, and LW algorithms. The curve for the AIS-BN algorithm is very close to the horizontal axis.

Figure 6: The lower part of the plot of Figure 5 showing the convergence of the AIS-BN algorithm to correct posterior probabilities.

Figures 5 and 6 show a typical plot of convergence of the tested sampling algorithms in our experiments. The case illustrated involves updating the CPCS network with 20 evidence nodes. We plot the MSE after the initial 15 seconds, during which the algorithms start converging. In particular, the learning step of the AIS-BN algorithm is usually completed within the first 9 seconds. In this case we ran the three algorithms for 150 seconds rather than the 60 seconds of the actual experiment in order to observe a wider range of convergence. The plot of the MSE for the AIS-BN algorithm almost touches the X axis in Figure 5. Figure 6 shows the same plot at a finer scale in order to show more detail in the AIS-BN convergence curve. It is clear that the AIS-BN algorithm dramatically improves the convergence rate. We can also see that the results of AIS-BN converge to the exact results very quickly as the sampling time increases. In the case captured in Figures 5 and 6, a tenfold increase in the sampling time (after subtracting the overhead of the AIS-BN algorithm, this corresponds to a 21.5-fold increase in the number of samples) results in a 4.55-fold decrease of the MSE (to MSE $ \leq$ 0.00048). The observed convergence of both the SIS and LW algorithms was poor: a tenfold increase in sampling time had practically no effect on their accuracy. This case is very typical of those observed in our experiments.
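For concreteness, the error metric plotted in Figures 5 and 6 can be read as an average squared deviation between estimated and exact posterior marginals. This is a minimal sketch of that reading; the paper's precise definition of MSE (e.g. exactly which nodes and states it averages over) is given in the Experimental Method section and may differ in detail:

```python
def mean_square_error(exact: dict, estimated: dict) -> float:
    """Average squared deviation between exact and estimated posterior
    marginals. Both arguments map node names to lists of per-state
    probabilities over the same state order."""
    errors = [
        (p_exact - estimated[node][i]) ** 2
        for node, states in exact.items()
        for i, p_exact in enumerate(states)
    ]
    return sum(errors) / len(errors)

# Identical marginals give zero error:
mean_square_error({"a": [0.2, 0.8]}, {"a": [0.2, 0.8]})  # -> 0.0
```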

Figure 7: Convergence of the conditional probabilities during the example run of the AIS-BN algorithm captured in Figure 6. The displayed fragment of the conditional probability table belongs to node gasAcute, which is a parent of one of the evidence nodes.

Table 1: A fragment of the conditional probability table of a node of the CPCS network (node gasAcute, parent configuration hepAcute=Mild and wbcTotTho=False) for the run shown in Figure 6.

  State          Original CPT   Exact ICPT   Learned ICPT
  ``Absent''     0.99631        0.0037       0.015
  ``Mild''       0.00183        0.1560       0.164
  ``Moderate''   0.00093        0.1190       0.131
  ``Severe''     0.00093        0.7213       0.690

Figure 7 illustrates the ICPT learning process of the AIS-BN algorithm for the sample case shown in Figure 6. The displayed conditional probabilities belong to the node gasAcute, which is a parent of two evidence nodes, difInfGasMuc and abdPaiExaMea. The node gasAcute has four states (``absent,'' ``mild,'' ``moderate,'' and ``severe'') and two parents. We randomly chose one combination of its parents' states as the displayed configuration. The original CPT for this configuration without evidence, the exact ICPT with evidence, and the learned ICPT with evidence are summarized numerically in Table 1. Figure 7 shows that the learned importance conditional probabilities converge stably toward the exact results after only three updating steps; the learned probabilities at Step 10 are close to the exact results. In this example, the difference between Pr(xi|pa(Xi),$ \bf e$) and Pr(xi|pa(Xi)) is very large, so sampling from Pr(xi|pa(Xi)) instead of Pr(xi|pa(Xi),$ \bf e$) would introduce large variance into our results.
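The learning behavior seen in Figure 7 can be illustrated with a stochastic-approximation style update. This is a hedged sketch, not the paper's algorithm: the constant learning rate, the use of the exact ICPT as the estimation target, and the ten-step schedule are simplifying assumptions, with the numbers taken loosely from Table 1.

```python
def update_icpt(current: list, estimated: list, eta: float) -> list:
    """Move the current importance distribution toward the distribution
    estimated from the latest batch of samples:
        Pr_{k+1} = Pr_k + eta * (Pr_est - Pr_k)
    """
    return [p + eta * (q - p) for p, q in zip(current, estimated)]

# Assumed illustration: start from the original CPT of gasAcute and learn
# toward the exact ICPT over ten updating steps (eta = 0.4 is arbitrary).
cpt = [0.99631, 0.00183, 0.00093, 0.00093]   # original CPT (Table 1)
target = [0.0037, 0.1560, 0.1190, 0.7213]    # exact ICPT (Table 1)
for step in range(10):
    cpt = update_icpt(cpt, target, eta=0.4)
# After ten steps, cpt is within about 0.01 of the exact ICPT.
```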

Figure 8: Performance of the AIS-BN, SIS, and LW algorithms: Mean Square Error for each of the 75 individual test cases plotted against the probability of evidence. The sampling time is 60 seconds.

Table 2: Summary of the simulation results for all of the 75 simulation cases on the CPCS network. Figure 8 shows each of the 75 cases graphically.

             AIS-BN    SIS      LW
  $ \mu$     0.00082   0.110    0.148
  $ \sigma$  0.00022   0.076    0.093
  min        0.00049   0.0016   0.0031
  median     0.00078   0.105    0.154
  max        0.00184   0.316    0.343

Figure 8 shows the MSE for all 75 test cases in our experiments, with summary statistics in Table 2. A paired one-tailed t-test showed statistically highly significant differences between the AIS-BN and SIS algorithms (p < 3.1×$10^{-20}$) and also between the SIS and LW algorithms (p < 1.7×$10^{-8}$). As far as the magnitude of the difference is concerned, AIS-BN was significantly better than SIS; SIS was better than LW, but the difference was small. The mean MSEs of the SIS and LW algorithms were both greater than 0.1, which suggests that neither of these algorithms is suitable for large Bayesian networks.
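The significance test reported above is a paired one-tailed t-test over the 75 per-case MSE values. As a sketch, the t statistic can be computed directly; the one-tailed p-value then follows from the t distribution with n - 1 degrees of freedom (e.g. via scipy.stats.t.sf):

```python
import math

def paired_t_statistic(xs: list, ys: list) -> float:
    """t statistic of a paired t-test: a one-sample test on the
    per-case differences between two algorithms' MSE values."""
    diffs = [x - y for x, y in zip(xs, ys)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Feeding in the 75 per-case MSE pairs for SIS versus AIS-BN (or LW versus SIS) yields the t values behind the quoted p-values.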

Figure 9: The distribution of the ratio of the MSE of the SIS algorithm to that of the AIS-BN algorithm (percentage of cases falling into each ratio interval).

The graph in Figure 9 shows the ratio of the MSE of the SIS algorithm to that of the AIS-BN algorithm. The percentage of cases in which this ratio was greater than 100 (two orders of magnitude improvement!) is 60%; in other words, we obtained two orders of magnitude improvement in MSE in more than half of the cases. In 80% of the cases, the ratio was greater than 50. The smallest ratio in our experiments was 2.67, which occurred when the posterior probabilities were dominated by the prior probabilities. In that case, even though the LW and SIS algorithms converged very fast, their MSE was still far larger than that of AIS-BN.

Our next experiment aimed at showing how closely the AIS-BN algorithm can approach the best possible sampling results. If we knew the optimal importance sampling function, the convergence of the AIS-BN algorithm would be the same as that of forward sampling without evidence. In other words, the results of the probabilistic logic sampling algorithm without evidence approach the limit of how well stochastic sampling can perform. We ran the logic sampling algorithm on the CPCS network without evidence, mimicking the test runs of the AIS-BN algorithm, i.e., 5 blocks of 15 runs, each repeated 10 times with a different random number seed. The number of samples generated was equal to the average number of samples generated by the AIS-BN algorithm for each series of 15 test runs. We obtained an average MSE of $ \mu$ = 0.00057, with $ \sigma$ = 0.000025, min = 0.00052, and max = 0.00065. The best achievable results should be around this range. From Table 2, we can see that the minimum MSE for the AIS-BN algorithm was 0.00049, within the range of the optimal results. The mean MSE of AIS-BN, 0.00082, is not too far from the optimal results either. The standard deviation, $ \sigma$, is significantly larger for the AIS-BN algorithm, but this is understandable given that the process of learning the optimal importance function is heuristic in nature. That a difference exists between the AIS-BN results and the optimal results is not surprising. First, the AIS-BN algorithm in our tests updated the sampling distribution only 10 times, which may be too few for it to converge to the optimal importance distribution. Second, even if the algorithm has converged to the optimal importance distribution, the learned parameters will still oscillate around it, so small differences between the two distributions will always remain.
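The baseline in this experiment is probabilistic logic sampling without evidence, i.e., plain forward sampling from the prior, where every sample carries weight one. A minimal sketch (the network encoding and the toy node names are assumptions for illustration):

```python
import random

def logic_sample(nodes, parents, cpt, rng):
    """Draw one forward sample from a Bayesian network.

    nodes   -- node names in topological order
    parents -- node -> tuple of parent names
    cpt     -- (node, parent_state_tuple) -> list of state probabilities
    """
    state = {}
    for node in nodes:
        pa = tuple(state[p] for p in parents[node])
        probs = cpt[(node, pa)]
        r, acc = rng.random(), 0.0
        for value, p in enumerate(probs):  # inverse-CDF draw
            acc += p
            if r < acc:
                state[node] = value
                break
        else:
            state[node] = len(probs) - 1   # guard against rounding error
    return state

# Tiny two-node example network: a -> b.
nodes = ["a", "b"]
parents = {"a": (), "b": ("a",)}
cpt = {("a", ()): [1.0, 0.0],
       ("b", (0,)): [0.0, 1.0],
       ("b", (1,)): [0.5, 0.5]}
sample = logic_sample(nodes, parents, cpt, random.Random(0))  # {"a": 0, "b": 1}
```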

Figure 10 shows the convergence rate for all tested cases for a four-fold increase in sampling time (between 15 and 60 seconds).

Figure 10: The distribution of the convergence ratio of the AIS-BN, SIS, and LW algorithms when the number of samples increases four times.

We adjusted the convergence ratio of the AIS-BN algorithm by dividing it by a constant. According to Equation 3, the theoretically expected convergence ratio for a four-fold increase in the number of samples should be around two. About 96% of the AIS-BN runs have a ratio that lies in the interval (1.75, 2.25], in sharp contrast to 11% and 13% of the cases for the SIS and LW algorithms, respectively. The ratios of the remaining 4% of the AIS-BN cases lie in the interval (2.25, 2.5]. For the SIS and LW algorithms, the percentage of cases whose ratio was smaller than 1.5 was 71% and 77%, respectively. A ratio smaller than 1.5 means that the number of samples was too small to estimate the variance reliably and the results cannot be trusted. A ratio greater than 2.25 possibly means that 60 seconds was long enough to estimate the variance, but 15 seconds was too short.
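The expected ratio of two follows from the usual behavior of sampling error: if the error shrinks as 1/sqrt(N), consistent with Equation 3 as described above, then a four-fold increase in samples halves the error. A one-line restatement, under that assumption:

```python
import math

def expected_convergence_ratio(sample_factor: float) -> float:
    """Expected error ratio when the number of samples grows by
    sample_factor, assuming error proportional to 1/sqrt(N)."""
    return math.sqrt(sample_factor)

expected_convergence_ratio(4.0)  # -> 2.0
```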

Jian Cheng 2000-10-01