Randomised ExperimentsDetails of the experimentsWater Network

Water Network

In order to construct a pseudo-natural example, we used the water network from the Bayesian network repository12 and modified it to let it exhibit context-specific independence. For each table, a variable was declared redundant if the differences in the probabilities for the values of this variable were less than a threshold of 0.05 from each other (thus, if we chose the midpoint of a reduced table, the original probabilities were all less than 0.025 from this midpoint). In order to discover context-specific independence, we carried out a greedy top-down algorithm to build a decision tree. If we are building the conditional for variable Xi, we chose the variable Y to split on that results in the maximum number of pairs of numbers where the values for Xi are within the threshold 0.05 of each other. We then recursively remove redundant variables from each side, and keep splitting. Once we have built a tree, for each node, we can decide whether to use the tabular representation or a factored representation. In these experiments, we only committed to the context-based representation when the total size of the context-based representation (obtained simply by summing the sizes of the tables) was less than 51% of the tabular representation.

It should be noted that these thresholds were not chosen arbitrarily. If we used 0.03 instead of 0.05, we didn't find any context-specific independence. If we chose 0.07 instead of 0.05, then the tabular representation collapses. If we chose 80% or 99% instead of 51% as the threshold for accepting a change, we got smaller tables, but much larger run times.

Figure * shows some of the details of some of the data with no observations. Figures * and * shows some of the details of some of the data with 5 and 10 observation respectively. These make up the data shown in Figure 10 of the main paper. We show the index of the variable given the ordering in the repository (we started counting at 0).

All of the times are in milliseconds for a Java implementation running on 700 MHz Pentium running Linux with 768 megabytes of memory. In order to allow for the variation in run times due to garbage collection each evaluation was run three times and the smallest time was returned. The space is for the maximum table created in VE or the maximum size of the sum of all of the confactors created during the elimination of one variable.

CVE VE
Query Var Time Size Time Size
#11(CKND_12_15) 15039 1327104 25368 5308416
#13(CBODN_12_15) 620 16384 4552 442368
#27(CKND_12_45) 3808 186624 53965 15925248
#25(CKNI_12_45) 1708 36864 57414 7077888
#22(CKNN_12_30) 367 16128 3821 442368
#21(CBODN_12_30) 7953 193536 13997 1769472
#17(CKNI_12_30) 742 48384 3677 442368
#22(CKNN_12_30) 363 16128 3846 442368
#19(CKND_12_30) 7939 774144 26654 2359296
#15(CNON_12_15) 8177 193536 14599 1769472
#12(CNOD_12_15) 618 37044 4264 442368
#3(CKND_12_00) 419 29376 9414 1327104
#16(C_NI_12_30) 429 28224 3799 442368
#30(CKNN_12_45) 2902 112896 52648 15925248
#4(CNOD_12_00) 2496 110592 42270 5308416
#21(CBODN_12_30) 8042 193536 13619 1769472
#5(CBODN_12_00) 992 112320 3334 442368
#19(CKND_12_30) 7936 774144 25637 2359296
#11(CKND_12_15) 16290 1327104 24753 5308416
#23(CNON_12_30) 604 37044 3535 442368

Data for random queries with no observed variables for the water network
 

CVE VE
Observed Query Var Time Size Time Size
#1=2 #2=2 #26=2 #28=1 #30=1 #8(C_NI_12_15) 84 9216 579 36864
#6=2 #8=1 #11=2 #14=2 #24=1 #22(CKNN_12_30) 15 816 156 9216
#8=3 #14=0 #15=3 #18=3 #20=2 #6(CKNN_12_00) 9 336 26 2304
#5=1 #9=0 #12=2 #15=0 #16=3 #10(CBODD_12_15) 54 576 71 6912
#6=1 #7=0 #11=1 #13=3 #25=2 #27(CKND_12_45) 1184 28224 3210 147456
#2=1 #6=0 #13=0 #19=1 #25=0 #18(CBODD_12_30) 123 9216 224 12288
#0=3 #2=1 #18=2 #20=1 #23=1 #17(CKNI_12_30) 16 1728 87 5184
#4=3 #10=3 #12=3 #17=0 #28=2 #6(CKNN_12_00) 284 6912 162 20736
#3=1 #11=1 #19=1 #25=1 #31=3 #14(CKNN_12_15) 1450 36864 1041 147456
#10=3 #19=0 #21=0 #26=0 #27=0 #16(C_NI_12_30) 1076 49152 521 49152
#11=1 #13=0 #21=1 #25=2 #29=2 #24(C_NI_12_45) 353 28080 13928 1327104
#3=0 #23=1 #26=1 #27=0 #28=3 #15(CNON_12_15) 14258 193536 2600 442368
#9=2 #10=1 #13=1 #25=2 #26=1 #30(CKNN_12_45) 75 9216 2646 248832
#9=1 #11=0 #14=2 #15=1 #27=0 #25(CKNI_12_45) 65 4096 120 4096
#2=0 #10=2 #15=2 #17=1 #24=1 #8(C_NI_12_15) 12 576 767 110592
#2=0 #7=3 #15=1 #21=2 #27=2 #16(C_NI_12_30) 29 1008 531 82944
#0=3 #2=0 #6=0 #7=2 #16=2 #29(CBODN_12_45) 7 336 1304 147456
#2=2 #20=2 #24=2 #25=0 #30=0 #21(CBODN_12_30) 453 49152 877 49152
#8=2 #11=2 #19=2 #25=1 #29=3 #9(CKNI_12_15) 76 9216 216 9216
#9=1 #12=3 #13=0 #19=2 #21=3 #26(CBODD_12_45) 31 1024 440 49152

Data for random queries with 5 randomly observed variables for the water network
 

CVE VE
Observed Query Time Size Time Size
#2=3 #3=0 #7=2 #12=1 #13=2
#19=1 #21=2 #22=2 #26=1 #30=1 #14 30 2304 40 2304
#0=0 #3=0 #8=3 #12=3 #13=0
#16=2 #18=2 #26=3 #27=2 #28=1 #7 14 256 17 768
#1=2 #6=0 #8=0 #11=2 #17=0
#20=3 #22=1 #23=1 #24=2 #26=2 #25 7 256 7 256
#2=2 #5=3 #8=0 #9=1 #10=1
#17=2 #18=1 #22=2 #28=0 #30=1 #23 4 192 4 192
#4=2 #7=1 #8=2 #12=1 #13=2
#15=3 #17=0 #19=0 #22=2 #31=2 #28 10 256 8 256
#1=1 #7=1 #9=1 #10=0 #11=0
#13=3 #14=1 #23=1 #24=1 #30=2 #22 9 336 10 768
#2=1 #4=0 #6=0 #15=0 #18=0
#22=2 #23=3 #24=2 #29=2 #30=1 #25 14 768 31 2304
#3=0 #10=0 #14=1 #16=3 #19=0
#24=1 #25=2 #28=3 #30=1 #31=1 #7 544 9216 121 9216
#1=2 #2=3 #3=1 #9=2 #10=0
#14=0 #16=3 #25=1 #28=3 #30=2 #8 10 1024 16 1024
#2=3 #5=1 #6=0 #11=2 #12=0
#17=1 #22=0 #24=3 #27=0 #28=1 #25 22 768 22 768
#8=3 #9=2 #10=0 #11=1 #12=1
#14=2 #15=3 #19=0 #22=2 #26=3 #5 3 84 4 192
#1=0 #7=2 #8=2 #13=0 #15=3
#17=2 #20=3 #26=1 #27=0 #31=3 #24 11 256 8 256
#4=2 #5=3 #6=1 #9=1 #10=0
#12=2 #17=0 #19=1 #25=0 #29=0 #23 6 256 23 1024
#0=0 #2=1 #11=1 #13=1 #17=2
#21=3 #22=1 #23=1 #24=2 #30=2 #27 10 576 18 768
#4=1 #9=0 #10=0 #11=1 #12=2
#23=1 #25=2 #29=1 #30=0 #31=2 #5 77 4096 141 12288
#1=2 #6=0 #7=1 #10=3 #12=1
#15=3 #16=1 #17=2 #23=2 #24=1 #27 3 64 4 144
#1=2 #2=2 #3=2 #5=2 #9=2
#13=1 #15=0 #22=1 #25=2 #30=0 #18 4 112 65 12288
#0=3 #1=1 #5=1 #6=0 #7=1
#8=2 #15=3 #17=0 #24=3 #25=0 #21 76 768 16 1024
#0=2 #6=1 #8=0 #9=2 #10=2
#16=0 #18=0 #19=0 #21=0 #26=0 #13 7 576 12 768
#1=1 #3=0 #4=2 #9=1 #10=3
#13=0 #14=2 #22=0 #23=0 #30=2 #6 37 768 17 768

Data for random queries with 10 randomly observed variables for the water network
 
David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

Randomised ExperimentsDetails of the experimentsWater Network