The Bi-Partite Graph

A bi-partite graph is a network consisting of two layers of nodes, where the top layer is hidden and the bottom layer visible. The only connections in the network are from the top layer (parent nodes) to the bottom layer (child nodes). Such a architecture appears very simple, but already with several tens of nodes it is often intractable to compute the marginals exactly when evidence is presented. A bi-partite graph can be undirected or directed (pointing downwards). A recent example of the first is the PoE (product of experts) from hinton99a with discrete nodes. The `Quick Medical Reference' (QMR) network from shwe91a is a good example for a directed bi-partite graph. Recently there were some approaches to use approximation techniques for this network jaakkola99variational,murphy99a. Therefore we have chosen to use the bi-partite network as an architecture to test the bound propagation algorithm on.

We created bi-partite networks of an equal number of (binary valued) nodes in both layers. Each child node has exactly three parents and each parent points to exactly three children. The connections were made randomly. The conditional probability tables (CPT's) holding the probability of the child given its parents was initialized with uniform random number between zero and one. The prior probability tables for the parents were initialized similarly.

In Figure we show a bi-partite network with twenty nodes in each layer, which is still small enough to compute the exact marginals. The arrows show the parent-child relations. To every third child evidence was presented (the shaded nodes). We ran the bound propagation algorithm with and reached convergence in 37 seconds. For every node we show the marginal over its two states. The thick line is the exact marginal, the top and bottom of the rectangle around it indicate the upper and lower bound respectively. It is clear that for most of the nodes the computed bounds are so tight that they give almost the exact value. For every node (except two) the gap between upper and lower bound is less than one third of the whole region.

The network shown in Figure is small enough to be treated exactly. This enables us to show the computed bounds together with the exact marginals. The bound propagation algorithm, however, can be used for much larger architectures. Therefore we created a bi-partite graph as we did before, but this time with a thousand nodes in each layer. Again we presented evidence to every third node in the bottom layer. The bound propagation algorithm using converged in about 75 minutes. In Figure a we show a histograms of the band widths (upper minus lower bound) found for the single node marginals for all unclamped nodes. Clearly, for the majority of the marginals very tight bounds are found.

Although exact computations are infeasible for this network, one can still use approximation methods to find an estimate of the single node marginals. Therefore, we ran the cluster variation method (CVM) as described in kappen02novel. For the clusters needed by CVM, we simply took all node sets for which a potential is defined. For every unclamped node we computed the relative position of the approximation for the single node marginal within the bounding interval, denoted by . Thus indicates an approximation equal to the lower bound, and similarly for the upper bound. Although there is no obvious reason the approximated marginals should be inside the bounding interval, it turned out they were. A histogram of all the relative positions is shown in Figure b, where we split up the full histogram into three parts, each referring to a certain region of band widths. It is remarkable that the shape of the histogram does not vary much given the tightness of the bounds. Approximations tend to be in the middle, but for all band widths there are approximations close to the edge.

One can argue that we are comparing apples to pears, since we can easily improve the results of both algorithms, while it is not clear which cases are comparable. This is certainly true, but it is not our intention to make a comparison to determine which method is the best. Approximation and bounding methods both have their own benefits. We presented Figure b to give at least some intuition about the actual numbers. In general we found that approximations are usually within the bounding intervals, when computation time is kept roughly the same. This does, however, not make the bounds irrelevant. On the contrary, one could use the bounding values as a validation whether approximations are good or even use them as confidence intervals.