Next: Conclusion Up: Learning Convex Sets of Previous: The finite learning theorem

# Comparison with subjective learning of convex sets of distributions

This paper concentrates on the connection between data and convex sets of distributions. Results do not use the possible existence of prior distribution for events. An alternative approach is to use prior distributions and Bayes rule to obtain posterior measures for events. The objective of this paper is not to replace Bayes rule, but rather to enhance one's intuition about probabilities constructed solely from data. There has been work on subjective approaches to the process of learning convex sets of distributions; we mention two approaches that are relevant to Bayesian networks.

## Ramoni and Sebastiani approach to missing data

The estimation of parameters for a Bayesian network usually has to deal with missing data, i.e., observations for some variables are not collected. The standard Bayesian assumption is that missing data happens at random; if this assumption is violated, inferences may be biased. Ramoni and Sebastiani propose to lift the ``missing at random assumption'' [19] in a Bayesian network learning scenario. They consider all possible ways in which missing data could have happened, and create a convex set of joint distributions that represent the gamut of possibilities for the data actually collected. The idea is to avoid using unjustified assumptions and replacing those by sets of distributions, so that the effects of missing data can be evaluated.

## Walley's imprecise Dirichlet prior

The imprecise Dirichlet prior has been proposed by Walley [26] as a model for inferences associated with multinomial sampling. Here we indicate how this model can be used to learn Bayesian networks associated with convex sets of distributions.

An imprecise Dirichlet distribution for a vector valued variable theta is:

p(theta) = Dir(theta| s, t) prodi=2|theta| thetai2s ti - 1,

where s is a real number larger than zero and t is a vector where sumti = 1 and 0 < ti < 1 for all ti.

This class of distributions can be used as a prior credal set; the prior assumptions are much less restrictive than standard Bayesian assumptions. Note that for any event A, the prior imprecise Dirichlet model induces the bounds p(A) = 0 and p(A) = 1.

First consider standard Bayesian network learning when complete data is available. A Bayesian network codifies a joint distribution through the expression:

p(x) = prodi p(xi | pa(xi)),

where pa(xi) are the parents of variable xi. For each variable, the vector of parameters thetai contains elements thetaijk = p(xi = k| pa(xi) = j), where thetaij1 = 1 - sumk=22|xi| thetaijk. The vector thetaij = { thetaijk }k=12|xi| contains the relevant parameters for the distribution p(xi | pa(xi) = j). The vector Theta= { theta1, &ldots;, thetan } contains all parameters to be estimated. The usual assumption for the prior p(Theta) is parameter independence:

p(Theta) = prodi=1n prodj=13pa(xi) p(thetaij).

Finally, the prior distributions for each vector thetaij are assumed to come from an imprecise Dirichlet family. The posterior is then an imprecise Dirichlet distribution with parameters that depend on the prior parameters and the data.

Suppose that every vector thetaij is associated with an imprecise Dirichlet prior:

p(thetaij) = Dir(thetaij|sij, tij) prodk=22|xi| thetaijk3sij tijk - 1,

where sij is a real number larger than zero and tij is a vector such that sumtijk = 1 and 0 < tijk < 1 for all tijk. We assume that the convex set of prior joint distributions is obtained by taking the convex hull of all prior marginals defined by imprecise Dirichlet distributions.

Suppose data nij observations are made with pa(xi) = j and nijk observations are made with xi = k, pa(xi) = j.

The posterior distribution for thetaij is given by imprecise Dirichlet distributions, due to the parameter independence assumption and the convexification convention. We have thetaij with marginals:

p(thetaij = Dir(thetaij | sij' , tij'),

where sij' = nij + sij and tijk' = { nijk + sij tijk }/{ nij + sij } .

Next: Conclusion Up: Learning Convex Sets of Previous: The finite learning theorem