This section demonstrates the generality of the previous algorithms by reducing a large number of classes of distributions to finitely generated credal sets. Inferences with each class can be performed using the algorithms in this paper as long as the class is formulated as a finitely generated convex set. We first present the ε-contaminated and lower density bounded classes due to their simplicity. Then the belief function class is analyzed. All other classes admit simple reductions to the belief function class, except the density ratio class, which requires special treatment.
Classes of distributions are used to define the prior distribution p(x) for a variable x. These results can always be extended to conditional distributions of the form p(x|pa(x)), where pa(x) denotes the parents of x in the network.
An ε-contaminated class is characterized by a distribution p() and a real number ε ∈ (0,1) [Berger1985]:

r(x) = (1-ε) p(x) + ε q(x),

where q(x) is an arbitrary distribution. This model is appropriate when we expect p(x) to be correct a fraction (1-ε) of the time, but otherwise any distribution is possible. An ε-contaminated class for a variable without parents is the convex hull of the functions:

r_{a_k}(x) = (1-ε) p(x) + ε δ_{a_k}(x),

where δ_{a_k}(x) is the distribution that places all mass on the single value a_k.
The exact and iterative methods derived before apply, but the fact that the contaminating components δ_{a_k}(x) of the vertices of an ε-contaminated class are composed exclusively of zeros and ones leads to computational savings, because many multiplications of real numbers can be avoided.
The simple structure of the ε-contaminated class makes it possible to obtain closed-form expressions for the expected bounds on a function u(x) in the univariate case. The expected value with respect to distribution r_{a_k}(x) is (1-ε) E_p[u] + ε u(a_k), so:

E_lower[u] = (1-ε) E_p[u] + ε min_k u(a_k),
E_upper[u] = (1-ε) E_p[u] + ε max_k u(a_k),

where E_p[u] is the expectation of u() with respect to p().
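As a check on these closed-form bounds, the vertex-by-vertex computation can be compared with the min/max expressions numerically. A minimal sketch, with illustrative values for ε, p(x), and u(x) (none taken from the paper):

```python
# Sketch: verify the closed-form expectation bounds of an epsilon-contaminated
# class by enumerating its vertices r_{a_k}. All numbers are illustrative.
eps = 0.2
p = [0.5, 0.3, 0.2]          # base distribution p(x)
u = [1.0, 4.0, 2.0]          # function u(x) whose expectation is bounded

E_p = sum(pi * ui for pi, ui in zip(p, u))

# Vertex r_{a_k} puts the contaminating mass eps on the single value a_k,
# so its expectation is (1-eps) E_p[u] + eps u(a_k).
vertex_expectations = [(1 - eps) * E_p + eps * uk for uk in u]

lower = min(vertex_expectations)   # (1-eps) E_p[u] + eps min_k u(a_k)
upper = max(vertex_expectations)   # (1-eps) E_p[u] + eps max_k u(a_k)

assert abs(lower - ((1 - eps) * E_p + eps * min(u))) < 1e-12
assert abs(upper - ((1 - eps) * E_p + eps * max(u))) < 1e-12
```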
Consider the set of all distributions p(x) such that
p(x) >= l(x),
where l() is an arbitrary non-negative function whose sum is smaller than one [Breese & Fertig1991]. Call this class the lower density bounded class. There are approximations (without error bounds) for inferences with this class [Breese & Fertig1991]. The results in this paper improve this to exact and provably convergent inference algorithms. The key fact is that this class is identical to an ε-contaminated class, since it can be specified as the set of distributions:

r(x) = (1-(1-L)) l(x)/L + (1-L) q(x),

where L = Σ_x l(x) and q(x) is an arbitrary distribution. A lower density bounded class is therefore the convex hull of the functions:

r_{a_k}(x) = l(x) + (1-L) δ_{a_k}(x).
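The vertices above are easy to construct explicitly. A minimal sketch with an illustrative lower bound l() (my choice, not the paper's):

```python
# Sketch: vertices r_{a_k}(x) = l(x) + (1-L) delta_{a_k}(x) of a lower density
# bounded class; the lower bound l() is illustrative.
l = [0.1, 0.2, 0.3]                      # lower bounds, with L = sum(l) < 1
L = sum(l)

vertices = []
for k in range(len(l)):
    v = list(l)
    v[k] += 1 - L                        # concentrate the free mass on a_k
    vertices.append(v)

# Every vertex is a proper distribution dominating l().
for v in vertices:
    assert abs(sum(v) - 1.0) < 1e-12
    assert all(vi >= li for vi, li in zip(v, l))
```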
Consider a discrete variable x with a finite set of values Ω_x. To define a probability distribution we must specify |Ω_x| numbers, one for each element of Ω_x. Suppose we cannot specify precise numbers for those events, so we distribute some probability mass over subsets of Ω_x. For example, we place 0.1 mass in the set { x1, x2 } without specifying whether the mass goes to x1 or x2. Call the mass m(A) associated with subset A a basic mass assignment. Since the mass distributed in Ω_x must add to one, the basic mass assignments must add to one.
The basic mass assignments define a convex set of distributions. Since the value of m(A) corresponds to a probability mass that can be arbitrarily spread among the elements of A, we can generate vertices of the convex set by concentrating the non-zero basic mass assignments into each one of their elements, one at a time. There are finitely many ways to distribute the mass, because Ω_x has finitely many values, so the convex set defined by the basic mass assignments is finitely generated. When the non-zero basic mass assignments carry mass to single elements of Ω_x, the convex set collapses to a standard Bayesian model.
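The vertex-generation procedure just described can be sketched directly; the basic mass assignment below is an illustrative example, not one from the paper:

```python
# Sketch: generate the vertices of the convex set induced by a basic mass
# assignment by concentrating each focal mass on one element of its set.
from itertools import product

m = {frozenset({0}): 0.4,
     frozenset({1, 2}): 0.5,
     frozenset({0, 1, 2}): 0.1}       # illustrative basic mass assignment
n = 3

focal_sets = list(m)
vertices = set()
for choice in product(*[sorted(A) for A in focal_sets]):
    p = [0.0] * n
    for A, elem in zip(focal_sets, choice):
        p[elem] += m[A]               # send all of m(A) to one element of A
    vertices.add(tuple(round(x, 10) for x in p))

for v in vertices:
    assert abs(sum(v) - 1.0) < 1e-9   # each vertex is a distribution
```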
A belief function can be defined from the basic mass assignments [Shafer1976]:

bel(A) = Σ_{B ⊆ A} m(B),

and, conversely, the basic mass assignments can be recovered from the belief function through the Möbius transform:

m(A) = Σ_{B ⊆ A} (-1)^{|A-B|} bel(B).
Usually a belief function is associated with a plausibility function:

pl(A) = 1 - bel(A^c),

where A^c is the complement of A.
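The belief/plausibility pair and the Möbius inversion are standard Dempster-Shafer formulas [Shafer1976]; a minimal sketch with an illustrative mass assignment:

```python
# Sketch: belief and plausibility from a basic mass assignment, and recovery
# of the masses via the Moebius transform. The assignment is illustrative.
from itertools import combinations

omega = (0, 1, 2)
m = {frozenset({0}): 0.4, frozenset({1, 2}): 0.5, frozenset(omega): 0.1}

def subsets(A):
    A = sorted(A)
    for r in range(len(A) + 1):
        for c in combinations(A, r):
            yield frozenset(c)

def bel(A):                           # bel(A) = sum of m(B) over B subset of A
    return sum(v for B, v in m.items() if B <= A)

def pl(A):                            # pl(A) = 1 - bel(complement of A)
    return 1.0 - bel(frozenset(omega) - A)

def moebius(A):                       # m(A) = sum (-1)^{|A-B|} bel(B)
    return sum((-1) ** (len(A) - len(B)) * bel(B) for B in subsets(A))

for A in subsets(frozenset(omega)):
    assert abs(moebius(A) - m.get(A, 0.0)) < 1e-9   # round trip recovers m
    assert bel(A) <= pl(A) + 1e-9
```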
Application of Bayes rule to belief function classes is a difficult problem given the non-linear character of Bayes rule. There are expressions that generate exact bounds for posterior quantities in univariate distributions [Halpern & Fagin1992], but the same does not apply to marginalization of multivariate belief function classes.

Our solution is to reduce belief function classes to finitely generated convex sets of distributions and apply the methods derived before.
A different way to represent a belief function class is through a set of inequalities. For each subset A with a non-zero basic mass assignment, an inequality can be written:

p(A) >= Σ_{B ⊆ A} m(B).
Consider a variable x with values Ω_x. Sometimes it is hard to specify all the numbers p(x) for all elements of Ω_x; a more straightforward approach is to specify probability masses for non-overlapping subsets of Ω_x. This characterizes a convex set of distributions for x called a sub-sigma class [Berger1990, Lambert & Duncan1986, Manski1981].
For example, suppose x has 20 values between zero and one, and one wants to specify a distribution for x that is precise for the values around 0.5 but rather imprecise in the tails. One might consider specifying a single probability mass for the lower 5 values, a single probability mass for the upper 5 values, and probability masses for each of the 10 intermediate values.
Every sub-sigma class is a belief function class, since the Möbius transform of a sub-sigma class can be directly read from the specification of the class. Results of the previous section can be applied to this class.
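As an illustration of the 20-value example above, a sub-sigma specification whose masses can be read directly as basic mass assignments (the particular numbers are my choice):

```python
# Sketch: a sub-sigma specification assigns masses to non-overlapping subsets,
# so those masses ARE the basic mass assignments (Moebius transform read off
# directly). The 20-value example of the text, with illustrative masses.
lower_tail = frozenset(range(0, 5))       # one mass for the lower 5 values
upper_tail = frozenset(range(15, 20))     # one mass for the upper 5 values

m = {lower_tail: 0.05, upper_tail: 0.05}
for i in range(5, 15):                    # precise masses, middle 10 values
    m[frozenset({i})] = 0.09

assert abs(sum(m.values()) - 1.0) < 1e-9
# The focal sets partition the 20 values, as a sub-sigma class requires.
union = frozenset().union(*m)
assert union == frozenset(range(20)) and sum(len(A) for A in m) == 20
```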
A density bounded class [Lavine1991] is the set of all distributions p(x) such that
l(x) <= p(x) <= u(x),
where l() and u() are arbitrary non-negative measures such that Σ_x l(x) <= 1 and Σ_x u(x) >= 1. The particular case u(x) ≡ 1 leads to the lower density bounded class, but the method used before for the lower density bounded class does not apply to the more general setting. A reduction of the class to a finitely generated convex set of distributions is still possible, since a density bounded class is a belief function class. We prove this constructively by providing an algorithm that constructs the Möbius transform of a density bounded class.
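One way such a construction can go is to compute the lower probability of the density bounded class, P_low(A) = max(Σ_{x∈A} l(x), 1 - Σ_{x∉A} u(x)), and Möbius-invert it. The sketch below follows that route with illustrative bounds; it is a standard construction, not necessarily the paper's own algorithm:

```python
# Sketch: lower probability of a density bounded class {p : l <= p <= u},
#   P_low(A) = max( sum_{x in A} l(x), 1 - sum_{x not in A} u(x) ),
# Moebius-inverted; non-negative masses confirm a belief function.
from itertools import combinations

l = [0.1, 0.2, 0.3]                  # illustrative lower bounds, sum <= 1
u = [0.5, 0.5, 0.5]                  # illustrative upper bounds, sum >= 1
n = len(l)

def subsets(A):
    A = sorted(A)
    for r in range(len(A) + 1):
        for c in combinations(A, r):
            yield frozenset(c)

def p_low(A):
    return max(sum(l[i] for i in A),
               1.0 - sum(u[i] for i in range(n) if i not in A))

m = {A: sum((-1) ** (len(A) - len(B)) * p_low(B) for B in subsets(A))
     for A in subsets(range(n))}

assert all(v >= -1e-9 for v in m.values())   # basic masses are non-negative
assert abs(sum(m.values()) - 1.0) < 1e-9     # and they add to one
```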
The total variation class is the set of distributions p(x) such that [Wasserman1992b]:

|p(A) - q(A)| <= ε for every event A,

where q() is a given distribution and ε ∈ (0,1).
A density ratio class consists of all probability densities p(A) such that, for any event A [DeRobertis & Hartigan1981]:

l(A) <= α p(A) <= u(A),

where l(A) and u(A) are arbitrary positive measures such that l() <= u(), and α is some positive real number. Notice that the distributions defined by a density ratio class form a convex set. If we draw two graphs, one for l() and one for u(), then all functions that we can draw between them will be in this family. The definition fixes the ``shape'' boundaries of the class, but leaves unspecified the particular values that can be taken by distributions; to obtain these values we have to normalize the elements of the class.
The density ratio class is probably the easiest to elicit in a Bayesian network setup. Consider the following equivalent definition [Berger1990]: a density ratio class consists of all probability densities p(A) so that, for any events A and B:
l(A)/u(B) <= p(A)/p(B) <= u(A)/l(B).
The class is defined by intervals of probability odds: the ratio between the probabilities of two events. Odds have been advocated for the elicitation of Bayesian networks, as the most natural method to obtain probability judgements from experts [Heckerman1990]. It is a short step to accept that intervals of odds can be used to model imprecise judgements of probability. Expression (10) also indicates how the specification of a density ratio class can be turned into a set of linear inequalities of the form:
p(A) <= (u(A)/l(B)) p(B),
for arbitrary A and B. The density ratio class is attractive in univariate models because posterior densities are obtained through operations on the measures l(A) and u(A) [DeRobertis & Hartigan1981]. This computational advantage breaks down in multivariate settings, as in general we cannot take marginals only on the lower and upper measures [Wasserman1992a], since for every variable x with |Ω_x| values, there are |Ω_x| (|Ω_x| - 1) probability inequalities represented by expression (11).
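These inequalities can be verified numerically: any normalized density lying between l() and u() satisfies p(A) l(B) <= u(A) p(B) for all events. A sketch with illustrative measures and density (my numbers):

```python
# Sketch: check the odds inequalities p(A)/p(B) <= u(A)/l(B) for a normalized
# density g/Z with l <= g <= u pointwise. All numbers are illustrative.
from itertools import combinations

l = [0.2, 0.3, 0.1]
u = [0.6, 0.8, 0.4]
g = [0.4, 0.5, 0.2]                 # unnormalized density with l <= g <= u
Z = sum(g)
p = [x / Z for x in g]              # normalized element of the class

def measure(f, A):
    return sum(f[i] for i in A)

events = [set(c) for r in (1, 2) for c in combinations(range(3), r)]
for A in events:
    for B in events:
        # p(A)/p(B) <= u(A)/l(B)  <=>  p(A) l(B) <= u(A) p(B)
        assert measure(p, A) * measure(l, B) <= measure(u, A) * measure(p, B) + 1e-12
```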
To produce inferences with this class, we use the following result [Wasserman & Kadane1992]. A density ratio class defined by l() and u() can be reduced to the union of all density bounded classes defined by β l() and β u() as β ranges over [1/(Σ_x u(x)), 1/(Σ_x l(x))]. Suppose we obtain the bounds for all these density bounded classes; in practice we must discretize the interval of β values. The upper bound for the density ratio class is then the maximum of the upper bounds for the density bounded classes, and the lower bound for the density ratio class is the minimum of the lower bounds for the density bounded classes.
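This reduction can be prototyped by discretizing β and bounding each density bounded class through vertex enumeration. A sketch under illustrative data (l, u, and the function f are my choices; the grid of 21 β values is arbitrary):

```python
# Sketch: approximate expectation bounds for a density ratio class by sweeping
# beta over [1/sum(u), 1/sum(l)] and bounding each density bounded class
# {p : beta*l <= p <= beta*u, sum(p) = 1} through vertex enumeration.
from itertools import product

l = [0.2, 0.3, 0.1]
u = [0.6, 0.8, 0.4]
f = [1.0, 3.0, 2.0]                 # function whose expectation is bounded
n = len(l)

def db_bounds(lo, hi):
    """Expectation bounds over {p : lo <= p <= hi, sum(p) = 1} via vertices."""
    exps = []
    for free in range(n):           # at a vertex, at most one coordinate is free
        others = [i for i in range(n) if i != free]
        for bits in product((0, 1), repeat=n - 1):
            p = [0.0] * n
            for i, b in zip(others, bits):
                p[i] = hi[i] if b else lo[i]
            p[free] = 1.0 - sum(p)
            if lo[free] - 1e-12 <= p[free] <= hi[free] + 1e-12:
                exps.append(sum(pi * fi for pi, fi in zip(p, f)))
    return min(exps), max(exps)

betas = [1 / sum(u) + t * (1 / sum(l) - 1 / sum(u)) / 20 for t in range(21)]
lowers, uppers = zip(*(db_bounds([b * x for x in l], [b * x for x in u])
                       for b in betas))
lower, upper = min(lowers), max(uppers)
assert lower <= upper
```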
© Fabio Cozman, Tue Jan 21 15:59:56 EST 1997