This section demonstrates the generality of the previous algorithms by reducing a large number of classes of distributions to finitely generated credal sets. Inferences with each class can be performed using the algorithms in this paper as long as the class is formulated as a finitely generated convex set. We first present the epsilon-contaminated and lower density bounded classes due to their simplicity. Then the belief function class is analyzed. All other classes admit simple reductions to the belief function class, except the density ratio class, which requires special treatment.

Classes of distributions are used to define the prior distribution p(x)
for a variable x. These results can always be extended to conditional
distributions of the form p(x|pa(x)): if the conditional distribution for
each value of pa(x) is constrained to a class with k vertices, there are
k^{|pa(x)|} vertices for the joint distribution, where |pa(x)| denotes the
number of values of pa(x).

An epsilon-contaminated class is characterized by a distribution p() and a real number epsilon in (0,1) [Berger1985]:

r(x) = (1-epsilon) p(x) + epsilon q(x),

where q(x) is an arbitrary distribution. This model is appropriate when we expect p(x) to be correct a fraction (1-epsilon) of the time, but otherwise any distribution is possible. An epsilon-contaminated class for a variable without parents is the convex hull of the functions:

r_{a_k}(x) = (1-epsilon) p(x) + epsilon delta_{a_k}(x),
a_{k} is in x,

where delta_{a_k}(x) is the distribution that assigns probability one to the value a_{k}.

The exact and iterative methods derived before apply, but the fact that the extreme contaminations delta_{a_k}() are composed exclusively of zeros and ones leads to computational savings, because many multiplications of real numbers can be avoided.

The simple structure of the epsilon-contaminated class makes
it possible to obtain closed-form expressions for the expected bounds
on a function u(x) in the univariate case. The expected value with
respect to distribution r_{a_k}(x) is (1-epsilon) E[u] + epsilon u(a_{k}),
so:

underline{E}[u] = (1-epsilon) E[u] + epsilon min_{a_k} u(a_{k}),

overline{E}[u] = (1-epsilon) E[u] + epsilon max_{a_k} u(a_{k}).
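The closed-form bounds can be cross-checked by enumerating the vertices r_{a_k} directly. A minimal Python sketch; the function names and the numbers are ours, not from the paper:

```python
# Illustrative sketch: expectation bounds over an epsilon-contaminated
# class, computed in closed form and by vertex enumeration.

def eps_bounds(p, u, eps):
    """Closed-form lower/upper expectations of u(x) for the
    epsilon-contaminated class built around distribution p."""
    e = sum(pk * uk for pk, uk in zip(p, u))  # E[u] under p
    return (1 - eps) * e + eps * min(u), (1 - eps) * e + eps * max(u)

def eps_bounds_by_vertices(p, u, eps):
    """Same bounds, by enumerating the vertices
    r_{a_k}(x) = (1-eps) p(x) + eps delta_{a_k}(x)."""
    e = sum(pk * uk for pk, uk in zip(p, u))
    vals = [(1 - eps) * e + eps * uk for uk in u]  # expectation at each vertex
    return min(vals), max(vals)

p = [0.5, 0.3, 0.2]   # nominal distribution (made-up numbers)
u = [1.0, 4.0, 2.0]   # function whose expectation is bounded
assert eps_bounds(p, u, 0.1) == eps_bounds_by_vertices(p, u, 0.1)
```

Both routines perform the same floating-point operations, so the bounds agree exactly.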

Consider the set of all distributions p(x) such that

p(x) >= l(x),

where l() is an arbitrary non-negative function whose sum is smaller than one [Breese & Fertig1991].
Call this class the *lower density bounded class*. There are
approximations (without error bounds) for inferences with this class
[Breese & Fertig1991]. The results in this paper improve this to exact and provably
convergent inference algorithms. The key fact is that this class
is identical to an epsilon-contaminated class, since it can
be specified as the set of distributions:

r(x) = (1-(1-L)) { l(x) }/{ L } + (1-L) q(x),

where L = sum_{x} l(x) and q(x) is an arbitrary distribution; this is an epsilon-contamination of l(x)/L with epsilon = 1-L. A lower density bounded class is the convex hull of:

r_{a_k}(x) = l(x) + (1-L) delta_{a_k}(x),
a_{k} is in x.
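These vertices are easy to generate mechanically: each one adds the free mass 1-L to a single value. A small sketch, with our own function name and made-up numbers:

```python
# Vertices of a lower density bounded class:
# r_{a_k}(x) = l(x) + (1-L) delta_{a_k}(x), one vertex per value a_k.

def ldb_vertices(l):
    L = sum(l)                     # total lower mass, assumed <= 1
    verts = []
    for k in range(len(l)):
        v = list(l)
        v[k] += 1 - L              # concentrate the free mass on a_k
        verts.append(v)
    return verts

l = [0.2, 0.1, 0.3]                # hypothetical lower bounds, L = 0.6
for v in ldb_vertices(l):
    assert abs(sum(v) - 1.0) < 1e-9                 # each vertex is a distribution
    assert all(vi >= li for vi, li in zip(v, l))    # and dominates l pointwise
```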

Consider a discrete variable x with a finite set of values
x. To define a probability distribution we must specify
|x| numbers, one for each element of x. Suppose
we cannot specify precise numbers for those events, so we distribute
some probability mass into subsets of x. For example, we
place 0.1 mass in the set { x_{1}, x_{2} } without specifying whether
the mass goes to x_{1} or x_{2}. Call the mass m(A) associated with
a subset A a *basic mass assignment*.
Since the mass that is distributed in x must add to one,
the basic mass assignments must add to one.

The basic mass assignments define a convex set of distributions. Since the value of m(A) corresponds to a probability mass that can be arbitrarily spread among the elements of A, we can generate vertices of the convex set by concentrating each non-zero basic mass assignment into each one of its elements, one at a time. There are finitely many ways to distribute the mass, because x has finitely many values, so the convex set defined by the basic mass assignments is finitely generated. When the non-zero basic mass assignments carry mass to single elements of x, the convex set collapses to a standard Bayesian model.
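The vertex-generation argument can be sketched directly: concentrate each non-zero basic mass assignment on one element of its focal set, in every possible combination. A hypothetical example in Python (the masses are made up):

```python
from itertools import product

# Vertices of the convex set induced by basic mass assignments: every way
# of sending each focal set's whole mass to one of its elements.

def belief_vertices(masses, omega):
    focal = [(sorted(A), m) for A, m in masses.items() if m > 0]
    verts = set()
    for choice in product(*[A for A, _ in focal]):
        p = {x: 0.0 for x in omega}
        for x, (_, m) in zip(choice, focal):
            p[x] += m              # focal set's whole mass goes to element x
        verts.add(tuple(round(p[x], 12) for x in omega))
    return verts

omega = ("x1", "x2", "x3")
masses = {("x1",): 0.5, ("x1", "x2"): 0.1, ("x1", "x2", "x3"): 0.4}
vs = belief_vertices(masses, omega)
assert len(vs) == 6                # 1 * 2 * 3 combinations, all distinct here
```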

A belief function can be defined from the basic mass assignments [Shafer1976]:

Bel(A) = sum_{B subset A} m(B).

Conversely, the basic mass assignments can be recovered from a belief function through the Möbius transform:

m(A) = sum_{B subset A} (-1)^{|A-B|} Bel(B).

Usually a belief function is associated with a plausibility function:

Pl(A) = 1 - Bel(A^{c}).
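The belief and plausibility functions, and the Möbius round trip between masses and beliefs, can be checked on a small example. A sketch with hypothetical masses (our own helper names):

```python
from itertools import chain, combinations

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def bel(A, masses):
    A = set(A)                     # Bel(A) = sum of m(B) over B subset of A
    return sum(m for B, m in masses.items() if set(B) <= A)

def mobius(A, bel_fn):
    A = set(A)                     # m(A) = sum_{B subset A} (-1)^{|A-B|} Bel(B)
    return sum((-1) ** (len(A) - len(B)) * bel_fn(B) for B in powerset(A))

omega = ("x1", "x2", "x3")
masses = {("x1",): 0.5, ("x1", "x2"): 0.1, ("x1", "x2", "x3"): 0.4}

# Round trip: the Moebius transform of Bel returns the original masses.
for A in powerset(omega):
    recovered = mobius(A, lambda B: bel(B, masses))
    assert abs(recovered - masses.get(tuple(A), 0.0)) < 1e-9

# Plausibility: Pl(A) = 1 - Bel(complement of A), and Bel <= Pl everywhere.
pl = lambda A: 1.0 - bel(set(omega) - set(A), masses)
assert all(bel(A, masses) <= pl(A) + 1e-9 for A in powerset(omega))
```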

Application of Bayes rule to belief function classes is a difficult problem given the non-linear character of Bayes rule. There are expressions that generate exact bounds for posterior quantities in univariate distributions [Halpern & Fagin1992], but the same does not apply to marginalization of multivariate belief function classes. Our solution is to reduce belief function classes to finitely generated convex sets of distributions and apply the methods derived before.

A different way to represent a belief function class is through a set of inequalities. For each subset A with a non-zero basic mass assignment, an inequality can be written:

p(A) >= Bel(A).

Consider a variable x with values x. Sometimes it is hard to specify all the numbers p(x) for all elements of x; a more straightforward approach is to specify probability masses for non-overlapping subsets of x. This characterizes a convex set of distributions for x called a sub-sigma class [Berger1990, Lambert & Duncan1986, Manski1981].

For example, suppose x has 20 values between zero and one, and one wants to specify a distribution for x that is precise for the values around 0.5 but rather imprecise in the tails. One might consider specifying a single probability mass for the lower 5 values, a single probability mass for the upper 5 values, and probability masses for each of the 10 intermediate values.

Every sub-sigma class is a belief function class, since the Möbius transform of a sub-sigma class can be directly read from the specification of the class. Results of the previous section can be applied to this class.
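For instance, the 20-value example above can be written down directly as a basic mass assignment; the specific masses (0.05 per tail, 0.09 per intermediate value) are made-up numbers:

```python
# A sub-sigma class as basic mass assignments: joint masses for the two
# tails, precise masses for the ten intermediate values.
values = tuple(i / 20 for i in range(20))
masses = {values[:5]: 0.05,        # lower tail, one joint mass
          values[15:]: 0.05}       # upper tail, one joint mass
for v in values[5:15]:
    masses[(v,)] = 0.09            # 10 * 0.09 + 0.05 + 0.05 = 1.0
assert abs(sum(masses.values()) - 1.0) < 1e-9
```

Because the focal sets do not overlap, these masses are exactly the Möbius transform of the class, as the paragraph above notes.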

A density bounded class [Lavine1991] is the set of all distributions p(x) such that

l(x) <= p(x) <= u(x),

where l() and u() are arbitrary non-negative measures such that sum_{x} l(x) <= 1 <= sum_{x} u(x). A density bounded class can be reduced to a belief function class. The algorithm has an initialization phase:

- Initialize Delta(x) = u(x) - l(x) and M = sum_{x} l(x).
- For each element a of x, put m({a}) = l(a).
- Mark all elements a of x for which l(a) < u(a).

Then the following steps are iterated:

- Find the marked element b of x with minimum value of Delta().
- If Delta(b) + M > 1, then c = 1 - M. Otherwise c = Delta(b).
- Define m(A) = c, where A is the set with all marked elements of x (that is, x minus its unmarked elements).
- Subtract c from all Delta(x) with marked x.
- Add c to M.
- Unmark b.
- If M = 1, stop.

The algorithm unmarks the elements one at a time until there is no more probability mass to be distributed. The order in which assignments are created (elements are unmarked in increasing order of Delta()) guarantees that the belief function class generates all distributions in the density bounded class.
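A sketch of the procedure, under our reading that each mass c is assigned to the set of currently marked elements; the values of x are the integers 0..n-1, and the bounds l, u below are illustrative numbers chosen so the loop terminates with M = 1:

```python
# Basic mass assignments from density bounds l <= p <= u (one reading of
# the marking procedure above; not a definitive implementation).

def density_bounded_to_masses(l, u, tol=1e-12):
    n = len(l)
    delta = [u[i] - l[i] for i in range(n)]
    M = sum(l)
    masses = {(i,): l[i] for i in range(n) if l[i] > 0}
    marked = {i for i in range(n) if l[i] < u[i]}
    while marked and M < 1 - tol:
        b = min(marked, key=lambda i: delta[i])    # smallest remaining slack
        c = 1 - M if delta[b] + M > 1 else delta[b]
        A = tuple(sorted(marked))
        masses[A] = masses.get(A, 0.0) + c         # mass on the marked set
        for i in marked:
            delta[i] -= c
        M += c
        marked.discard(b)
    return masses

l, u = [0.1, 0.2, 0.1], [0.7, 0.8, 0.3]
m = density_bounded_to_masses(l, u)
assert abs(sum(m.values()) - 1.0) < 1e-9
for i in range(3):                                 # in this instance the belief
    bel_i = m.get((i,), 0.0)                       # and plausibility of each
    pl_i = sum(v for A, v in m.items() if i in A)  # singleton match l and u
    assert abs(bel_i - l[i]) < 1e-9 and abs(pl_i - u[i]) < 1e-9
```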

The total variation class is the set of distributions p(x) that lie within total variation distance epsilon of a given distribution p_{0}(x) [Wasserman1992b]:

|p(A) - p_{0}(A)| <= epsilon for every event A.

A density ratio class consists of all probability densities p() such that, for any event A [DeRobertis & Hartigan1981]:

l(A) <= alpha p(A) <= u(A),

where l(A) and u(A) are arbitrary positive measures such that l() <= u(), and alpha is some positive real number. Notice that the distributions defined by a density ratio class form a convex set. If we draw two graphs, one for l() and one for u(), then all functions that we can draw between them will be in this family. The definition fixes the ``shape'' boundaries of the class, but leaves unspecified the particular values that can be taken by distributions; to obtain these values we have to normalize the elements of the class.

The density ratio class is probably the easiest to elicit in a Bayesian network setup. Consider the following equivalent definition [Berger1990]: a density ratio class consists of all probability densities p() such that, for any events A and B:

{ l(A) }/{ u(B) } <= { p(A) }/{ p(B) } <= { u(A) }/{ l(B) } .

The class is defined by intervals of probability ratios. Expression (10) also indicates how the specification of a density ratio class can be turned into a set of linear inequalities of the form:

p(A) <= { u(A) }/{ l(B) } p(B),

for arbitrary A and B. The density ratio class is attractive in univariate models because posterior densities are obtained through operations on the measures l(A) and u(A) [DeRobertis & Hartigan1981]. This computational advantage breaks down in multivariate settings, as in general we cannot take marginals only on the lower and upper measures [Wasserman1992a]: for every variable x with |x| values, there are |x| (|x| - 1) probability inequalities represented by expression (11).

To produce inferences with this class,
we use the following result [Wasserman & Kadane1992]. A density ratio class
defined by l() and u() can be reduced to the union of
all density bounded classes defined by beta l() and beta u()
as beta ranges over the interval [1/(sum_{x} u(x)), 1/(sum_{x} l(x))].
Suppose we obtain the bounds for all these density bounded classes;
in practice we must discretize the interval of beta values.
The upper bound for the density ratio class is the
maximum of the upper bounds for the density bounded classes, and
the lower bound for the density ratio class is the
minimum of the lower bounds for the density bounded classes.
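The reduction can be sketched as a sweep over a discretized grid of beta values; for a fixed beta, expectation bounds over the density bounded class (beta l, beta u) follow from a greedy redistribution of the free probability mass. The helper names and numbers are ours:

```python
# Bounds over a density ratio class via the Wasserman & Kadane reduction:
# sweep beta over [1/sum(u), 1/sum(l)] and combine density bounded bounds.

def db_expectation_bounds(l, u, f):
    """Lower/upper expectation of f over {p : l <= p <= u, sum p = 1}."""
    def extremize(order):
        p = list(l)
        rest = 1.0 - sum(l)                 # mass left after meeting l
        for i in order:
            add = min(u[i] - l[i], max(rest, 0.0))
            p[i] += add                     # push mass toward extreme f values
            rest -= add
        return sum(pi * fi for pi, fi in zip(p, f))
    idx = sorted(range(len(f)), key=lambda i: f[i])
    return extremize(idx), extremize(idx[::-1])

def density_ratio_bounds(l, u, f, steps=50):
    lo_b, hi_b = 1.0 / sum(u), 1.0 / sum(l)
    lows, highs = [], []
    for s in range(steps + 1):              # discretized beta interval
        beta = lo_b + (hi_b - lo_b) * s / steps
        lo, hi = db_expectation_bounds([beta * x for x in l],
                                       [beta * x for x in u], f)
        lows.append(lo)
        highs.append(hi)
    return min(lows), max(highs)            # combine over the beta grid

l, u = [0.2, 0.2, 0.2], [0.4, 0.4, 0.4]     # hypothetical ratio bounds
f = [0.0, 1.0, 2.0]
lo, hi = density_ratio_bounds(l, u, f)
assert lo <= 0.8 + 1e-6 and hi >= 1.2 - 1e-6   # beta = 1 lies on the grid
```

The greedy step is the standard linear-programming solution for box-constrained expectations: meet the lower bounds, then move the remaining mass toward the largest (or smallest) values of f.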
