This section demonstrates the generality of the previous algorithms by reducing a large number of classes of distributions to finitely generated credal sets. Inferences with each class can be performed using the algorithms in this paper as long as the class is formulated as a finitely generated convex set. We first present the ε-contaminated and lower density bounded classes due to their simplicity. Then the belief function class is analyzed. All other classes admit simple reductions to the belief function class, except the density ratio class, which requires special treatment.
Classes of distributions are used to define the prior distribution p(x) for a variable x. These results can always be extended to conditional distributions of the form p(x|pa(x)), where pa(x) denotes the parents of x in the network.
An ε-contaminated class is characterized by a distribution p() and a real number ε ∈ (0,1) [Berger1985]:

r(x) = (1-ε) p(x) + ε q(x),

where q(x) is an arbitrary distribution. This model is appropriate when we expect p(x) to be correct a fraction (1-ε) of the time, but otherwise any distribution is possible. An ε-contaminated class for a variable without parents is the convex hull of the functions:

r_{a_k}(x) = (1-ε) p(x) + ε δ_{a_k}(x),

where δ_{a_k}(x) is the distribution that places all mass on the single value a_k.
The exact and iterative methods derived before apply, but the fact that the contaminating components δ_{a_k}(x) of the vertices of an ε-contaminated class are composed exclusively of zeros and ones leads to computational savings, because many multiplications of real numbers can be avoided.
The simple structure of the ε-contaminated class makes it possible to obtain closed-form expressions for the expected bounds on a function u(x) in the univariate case. The expected value with respect to distribution r_{a_k}(x) is (1-ε) E_p[u] + ε u(a_k), so:

E_lower[u] = (1-ε) E_p[u] + ε min_k u(a_k),
E_upper[u] = (1-ε) E_p[u] + ε max_k u(a_k),

where E_p[u] is the expectation of u() with respect to p().
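As a check on these closed-form bounds, the vertex-by-vertex computation can be compared with the min/max expressions numerically. A minimal sketch, with illustrative values for ε, p(x), and u(x) (none taken from the paper):

```python
# Sketch: verify the closed-form expectation bounds of an epsilon-contaminated
# class by enumerating its vertices r_{a_k}. All numbers are illustrative.
eps = 0.2
p = [0.5, 0.3, 0.2]          # base distribution p(x)
u = [1.0, 4.0, 2.0]          # function u(x) whose expectation is bounded

E_p = sum(pi * ui for pi, ui in zip(p, u))

# Vertex r_{a_k} puts the contaminating mass eps on the single value a_k,
# so its expectation is (1-eps) E_p[u] + eps u(a_k).
vertex_expectations = [(1 - eps) * E_p + eps * uk for uk in u]

lower = min(vertex_expectations)   # (1-eps) E_p[u] + eps min_k u(a_k)
upper = max(vertex_expectations)   # (1-eps) E_p[u] + eps max_k u(a_k)

assert abs(lower - ((1 - eps) * E_p + eps * min(u))) < 1e-12
assert abs(upper - ((1 - eps) * E_p + eps * max(u))) < 1e-12
```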
Consider the set of all distributions p(x) such that
p(x) >= l(x),
where l() is an arbitrary non-negative function whose sum is smaller than one [Breese & Fertig1991]. Call this class the lower density bounded class. There are approximations (without error bounds) for inferences with this class [Breese & Fertig1991]. The results in this paper improve this to exact and provably convergent inference algorithms. The key fact is that this class is identical to an ε-contaminated class, since it can be specified as the set of distributions:

r(x) = (1-(1-L)) l(x)/L + (1-L) q(x),

where L = Σ_x l(x) and q(x) is an arbitrary distribution. A lower density bounded class is therefore the convex hull of the functions:

r_{a_k}(x) = l(x) + (1-L) δ_{a_k}(x).
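The vertices above are easy to construct explicitly. A minimal sketch with an illustrative lower bound l() (my choice, not the paper's):

```python
# Sketch: vertices r_{a_k}(x) = l(x) + (1-L) delta_{a_k}(x) of a lower density
# bounded class; the lower bound l() is illustrative.
l = [0.1, 0.2, 0.3]                      # lower bounds, with L = sum(l) < 1
L = sum(l)

vertices = []
for k in range(len(l)):
    v = list(l)
    v[k] += 1 - L                        # concentrate the free mass on a_k
    vertices.append(v)

# Every vertex is a proper distribution dominating l().
for v in vertices:
    assert abs(sum(v) - 1.0) < 1e-12
    assert all(vi >= li for vi, li in zip(v, l))
```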
Consider a discrete variable x with a finite set of values Ω_x. To define a probability distribution we must specify |Ω_x| numbers, one for each element of Ω_x. Suppose we cannot specify precise numbers for those events, so we distribute some probability mass over subsets of Ω_x. For example, we place 0.1 mass in the set { x1, x2 } without specifying whether the mass goes to x1 or x2. Call the mass m(A) associated with subset A a basic mass assignment. Since the mass distributed in Ω_x must add to one, the basic mass assignments must add to one.
The basic mass assignments define a convex set of distributions. Since the value of m(A) corresponds to a probability mass that can be arbitrarily spread among the elements of A, we can generate vertices of the convex set by concentrating the non-zero basic mass assignments into each one of their elements, one at a time. There are finitely many ways to distribute the mass, because Ω_x has finitely many values, so the convex set defined by the basic mass assignments is finitely generated. When the non-zero basic mass assignments carry mass to single elements of Ω_x, the convex set collapses to a standard Bayesian model.
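The vertex-generation procedure just described can be sketched directly; the basic mass assignment below is an illustrative example, not one from the paper:

```python
# Sketch: generate the vertices of the convex set induced by a basic mass
# assignment by concentrating each focal mass on one element of its set.
from itertools import product

m = {frozenset({0}): 0.4,
     frozenset({1, 2}): 0.5,
     frozenset({0, 1, 2}): 0.1}       # illustrative basic mass assignment
n = 3

focal_sets = list(m)
vertices = set()
for choice in product(*[sorted(A) for A in focal_sets]):
    p = [0.0] * n
    for A, elem in zip(focal_sets, choice):
        p[elem] += m[A]               # send all of m(A) to one element of A
    vertices.add(tuple(round(x, 10) for x in p))

for v in vertices:
    assert abs(sum(v) - 1.0) < 1e-9   # each vertex is a distribution
```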
A belief function can be defined from the basic mass assignments [Shafer1976]:

bel(A) = Σ_{B ⊆ A} m(B),

and, conversely, the basic mass assignments can be recovered from the belief function through the Möbius transform:

m(A) = Σ_{B ⊆ A} (-1)^{|A-B|} bel(B).
Usually a belief function is associated with a plausibility function:

pl(A) = 1 - bel(A^c),

where A^c is the complement of A.
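The belief/plausibility pair and the Möbius inversion are standard Dempster-Shafer formulas [Shafer1976]; a minimal sketch with an illustrative mass assignment:

```python
# Sketch: belief and plausibility from a basic mass assignment, and recovery
# of the masses via the Moebius transform. The assignment is illustrative.
from itertools import combinations

omega = (0, 1, 2)
m = {frozenset({0}): 0.4, frozenset({1, 2}): 0.5, frozenset(omega): 0.1}

def subsets(A):
    A = sorted(A)
    for r in range(len(A) + 1):
        for c in combinations(A, r):
            yield frozenset(c)

def bel(A):                           # bel(A) = sum of m(B) over B subset of A
    return sum(v for B, v in m.items() if B <= A)

def pl(A):                            # pl(A) = 1 - bel(complement of A)
    return 1.0 - bel(frozenset(omega) - A)

def moebius(A):                       # m(A) = sum (-1)^{|A-B|} bel(B)
    return sum((-1) ** (len(A) - len(B)) * bel(B) for B in subsets(A))

for A in subsets(frozenset(omega)):
    assert abs(moebius(A) - m.get(A, 0.0)) < 1e-9   # round trip recovers m
    assert bel(A) <= pl(A) + 1e-9
```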
Application of Bayes rule to belief function classes is a difficult problem given the non-linear character of Bayes rule. There are expressions that generate exact bounds for posterior quantities in univariate distributions [Halpern & Fagin1992], but the same does not apply to marginalization of multivariate belief function classes.

Our solution is to reduce belief function classes to finitely generated convex sets of distributions and apply the methods derived before.
A different way to represent a belief function class is through a set of inequalities. For each subset A with a non-zero basic mass assignment, an inequality can be written:

p(A) >= Σ_{B ⊆ A} m(B).
Consider a variable x with values Ω_x. Sometimes it is hard to specify all the numbers p(x) for all elements of Ω_x; a more straightforward approach is to specify probability masses for non-overlapping subsets of Ω_x. This characterizes a convex set of distributions for x called a sub-sigma class [Berger1990, Lambert & Duncan1986, Manski1981].
For example, suppose x has 20 values between zero and one, and one wants to specify a distribution for x that is precise for the values around 0.5 but rather imprecise in the tails. One might consider specifying a single probability mass for the lower 5 values, a single probability mass for the upper 5 values, and probability masses for each of the 10 intermediate values.
Every sub-sigma class is a belief function class, since the Möbius transform of a sub-sigma class can be directly read from the specification of the class. Results of the previous section can be applied to this class.
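As an illustration of the 20-value example above, a sub-sigma specification whose masses can be read directly as basic mass assignments (the particular numbers are my choice):

```python
# Sketch: a sub-sigma specification assigns masses to non-overlapping subsets,
# so those masses ARE the basic mass assignments (Moebius transform read off
# directly). The 20-value example of the text, with illustrative masses.
lower_tail = frozenset(range(0, 5))       # one mass for the lower 5 values
upper_tail = frozenset(range(15, 20))     # one mass for the upper 5 values

m = {lower_tail: 0.05, upper_tail: 0.05}
for i in range(5, 15):                    # precise masses, middle 10 values
    m[frozenset({i})] = 0.09

assert abs(sum(m.values()) - 1.0) < 1e-9
# The focal sets partition the 20 values, as a sub-sigma class requires.
union = frozenset().union(*m)
assert union == frozenset(range(20)) and sum(len(A) for A in m) == 20
```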
A density bounded class [Lavine1991] is the set of all distributions p(x) such that
l(x) <= p(x) <= u(x),
where l() and u() are arbitrary non-negative measures such that Σ_x l(x) <= 1 and Σ_x u(x) >= 1. The particular case u(x) ≡ 1 leads to the lower density bounded class, but the method used before for the lower density bounded class does not apply to the more general setting. A reduction of the class to a finitely generated convex set of distributions is still possible, since a density bounded class is a belief function class. We prove this constructively by providing an algorithm that constructs the Möbius transform of a density bounded class.
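One way such a construction can go is to compute the lower probability of the density bounded class, P_low(A) = max(Σ_{x∈A} l(x), 1 - Σ_{x∉A} u(x)), and Möbius-invert it. The sketch below follows that route with illustrative bounds; it is a standard construction, not necessarily the paper's own algorithm:

```python
# Sketch: lower probability of a density bounded class {p : l <= p <= u},
#   P_low(A) = max( sum_{x in A} l(x), 1 - sum_{x not in A} u(x) ),
# Moebius-inverted; non-negative masses confirm a belief function.
from itertools import combinations

l = [0.1, 0.2, 0.3]                  # illustrative lower bounds, sum <= 1
u = [0.5, 0.5, 0.5]                  # illustrative upper bounds, sum >= 1
n = len(l)

def subsets(A):
    A = sorted(A)
    for r in range(len(A) + 1):
        for c in combinations(A, r):
            yield frozenset(c)

def p_low(A):
    return max(sum(l[i] for i in A),
               1.0 - sum(u[i] for i in range(n) if i not in A))

m = {A: sum((-1) ** (len(A) - len(B)) * p_low(B) for B in subsets(A))
     for A in subsets(range(n))}

assert all(v >= -1e-9 for v in m.values())   # basic masses are non-negative
assert abs(sum(m.values()) - 1.0) < 1e-9     # and they add to one
```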
The total variation class is the set of distributions p(x) such that [Wasserman1992b]:

|p(A) - q(A)| <= ε for every event A,

where q() is a given distribution and ε ∈ (0,1).
A density ratio class consists of all probability densities p(A) such that, for any event A [DeRobertis & Hartigan1981]:

l(A) <= α p(A) <= u(A),

where l(A) and u(A) are arbitrary positive measures such that l() <= u(), and α is some positive real number. Notice that the distributions defined by a density ratio class form a convex set. If we draw two graphs, one for l() and one for u(), then all functions that we can draw between them will be in this family. The definition fixes the ``shape'' boundaries of the class, but leaves unspecified the particular values that can be taken by distributions; to obtain these values we have to normalize the elements of the class.
The density ratio class is probably the easiest to elicit in a Bayesian network setup. Consider the following equivalent definition [Berger1990]: a density ratio class consists of all probability densities p(A) so that, for any events A and B:
l(A)/u(B) <= p(A)/p(B) <= u(A)/l(B).
The class is defined by intervals of probability odds: the ratio between the probabilities of two events. Odds have been advocated for the elicitation of Bayesian networks, as the most natural method to obtain probability judgements from experts [Heckerman1990]. It is a short step to accept that intervals of odds can be used to model imprecise judgements of probability. Expression (10) also indicates how the specification of a density ratio class can be turned into a set of linear inequalities of the form:
p(A) <= (u(A)/l(B)) p(B),
for arbitrary A and B. The density ratio class is attractive in univariate models because posterior densities are obtained through operations on the measures l(A) and u(A) [DeRobertis & Hartigan1981]. This computational advantage breaks down in multivariate settings, as in general we cannot take marginals only on the lower and upper measures [Wasserman1992a], since for every variable x with |Ω_x| values, there are |Ω_x| (|Ω_x| - 1) probability inequalities represented by expression (11).
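These inequalities can be verified numerically: any normalized density lying between l() and u() satisfies p(A) l(B) <= u(A) p(B) for all events. A sketch with illustrative measures and density (my numbers):

```python
# Sketch: check the odds inequalities p(A)/p(B) <= u(A)/l(B) for a normalized
# density g/Z with l <= g <= u pointwise. All numbers are illustrative.
from itertools import combinations

l = [0.2, 0.3, 0.1]
u = [0.6, 0.8, 0.4]
g = [0.4, 0.5, 0.2]                 # unnormalized density with l <= g <= u
Z = sum(g)
p = [x / Z for x in g]              # normalized element of the class

def measure(f, A):
    return sum(f[i] for i in A)

events = [set(c) for r in (1, 2) for c in combinations(range(3), r)]
for A in events:
    for B in events:
        # p(A)/p(B) <= u(A)/l(B)  <=>  p(A) l(B) <= u(A) p(B)
        assert measure(p, A) * measure(l, B) <= measure(u, A) * measure(p, B) + 1e-12
```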
To produce inferences with this class, we use the following result [Wasserman & Kadane1992]. A density ratio class defined by l() and u() can be reduced to the union of all density bounded classes defined by β l() and β u() as β ranges over [1/(Σ_x u(x)), 1/(Σ_x l(x))]. Suppose we obtain the bounds for all these density bounded classes; in practice we must discretize the interval of β values. The upper bound for the density ratio class is then the maximum of the upper bounds for the density bounded classes, and the lower bound for the density ratio class is the minimum of the lower bounds for the density bounded classes.
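This reduction can be prototyped by discretizing β and bounding each density bounded class through vertex enumeration. A sketch under illustrative data (l, u, and the function f are my choices; the grid of 21 β values is arbitrary):

```python
# Sketch: approximate expectation bounds for a density ratio class by sweeping
# beta over [1/sum(u), 1/sum(l)] and bounding each density bounded class
# {p : beta*l <= p <= beta*u, sum(p) = 1} through vertex enumeration.
from itertools import product

l = [0.2, 0.3, 0.1]
u = [0.6, 0.8, 0.4]
f = [1.0, 3.0, 2.0]                 # function whose expectation is bounded
n = len(l)

def db_bounds(lo, hi):
    """Expectation bounds over {p : lo <= p <= hi, sum(p) = 1} via vertices."""
    exps = []
    for free in range(n):           # at a vertex, at most one coordinate is free
        others = [i for i in range(n) if i != free]
        for bits in product((0, 1), repeat=n - 1):
            p = [0.0] * n
            for i, b in zip(others, bits):
                p[i] = hi[i] if b else lo[i]
            p[free] = 1.0 - sum(p)
            if lo[free] - 1e-12 <= p[free] <= hi[free] + 1e-12:
                exps.append(sum(pi * fi for pi, fi in zip(p, f)))
    return min(exps), max(exps)

betas = [1 / sum(u) + t * (1 / sum(l) - 1 / sum(u)) / 20 for t in range(21)]
lowers, uppers = zip(*(db_bounds([b * x for x in l], [b * x for x in u])
                       for b in betas))
lower, upper = min(lowers), max(uppers)
assert lower <= upper
```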
© Fabio Cozman, Tue Jan 21 15:59:56 EST 1997