# Estimating a credal set

Our learning theorems (and Walley and Fine's theorems) are generalizations of laws of large numbers. Just as a probability can be induced from the relative frequencies in an infinite sequence of independent and identically distributed (i.i.d.) outcomes, our results express the idea that a credal set can be induced from an infinite sequence of outcomes. We emphasize that the current theorems are only limiting results; finite sample cases are deferred to future research.

We begin with some examples that highlight the subtleties of our task.

In this case, ``nature'' is choosing the bias of the coin from the probability interval [0.4, 0.6] in a deterministic fashion. An estimation task would be to recover this interval from the infinite series of flips.

In this case, the trial outcomes are actually i.i.d., and a point probability 0.5 would accurately describe the sequence. Thus, although one could say ``nature'' is drawing from a credal set, in this example we have ``nature'' drawing samples from a single probability distribution. We have constructed a hierarchical model for an i.i.d. point probability.

These examples illustrate that a credal set may or may not reveal itself through a sequence of trials, even an infinite one. The goal of recovering the underlying credal set exactly is therefore ill-posed. We can still require that no estimate contain distributions absent from the underlying credal set. This establishes our first requirement: any estimate for a lower envelope must dominate the lower envelope of the underlying credal set.

Examples 1 and 2 share an important characteristic. Suppose one measures the relative frequency of heads as the number of coins goes to infinity. In both cases the relative frequency of heads approaches 0.5. The next example displays a situation where this does not occur.

Many people's initial intuition is that the relative frequency also approaches 0.5 in this example (as ``half'' the coins have a bias of 1/3, the other ``half'' a bias of 2/3). However, this sequence of coins does not have a unique converging relative frequency. Call the relative frequency of heads at coin 2^n r'_n. Then lim_{n→∞} r'_n = 1/2 with probability 1. On the other hand, call the relative frequency of heads at coin 2^n + 2^{n-1} r''_n. Then lim_{n→∞} r''_n = 4/9. Depending on the way we generate subsequences of relative frequencies, we may obtain different converging relative frequencies. We conclude that a credal set may generate infinite sequences of trials that cannot be represented by any probabilistic model (a single probability distribution cannot generate a sequence with more than one converging relative frequency).
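The two limits can be checked numerically. The sketch below assumes one concrete bias pattern consistent with the limits above — within each dyadic block [2^k, 2^{k+1}) the first half of the coins are biased 1/3 and the second half 2/3; this particular pattern is our own reconstruction of the example — and tracks the expected relative frequency along both subsequences:

```python
from fractions import Fraction

def bias(i):
    """Assumed bias pattern: within each dyadic block [2**k, 2**(k+1)),
    the first half of the coins are biased 1/3 and the second half 2/3
    (the k = 0 block is the single coin 1)."""
    if i == 1:
        return Fraction(1, 3)
    k = i.bit_length() - 1                 # coin i lies in [2**k, 2**(k+1))
    return Fraction(1, 3) if i < 3 * 2**(k - 1) else Fraction(2, 3)

n = 16
expected_heads = Fraction(0)
for i in range(1, 2**n + 2**(n - 1) + 1):
    expected_heads += bias(i)
    if i == 2**n:
        r1 = expected_heads / i            # r'_n, sampled at coin 2**n
r2 = expected_heads / (2**n + 2**(n - 1))  # r''_n, sampled at coin 2**n + 2**(n-1)

print(float(r1), float(r2))                # r1 is close to 1/2; r2 equals 4/9 exactly
```

Under this bias pattern r'_n = 1/2 − 1/(3·2^n), which converges to 1/2, while r''_n is exactly 4/9 for every n, matching the two limits in the text.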

We now formalize the concepts introduced by these examples. Section 4.1 discusses our model for how a sequence is drawn from a credal set. The use of credal sets enriches the basic notion(s) of statistical guarantees, and these generalized notions are discussed in Section 4.2. Section 5 then considers our estimation goal, i.e., what it would mean to estimate that credal set from a sequence of observations. Walley and Fine [27] constructed such an estimator; we present their estimator and results in Section 6.

## Data generation assumptions

Our data generation assumptions (taken from Walley and Fine) are as follows. For the ith trial of the observed sequence, ``nature'' selects an underlying probability distribution, p_i. ``Nature'' may select a different distribution for different trials, i.e., it is possible that p_i ≠ p_j. The manner in which these trial distributions are selected is not known to us; it may follow an (unknown) deterministic pattern (examples 1 and 3), there may be elements of randomness involved (example 2), and/or they may depend on actual previous outcomes. While no assumptions are made regarding how ``nature'' selects each trial distribution, we do assume that every trial distribution is contained within a fixed credal set. Once ``nature'' has selected a sequence of distributions, the individual trials are drawn independently and randomly from their corresponding distributions (x_i ∼ p_i).
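These assumptions can be sketched in a few lines of code. Everything below — the function name, the interval [0.4, 0.6], the oscillating selection strategy — is a hypothetical illustration of the data generation model, not a procedure from the text:

```python
import random

def generate_sequence(n, lower=0.4, upper=0.6, select=None, seed=0):
    """Sketch of the Walley-Fine data generation model: for each trial i,
    "nature" selects a bias p_i inside the credal set [lower, upper] by
    some strategy unknown to the observer; the outcome is then drawn
    independently from a Bernoulli distribution with parameter p_i."""
    rng = random.Random(seed)
    if select is None:
        # one possible deterministic strategy: oscillate between the extremes
        select = lambda i: lower if i % 2 == 0 else upper
    outcomes = []
    for i in range(n):
        p_i = select(i)
        assert lower <= p_i <= upper   # every trial distribution stays in the set
        outcomes.append(1 if rng.random() < p_i else 0)
    return outcomes

xs = generate_sequence(10_000)
print(sum(xs) / len(xs))   # the oscillating strategy averages out near 0.5
```

Passing a different `select` strategy reproduces the other cases: a constant strategy gives ordinary i.i.d. trials (example 2), while a strategy that inspects previous outcomes models outcome-dependent selection.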

One may interpret the credal set as the most basic model of uncertainty and the selected distributions merely as an explanatory device. A different interpretation is that there is a single distribution regulating the data, and this distribution is contained in the credal set [17]. Our assumptions can then be framed as a relaxation of the usual i.i.d. assumption for point probability. In this interpretation, while the trials are independent given the trial distributions, the underlying trial distribution need not have identically distributed marginals, and these marginals need not be mutually independent.

One can see that our data generation assumptions are in fact appropriate for various physical phenomena. For example, the bias on the rolls of a die may slowly vary or oscillate by small amounts over time as the sides and corners become worn. It has been argued that the actual physical behavior of atomic clocks exhibits a similar type of non-stationarity that is most faithfully modeled by these assumptions [12, 8, 5].

Rather than view ``nature'' as actually drawing samples according to credal sets, the subjectivist may view the data generation somewhat differently. There are variables whose outcomes are to be assessed prior to observing the actual outcomes. However, due to lack of time or other factors, the assessments are to be completed without elaborating a fully detailed model of the interactions or correlations between the variables. This interpretation of convex sets of probability is referred to as the ontological interpretation in previous research [3, 27]. As actual values of the variables are observed, it is as if the values have been drawn from the perfectly calibrated subjectivist's belief set. In this way, inducing the underlying convex set of probabilities from an infinite observed sequence of data is equivalent to determining whether an agent's subjective interval-valued belief is properly calibrated.

## Asymptotic certainty and favorability

In classical probability theory, asymptotic certainty is at the core of limit theorems such as the laws of large numbers. For example, if a fair coin is flipped infinitely often, the relative frequency of heads approaches 0.5 with asymptotic certainty. This leaves open the possibility that a very unusual sample is generated by random chance, but as the length of the sequence grows, the chance of such unusual events becomes less and less significant.

Alternate versions of this type of limiting guarantee can be defined in the framework of convex sets of probability. The two concepts of primary interest are asymptotic certainty and asymptotic favorability.

Let {A_1, A_2, …} be a sequence of events (an event here is a combination of outcomes that either occurs or does not occur when the sequence is generated). When lim_{n→∞} p(A_n) = 1, it is said that A is asymptotically certain, or ``a.c.'' In this case, no matter what strategy ``nature'' uses to choose trial distributions, A will occur in the limit.

A weaker notion of convergence is also useful. When lim_{n→∞} p(A_n^c)/p(A_n) = 0, where A_n^c denotes the complement of A_n, it is said that A is asymptotically favored (a.f.) [27]. For a point probability, asymptotic favorability and asymptotic certainty coincide. In general, asymptotic certainty implies asymptotic favorability; a.f. is much weaker than a.c. In terms of a credal set, asymptotic certainty of an event A means that p(A) tends to 1 for every distribution p(·) in the credal set; asymptotic favorability of A means that p(A) tends to 1 for some distributions in the credal set, while other distributions may have p(A) tend to some non-negative number smaller than 1. Informally, asymptotic favorability only ensures that it is plausible that A occurs with probability 1; this happens only if ``nature'' happens to select trial distributions with the appropriate strategy (a ``cooperative nature'').
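The gap between the two notions can be seen numerically. In the sketch below — our own illustration, with hypothetical numbers — the event A_n is ``the relative frequency of heads after n flips lies in [0.45, 0.55]''. Under the cooperative choice p = 0.5, p(A_n) tends to 1; under p = 0.4, which also belongs to the credal set [0.4, 0.6], it tends to 0. The event is therefore at best favored, not certain:

```python
from math import comb

def prob_freq_in(n, p, lo, hi):
    """Exact P(lo <= (#heads)/n <= hi) for n i.i.d. Bernoulli(p) flips."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if lo <= k / n <= hi)

# Event A_n: the relative frequency of heads lies in [0.45, 0.55].
# Both p = 0.5 and p = 0.4 belong to the credal set [0.4, 0.6], yet
# p(A_n) grows toward 1 under the first and shrinks toward 0 under the second.
for n in (100, 400):
    print(n, prob_freq_in(n, 0.5, 0.45, 0.55), prob_freq_in(n, 0.4, 0.45, 0.55))
```

The binomial sums are computed exactly rather than simulated, so the monotone trends are visible even at these modest values of n.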

The concepts of a.c. and a.f. are most commonly applied to describe guarantees on sample statistics or estimators, by saying that statistic F will have property A with asymptotic certainty or favorability.
