Our learning theorems (and Walley and Fine's theorems) are generalizations of
various laws of large numbers.
Just as a probability can be induced from the frequencies in an infinite sequence of independent and
identically distributed (i.i.d.) outcomes,
our results express the idea that a credal set can be induced from an *infinite* sequence of outcomes.
We emphasize that the current theorems are only limiting results; finite-sample cases are
deferred to future research.

We begin with some examples that highlight the subtleties of our task.

In this case, ``nature'' is choosing the bias of the coin from the probability interval [0.4, 0.6] in a deterministic fashion. An estimation task would be to recover this interval from the infinite sequence of flips.
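As a minimal sketch of this estimation task, the simulation below assumes one particular deterministic schedule (many others would fit the example): ``nature'' holds the bias at 0.4 for a long block of flips, then at 0.6, alternating. Block-wise relative frequencies then roughly trace out the endpoints of the interval [0.4, 0.6].

```python
import random

# Sketch under an assumed schedule: the bias alternates between the
# endpoints 0.4 and 0.6 of the interval, one long block at a time.
random.seed(0)

BLOCK = 50_000
flips = []
for block in range(20):
    bias = 0.4 if block % 2 == 0 else 0.6
    flips.extend(1 if random.random() < bias else 0 for _ in range(BLOCK))

# Relative frequency of heads within each block; their range is a crude
# estimate of the interval from which the biases were drawn.
block_freqs = [sum(flips[k:k + BLOCK]) / BLOCK
               for k in range(0, len(flips), BLOCK)]
print(min(block_freqs), max(block_freqs))   # close to 0.4 and 0.6
```

Of course, this block-wise recipe only works because the assumed schedule dwells at each bias for long stretches; it is an illustration, not a general estimator.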

In this case, the trial outcomes are actually i.i.d., and a point probability 0.5 would accurately describe the sequence. Thus, although one could say ``nature'' is drawing from a credal set, in this example we have ``nature'' drawing samples from a single probability distribution. We have constructed a hierarchical model for an i.i.d. point probability.

These examples illustrate that a credal set may or may not reveal itself through a sequence of trials,
even an infinite one. The goal of recovering *the* underlying credal set precisely is
therefore ambiguous. We can still require that no estimate contain distributions that are not in the
underlying credal set. This establishes our first requirement: any estimate of a lower envelope must
dominate the lower envelope of the underlying credal set.

Examples 1 and 2 share an important characteristic. Suppose one measures the relative frequency of heads as the number of coins goes to infinity. In both cases the relative frequency of heads approaches 0.5. The next example displays a situation where this does not occur.

Many people's initial intuition is that the relative frequency also approaches 0.5 in this
example (as ``half'' the coins have a bias of 1/3, the other ``half'' a bias of 2/3).
However, this sequence of coins does *not* have a unique converging relative frequency.
Let r'_{n} denote the relative frequency of heads at coin 2^{n}. Then
lim_{n→∞} r'_{n} = 1/2 with probability 1.
On the other hand, let r''_{n} denote the relative frequency of heads at coin
2^{n}+2^{n-1}. Then lim_{n→∞} r''_{n} = 4/9.
Depending on the way we generate *subsequences of relative frequencies*, we may get
different converging relative frequencies.
We conclude that a credal set may create infinite sequences of trials that *cannot*
be represented by any probabilistic model (a single probability distribution cannot
generate a sequence with more than one converging relative frequency).
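The two converging subsequences can be checked exactly under one block construction that matches the limits quoted above (an assumption on our part, since the example's statement is abbreviated here): coins are grouped into blocks (2^{n}, 2^{n+1}], with the first half of each block biased 1/3 and the second half biased 2/3. Tracking *expected* relative frequencies (which equal the almost-sure limits of the observed frequencies) gives exactly 1/2 at coins 2^{n} and 4/9 at coins 2^{n}+2^{n-1}:

```python
from fractions import Fraction

# Hypothetical reconstruction of the block structure: block n holds coins
# (2^n, 2^(n+1)]; its first half has bias 1/3, its second half 2/3.

def bias(i):
    """Bias of coin i >= 1 under the assumed block rule."""
    if i <= 2:
        return Fraction(1, 2)        # first two coins; irrelevant in the limit
    n = (i - 1).bit_length() - 1     # block index: 2^n < i <= 2^(n+1)
    first_half = (i - 2 ** n) <= 2 ** (n - 1)
    return Fraction(1, 3) if first_half else Fraction(2, 3)

N = 12
cum = Fraction(0)
checkpoints = {}
for i in range(1, 2 ** N + 2 ** (N - 1) + 1):
    cum += bias(i)                   # expected cumulative number of heads
    if i in (2 ** N, 2 ** N + 2 ** (N - 1)):
        checkpoints[i] = cum / i     # expected relative frequency at coin i

print(checkpoints[2 ** N])                  # 1/2
print(checkpoints[2 ** N + 2 ** (N - 1)])   # 4/9
```

Each full block averages to bias 1/2, which keeps the frequency at coins 2^{n} pinned at 1/2; stopping halfway through a block, after its low-bias half, drags the frequency down to 4/9.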

We now formalize the concepts introduced by these examples. Section 4.1 discusses our model for how a sequence is drawn from a credal set. The use of credal sets enriches the basic notion(s) of statistical guarantees, and these generalized notions are discussed in Section 4.2. Section 5 then considers our estimation goal, i.e., what it would mean to estimate that credal set from a sequence of observations. Walley and Fine [27] constructed such an estimator; we present their estimator and results in Section 6.

Our data generation assumptions (taken from Walley and Fine) are as follows.
For the i^{th} trial of the observed sequence, ``nature'' selects an underlying
probability distribution, π_{i}. ``Nature'' may select a different distribution
for different trials, i.e., it is possible that π_{i} ≠ π_{j}. The manner in
which these trial distributions are selected is not known to us; it may follow an
(unknown) deterministic pattern (Examples 1 and
3), there may be elements of randomness involved (Example
2),
and/or it may depend on actual previous outcomes. While no assumptions are made
regarding how ``nature'' selects each trial distribution, we do assume that
*every trial distribution is contained within a fixed credal set*.
Once ``nature'' has selected a sequence of distributions, the individual trials are drawn independently at
random from their corresponding distributions (x_{i} ~ π_{i}).

One may interpret the credal set as the most basic model of uncertainty, with the selected distributions serving merely as an explanatory device. A different interpretation is that a single distribution regulates the data, and this distribution is contained in the credal set [17]. Our assumptions can then be framed as a relaxation of the usual i.i.d. assumption for point probabilities: while the trials are independent given the trial distributions, the underlying joint distribution would not have identically distributed marginals, and these marginals need not be mutually independent.

One can see that our data generation assumptions are in fact appropriate for various physical phenomena. For example, the bias on the rolls of a die may slowly vary or oscillate by small amounts over time as the sides and corners become worn. It has been argued that the actual physical behavior of atomic clocks exhibits a similar type of non-stationarity that is most faithfully modeled by these assumptions [12, 8, 5].

Rather than view ``nature'' as actually drawing samples according to credal sets,
the subjectivist may view the data generation somewhat differently.
There are variables whose outcomes are to be assessed prior to observing the actual outcomes.
However, due to lack of time or other factors, the assessments are to be completed
without elaborating a full detailed model of the interactions or correlations between the variables.
This interpretation of convex sets of probabilities is referred to as the *ontological interpretation* in
previous research [3, 27]. As
actual values for the variables become observed, it is as if the values have been drawn from the perfectly
calibrated subjectivist's belief set.
In this way, inducing the underlying convex set of probabilities from an infinite observed sequence of data
is equivalent to determining whether an agent's subjective interval-valued belief is properly calibrated.

In classical probability theory, asymptotic certainty is at the core of laws of large numbers.
For example, if a fair coin is flipped infinitely often, the frequency of heads will approach 0.5
*with asymptotic certainty*. This leaves open the possibility that a very unusual sample is
generated by random chance, although as the length of the sequence grows, the chance of
unusual events becomes less and less significant.

Alternate versions of this type of limiting guarantee can be defined in the framework of convex sets of probabilities. The two concepts of primary interest are asymptotic certainty and asymptotic favorability.

Let {A_{1}, A_{2}, …} be a sequence of events (an event here is a combination
of outcomes that either occurs or does not occur when the sequence is generated).
When __p__(A_{n}) → 1 as n → ∞, where __p__ denotes the lower envelope,
it is said that A is *asymptotically certain*, or ``a.c.''
In this case, no matter what strategy ``nature'' uses to choose trial distributions,
A will occur in the limit.

A weaker notion of convergence is also useful.
When __p__(A_{n}^{c})/__p__(A_{n}) → 0 as n → ∞,
where A_{n}^{c} denotes the complement of A_{n},
it is said that A is *asymptotically favored* (a.f.) [27].
For a point probability, asymptotic favorability and asymptotic certainty coincide.
In general, asymptotic certainty implies asymptotic favorability;
a.f. is much weaker than a.c. In terms of a credal set, asymptotic
certainty of an event A means that p(A) tends to 1 for all distributions p(·)
in the credal set; asymptotic favorability of an event A
means that p(A) tends to 1 for some distributions in the credal set, while for other
distributions p(A) may tend to some non-negative number smaller than 1.
Informally, asymptotic favorability only ensures that it is plausible that A occurs
with probability 1, and this happens only if ``nature'' happens to select
trial distributions with the appropriate strategy (a ``cooperative'' nature).
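The role of the lower envelope in these definitions can be made concrete with a toy finite credal set (not an example from the text): the lower probability of an event is the minimum, and the upper probability the maximum, of the event's probability across the member distributions.

```python
from fractions import Fraction as F

# A hypothetical credal set: three distributions over the outcomes {0, 1, 2}.
credal_set = [
    {0: F(2, 10), 1: F(5, 10), 2: F(3, 10)},
    {0: F(4, 10), 1: F(4, 10), 2: F(2, 10)},
    {0: F(3, 10), 1: F(3, 10), 2: F(4, 10)},
]

def lower_prob(event, credal_set):
    """Lower envelope: minimum probability of the event over the set."""
    return min(sum(p[x] for x in event) for p in credal_set)

def upper_prob(event, credal_set):
    """Upper envelope: maximum probability of the event over the set."""
    return max(sum(p[x] for x in event) for p in credal_set)

A = {1, 2}
print(lower_prob(A, credal_set))   # 3/5
print(upper_prob(A, credal_set))   # 4/5
```

In this vocabulary, asymptotic certainty of a sequence of events requires the *lower* envelope of their probabilities to tend to 1, i.e., every distribution in the credal set must eventually assign them probability near 1.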

The concepts of a.c. and a.f. are most commonly applied to describe guarantees on sample statistics or estimators, by saying that statistic F will have property A with asymptotic certainty or favorability.
