
# The finite learning theorem for convex sets of distributions

We create a family of estimators whose main purpose is to capture aspects of a sequence of trials that cannot be captured by Walley and Fine's estimator. Consider an infinite sequence of trials $X = \{ x_1, x_2, \ldots \}$. Each trial for event $A$ is generated with a distribution $p(A)$ such that $\underline{p}(A) \leq p(A) \leq \overline{p}(A)$. Consider a sub-sequence $X_s$ of the sequence $X$. We assume that the probability of any trial is unaffected by the selection mechanism: $p(x_i \in A \,|\, x_i \in X_s) = p(x_i \in A)$. We must place restrictions on the possible sub-sequence selection rules, because we cannot select trials ``after we see'' the results; otherwise it would be possible to construct a sub-sequence with only heads or only tails in the coin example. We must be able to specify sub-sequences in some definite way that can neither affect nor be affected by the trials. A definition of sub-sequence generators that complies with these requirements can be taken from the theory of random numbers, where selection rules are studied at great length. We adopt the definitions of computable selection rules given by Knuth [11] to indicate which entities we consider. We assume that sub-sequences are defined so that, for an infinitely long original sequence, each sub-sequence contains infinitely many elements.
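To make the restriction on selection rules concrete, here is a minimal sketch in Python. The helper `select_subsequence` and the two rules are hypothetical names, not part of any source formalism; the point is only that a rule may inspect the trial index and *earlier* outcomes, never the outcome of the trial it is deciding on:

```python
def select_subsequence(trials, rule):
    """Apply a selection rule to a finite prefix of trials.

    The rule sees only the index i and the outcomes of earlier trials
    (trials[:i]), so the selection can neither affect nor be affected
    by the current trial -- the restriction described above."""
    return [x for i, x in enumerate(trials) if rule(i, trials[:i])]

# Two admissible rules (illustrative examples):
every_other = lambda i, past: i % 2 == 0                      # fixed pattern
after_head = lambda i, past: len(past) > 0 and past[-1] == 1  # depends only on past outcomes

trials = [1, 0, 1, 1, 0, 0]
print(select_subsequence(trials, every_other))  # -> [1, 1, 0]
print(select_subsequence(trials, after_head))   # -> [0, 1, 0]
```

A rule such as `lambda i, past: trials[i] == 1` would be inadmissible: it peeks at the current outcome, which is exactly the "after we see the results" selection ruled out above.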

To prove the main theorem, we need the following result from Walley and Fine [27, Lemma 3.2, summarized]:

Since the Lemma fails for countably infinite collections of events, we cannot extend our main theorem to countably infinite collections of events with the same tools used in this proof. Whether this can be done with other techniques is an open problem.

Now we can prove:

Proof.

Call $\underline{p}(\cdot)$ the lower envelope that generated the data; its conjugate upper envelope is $\overline{p}(\cdot)$.

First, $r_n^S(\cdot)$ is a lower envelope: the lower envelope of $r_n^s(\cdot)$ for all $s \in S$. Now take each algorithm $s_j$ from $S$. Each $s_j$ defines an infinite sub-sequence of trials in which, for each event $A$, the probability of $A$ lies between $\underline{p}(A)$ and $\overline{p}(A)$. Now apply Theorem 4.1.a from Walley and Fine [27] to each sub-sequence:

$$\forall \eta > 0, \;\; \forall A \in \mathcal{A}: \quad \overline{p}(A) + \eta > r_n^{s_j}(A) > \underline{p}(A) - \eta \quad \text{a.c. as } n \to \infty \text{ under } p^\infty(\cdot).$$

In other words, for large enough $n$, the value of $r_n^{s_j}(A)$ will (almost always) lie between $\underline{p}(A)$ and $\overline{p}(A)$. So the event $\{\, r_n^{s_j}(A) \text{ almost within } \underline{p}(A) \text{ and } \overline{p}(A) \,\}$ is a.c.

Now due to Lemma 1, we know that as n rarrinf, the event

$$\bigcap_{s_j \in S} \big\{\, r_n^{s_j}(A) \text{ almost within } \underline{p}(A) \text{ and } \overline{p}(A) \,\big\}$$

is a.c.; so the event

$$\big\{\, \min_{s_j \in S} r_n^{s_j}(A) \text{ almost within } \underline{p}(A) \text{ and } \overline{p}(A) \,\big\}$$

is a.c.

This result suggests that, given a finite space of events and a sequence of trials, it is possible to find estimates for the lower envelope of the distributions.
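As a concrete illustration, here is a minimal sketch of how the estimate could be computed on a finite prefix of trials. The function name and the rules are hypothetical; the code simply takes the minimum, over selection rules, of the relative frequency of the event along each selected sub-sequence, in the spirit of the $r_n^S(\cdot)$ estimator discussed above:

```python
def lower_envelope_estimate(trials, rules, event):
    """Illustrative sketch of r_n^S(A): the minimum, over the selection
    rules s in S, of the relative frequency of event A along the
    sub-sequence that s picks out of a finite prefix of trials."""
    freqs = []
    for rule in rules:
        # each rule sees only the index and earlier outcomes
        sub = [x for i, x in enumerate(trials) if rule(i, trials[:i])]
        if sub:  # skip rules that selected nothing from this prefix
            freqs.append(sum(1 for x in sub if x in event) / len(sub))
    return min(freqs)

trials = [1, 0, 1, 1, 0, 0]
rules = [lambda i, past: True,        # the whole sequence
         lambda i, past: i % 2 == 0]  # every other trial
print(lower_envelope_estimate(trials, rules, {1}))  # min of 3/6 and 2/3 -> 0.5
```

On an actual infinite sequence the theorem above says these minima eventually stay (almost always) above $\underline{p}(A) - \eta$; the finite computation here is only a sketch of the estimator's form.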

A drawback of the estimator $r_n^S(\cdot)$ is that it may not capture all limits of pointwise convergent sub-sequences of relative frequencies. The following result indicates how to solve this problem.

So far the discussion has concentrated on the estimation of lower envelopes. A lower envelope corresponds to an infinite number of convex sets of distributions, so statements about estimation of convex sets are stronger than statements about lower envelopes. To attack this problem, we note that there is a one-to-one correspondence between credal sets and lower expectations [25]. If we can estimate lower expectations, we can recover the underlying (unique) convex set of distributions. Walley and Fine also approach this problem and prove that for a measurable utility function $u(\cdot)$:

$$\forall \eta > 0: \quad \overline{E}[u] + \eta > \frac{1}{n}\sum_{i=1}^{n} u(x_i) > \underline{E}[u] - \eta \quad \text{a.c. as } n \to \infty \text{ under } p^\infty(\cdot).$$
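The same sub-sequence idea can be sketched for expectations: instead of relative frequencies of an event, take sample averages of $u$ along each selected sub-sequence and bracket them by their minimum and maximum. The function below is an illustrative adaptation, not the authors' construction; names and rules are hypothetical:

```python
def expectation_bounds(trials, rules, u):
    """Illustrative sketch: bracket lower/upper expectations of a
    utility u by the min/max of the sample averages (1/n) * sum u(x_i)
    taken along the sub-sequence each selection rule picks out."""
    avgs = []
    for rule in rules:
        sub = [x for i, x in enumerate(trials) if rule(i, trials[:i])]
        if sub:
            avgs.append(sum(u(x) for x in sub) / len(sub))
    return min(avgs), max(avgs)

trials = [1, 0, 1, 1, 0, 0]
rules = [lambda i, past: True, lambda i, past: i % 2 == 0]
lo, hi = expectation_bounds(trials, rules, lambda x: 2 * x - 1)  # u maps {0, 1} to {-1, 1}
```

Here `lo` plays the role of a finite-sample stand-in for $\underline{E}[u]$ and `hi` for $\overline{E}[u]$; the displayed result above is what guarantees the sample averages eventually stay within $\eta$ of those bounds.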

We can use this result and adapt our Theorem 1 to obtain:
