#This file was created by Fri Aug 7 01:03:34 1998
#LyX 0.12 (C) 1995-1998 Matthias Ettrich and the LyX Team
\lyxformat 2.15
\textclass article
\language default
\inputencoding default
\fontscheme default
\graphics default
\paperfontsize default
\spacing single
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Author
Avrim Blum and John Langford
\layout Title
More microchoice discussion
\layout Section
A quick counter example
\layout Standard
Here's a quick counterexample to the proof I gave earlier: Consider a two-stage
algorithm which makes a choice in one continuous space,
\begin_inset Formula \( H_{I1} \)
\end_inset
, then a choice in a second space,
\begin_inset Formula \( H_{I2} \)
\end_inset
, determined by the first choice.
In particular, this algorithm will choose a hypothesis,
\begin_inset Formula \( H_{i1} \)
\end_inset
, which encodes all of the examples in the first space.
The attached hypothesis space at this hypothesis,
\begin_inset Formula \( H_{I2} \)
\end_inset
, contains just a single hypothesis: one which has delta functions at every
positive instance.
Here we have a hypothesis with 0 training error
which generalizes very poorly.
\layout Section
A simple discrete case
\layout Standard
If a sequence of discrete hypothesis spaces,
\begin_inset Formula \( H_{I1} \)
\end_inset
,
\begin_inset Formula \( H_{I2} \)
\end_inset
, ...,
\begin_inset Formula \( H_{IM} \)
\end_inset
are fixed in advance then only the confidence,
\begin_inset Formula \( \delta \)
\end_inset
, needs to be partitioned over each hypothesis space.
This is essentially structural risk minimization and was first pointed
out by Shawe-Taylor and others and again by McAllester.
\layout Section
Partitioning the examples
\layout Standard
If the sequence of hypothesis spaces,
\begin_inset Formula \( H_{I1} \)
\end_inset
,
\begin_inset Formula \( H_{I2} \)
\end_inset
, ...,
\begin_inset Formula \( H_{IM} \)
\end_inset
are not fixed in advance, but instead the examples are partitioned, with
\begin_inset Formula \( \frac{N}{M} \)
\end_inset
fresh examples used for each space, then the standard bound with
\begin_inset Formula \( N\rightarrow \frac{N}{M} \)
\end_inset
and
\begin_inset Formula \( \delta \rightarrow \frac{\delta }{M} \)
\end_inset
applies to the final hypothesis chosen from the final space.
Proof: each new set of examples is an unbiased estimator of the error on
its space, so the standard bound must apply, assuming none of the
\begin_inset Formula \( \delta _{i} \)
\end_inset
confidences has been violated.
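\layout Standard
As a sketch (assuming the standard bound meant here is the usual Occam/Hoeffding
form), the substitution gives, with probability at least
\begin_inset Formula \( 1-\delta  \)
\end_inset
, for the hypothesis chosen from the final space:
\begin_inset Formula \[ err(H_{final})\leq \widehat{err}(H_{final})+\sqrt{\frac{\ln |H_{IM}|+\ln \frac{M}{\delta }}{2N/M}} \]
\end_inset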
\layout Standard
There are several interesting things about this example.
First of all, this is the setting of online algorithms.
Second, this bound is actually better than the earlier one I claimed.
To see this, note that partitioning
\begin_inset Formula \( \epsilon \rightarrow \frac{\epsilon }{2} \)
\end_inset
means
\begin_inset Formula \( \sqrt{\frac{\ln C(\frac{\epsilon }{24},H_{I})+\ln \frac{1}{\delta }}{N}}\rightarrow  \)
\end_inset
\begin_inset Formula \( \sqrt{\frac{\ln C(\frac{\epsilon }{24},H_{I})+\ln \frac{1}{\delta }}{4N}} \)
\end_inset
which implies that
\begin_inset Formula \( N\rightarrow 4N \)
\end_inset
.
With the new bound, in order to maintain the same confidence we merely
need to do
\begin_inset Formula \( N\rightarrow 2N \)
\end_inset
.
This discrepancy disappears in the realizable case.
\layout Section
Same-size child spaces
\layout Standard
Suppose the sequence of discrete hypothesis spaces is not fixed in advance,
but instead we know that all possible
\begin_inset Formula \( H_{Im} \)
\end_inset
are of the same size for a particular
\begin_inset Formula \( m\in [1,...,M] \)
\end_inset
.
Then we can use a bound based on
\begin_inset Formula \( |H_{I1}|*|H_{I2}|*...*|H_{IM}| \)
\end_inset
= the total number of possible hypotheses.
Proof: we have bounded the number of possible hypotheses, so the normal
bound applies.
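\layout Standard
As a sketch (again assuming the usual Occam bound form), bounding by the
product of the sizes gives, with probability at least
\begin_inset Formula \( 1-\delta  \)
\end_inset
:
\begin_inset Formula \[ err(H_{final})\leq \widehat{err}(H_{final})+\sqrt{\frac{\sum _{m=1}^{M}\ln |H_{Im}|+\ln \frac{1}{\delta }}{2N}} \]
\end_inset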
\layout Section
Different-size child spaces
\layout Standard
There is a sequence of discrete hypothesis spaces which is not fixed in
advance and whose spaces are of varying sizes.
In other words, the algorithm proceeds by making a choice from
\begin_inset Formula \( H_{I1} \)
\end_inset
which implies an attached space (yum, fibre bundles in machine learning!),
\begin_inset Formula \( H_{I2} \)
\end_inset
, from which a choice is made leading to an attached
\begin_inset Formula \( H_{I3} \)
\end_inset
, and so on....
The bound still uses
\begin_inset Formula \( |H_{I1}|*|H_{I2}|*...*|H_{IM}| \)
\end_inset
.
Proof: we imagine an a priori assignment of measure/probability/weight
(I'm not sure what to call this,
\begin_inset Quotes eld
\end_inset
measure
\begin_inset Quotes erd
\end_inset
seems most correct, but McAllester used
\begin_inset Quotes eld
\end_inset
probability
\begin_inset Quotes erd
\end_inset
) to every final hypothesis,
\begin_inset Formula \( H_{final} \)
\end_inset
which is:
\begin_inset Formula \( M(H_{final})=\frac{1}{|H_{I1}|*|H_{I2}|*...*|H_{IM}|} \)
\end_inset
.
All of these assignments will sum to one,
\begin_inset Formula \( \sum _{H_{final}}M(H_{final})=1 \)
\end_inset
.
When taking the union bound in standard discrete learning theory, we can
unevenly weight the (normally uniform) confidence assigned to each
hypothesis.
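\layout Standard
Concretely (a sketch, assuming the Occam/Hoeffding form), assign each final
hypothesis the confidence
\begin_inset Formula \( \delta (H_{final})=M(H_{final})\delta  \)
\end_inset
; these sum to
\begin_inset Formula \( \delta  \)
\end_inset
, so the union bound gives, with probability at least
\begin_inset Formula \( 1-\delta  \)
\end_inset
, simultaneously for every reachable
\begin_inset Formula \( H_{final} \)
\end_inset
:
\begin_inset Formula \[ err(H_{final})\leq \widehat{err}(H_{final})+\sqrt{\frac{\ln \frac{1}{M(H_{final})}+\ln \frac{1}{\delta }}{2N}} \]
\end_inset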
\layout Standard
This bound is interesting.
We do not need to actually calculate
\begin_inset Formula \( M(H_{final}) \)
\end_inset
for every
\begin_inset Formula \( H_{final} \)
\end_inset
.
Instead, we can calculate
\begin_inset Formula \( M(H_{final}) \)
\end_inset
by the following process: Start with an estimate of
\begin_inset Formula \( e:=1 \)
\end_inset
.
Every time a choice is made from a hypothesis space,
\begin_inset Formula \( H_{Im} \)
\end_inset
, update the estimate by:
\begin_inset Formula \( e:=\frac{e}{|H_{Im}|} \)
\end_inset
.
When the algorithm finishes it will have a bound on the true error which
is dependent on the data.
Furthermore, this bound is calculated using only the sizes of spaces which
the algorithm actually makes a choice in.
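\layout Standard
The running-estimate process above can be sketched in code (a minimal sketch:
the Occam/Hoeffding bound form and the example space sizes are my assumptions,
and microchoice_bound is a hypothetical name):

```python
import math

def microchoice_bound(space_sizes, n_examples, delta):
    """Data-dependent deviation bound from the running measure estimate.

    space_sizes: the sizes |H_Im| of the spaces the algorithm actually
    chose from; n_examples: N; delta: overall confidence parameter.
    """
    e = 1.0                       # start with the estimate e := 1
    for size in space_sizes:
        e = e / size              # e := e / |H_Im| after each choice
    # ln(1/e) is the sum of ln|H_Im| over the spaces actually used
    return math.sqrt((math.log(1.0 / e) + math.log(1.0 / delta))
                     / (2 * n_examples))

# Hypothetical run: three choices, from spaces of sizes 8, 4 and 16.
eps = microchoice_bound([8, 4, 16], n_examples=1000, delta=0.05)
```

Note that only the sizes of the spaces the algorithm actually chose from
enter the computation, as described above.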
\layout Section
Some interesting cases to think about
\layout Subsection
Random then algorithm
\layout Standard
Consider an algorithm which works in the following, admittedly odd, way.
Before looking at the data, a true random source is used to make a choice
between several hypothesis spaces.
Then the algorithm learns on the space it has chosen, outputting both its
final choice and an indication of which hypothesis space it learned on.
What is the bound on the error for this algorithm? It should be the same
bound as an algorithm which simply learns on the randomly chosen hypothesis
space.
\layout Subsection
Algorithm then random
\layout Standard
Now switch the order of events.
An algorithm learns on a particular hypothesis space to get a hypothesis.
Then a choice is made uniformly at random from an attached hypothesis space
(without looking at the data).
I believe this will have the same error bound as an algorithm which doesn't
make the random choice at all.
There is a theorem in decision theory related to this problem.
\layout Section
Extensions to the continuous case
\layout Standard
Clearly we need to induce a metric between hypothesis spaces.
This is a bit difficult because two hypothesis spaces can have varying
sizes.
Perhaps the following will do:
\layout Description
\begin_inset Formula \( d(H_{I},H_{J})=\frac{1}{2}\left(\int _{H_{J}}\min _{h_{i}\in H_{I}}d(h_{i},h_{j})\, dP(h_{j})+\int _{H_{I}}\min _{h_{j}\in H_{J}}d(h_{i},h_{j})\, dP(h_{i})\right) \)
\end_inset
\layout Standard
This says that the distance between two hypothesis spaces is found by averaging
the minimal distance between elements of the two spaces, weighted by measures
\begin_inset Formula \( P(H_{i}) \)
\end_inset
and
\begin_inset Formula \( P(H_{j}) \)
\end_inset
over the hypothesis spaces.
This metric isn't entirely satisfactory to me because if
\begin_inset Formula \( H_{I}=H_{J} \)
\end_inset
but the measures are different, the metric will still give 0.
Any ideas for a better metric?
\the_end