#This file was created by LyX on Wed Aug 12 21:09:53 1998
#LyX 0.12 (C) 1995-1998 Matthias Ettrich and the LyX Team
\lyxformat 2.15
\textclass article
\language default
\inputencoding default
\fontscheme default
\graphics default
\paperfontsize default
\spacing single
\papersize Default
\paperpackage a4
\use_geometry 0
\use_amsmath 0
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\quotes_times 2
\papercolumns 1
\papersides 1
\paperpagestyle default
\layout Author
John Langford jcl@cs.cmu.edu 8/10/98
\layout Title
Adaptations of Yoav's Self-Bounding Learning Algorithms
\layout Section
Theorem 1
\layout Standard
Theorem 1 states that
\begin_inset Formula \( \forall \alpha >0,\forall \epsilon >0 \)
\end_inset
the generalization error of the final hypothesis is bounded by:
\begin_inset Formula
\[
P_{S\sim D^{m}}[err(h^{*})-\widehat{err}(h^{*})>\epsilon ]\leq 2|Q(T_{A}(D,\alpha ))|e^{-2m\alpha ^{2}}+|H(T_{A}(D,\alpha ))|e^{-2m\epsilon ^{2}}\]
\end_inset
\layout Standard
The first term bounds the probability that some query will return an answer
which is 'bad' enough to step off the tree,
\begin_inset Formula \( T_{A}(D,\alpha ) \)
\end_inset
.
The second term bounds the (agnostic) difference between the empirical
and true error of the hypothesis.
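\layout Standard
As a quick numerical sanity check, the two terms of the bound can be evaluated
directly; the sizes of the query and hypothesis sets below are hypothetical
values, not taken from any particular algorithm.

```python
import math

def theorem1_bound(m, alpha, eps, n_queries, n_hypotheses):
    """Evaluate the right-hand side of Theorem 1:
    2|Q(T_A)| e^{-2 m alpha^2} + |H(T_A)| e^{-2 m eps^2}."""
    query_term = 2 * n_queries * math.exp(-2 * m * alpha ** 2)
    hypothesis_term = n_hypotheses * math.exp(-2 * m * eps ** 2)
    return query_term + hypothesis_term

# Hypothetical sizes: 1000 queries and 10^6 reachable hypotheses.
print(theorem1_bound(m=10000, alpha=0.05, eps=0.05,
                     n_queries=1000, n_hypotheses=10**6))
```

The query term shrinks as the slack alpha grows and the hypothesis term shrinks
as epsilon grows, so each slack parameter trades off against the size of the
corresponding set.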
\layout Subsection
First generalization
\layout Standard
We can generalize this theorem to state:
\begin_inset Formula
\[
P_{S\sim D^{m}}[err(h^{*})-\widehat{err}(h^{*})>\sqrt{\frac{1}{2m}\log \frac{1}{\delta _{h^{*}}}}]\leq 2|Q(T_{A}(D,\alpha ))|e^{-2m\alpha ^{2}}+\delta \]
\end_inset
\layout Standard
where
\begin_inset Formula \( \delta =\sum _{h\in H(T_{A}(D,\alpha ))}\delta _{h} \)
\end_inset
.
One usable choice of
\begin_inset Formula \( \delta _{h} \)
\end_inset
is
\begin_inset Formula \( \frac{\delta }{|Q_{1T}||Q_{2T}|...|Q_{dT}|} \)
\end_inset
where
\begin_inset Formula \( Q_{iT} \)
\end_inset
is the branching factor (contained in
\begin_inset Formula \( T_{A} \)
\end_inset
) at step i of the algorithm, along the chain of queries leading to the choice
of hypothesis,
\begin_inset Formula \( h \)
\end_inset
.
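\layout Standard
Substituting this choice of
\begin_inset Formula \( \delta _{h} \)
\end_inset
into the deviation term makes the per-hypothesis penalty explicit (a sketch
consistent with the statement above):
\begin_inset Formula
\[
\sqrt{\frac{1}{2m}\log \frac{1}{\delta _{h}}}=\sqrt{\frac{1}{2m}\left( \log \frac{1}{\delta }+\sum _{i=1}^{d}\log |Q_{iT}|\right) }\]
\end_inset
so the penalty paid by a hypothesis depends only on the branching factors
along its own chain of queries.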
\layout Subsection
Second generalization
\layout Standard
I believe (though I am unsure, not having read Kearns's paper) that a similar
trick can be played with the first term to apportion confidence among the
various queries in the tree.
\layout Section
A slightly more general microchoice theorem
\layout Standard
In http://www.cs.cmu.edu/~jcl/microchoice/moremicro.ps, section 5, we gave a
theorem stating that the generalization error can be bounded by:
\layout Standard
\begin_inset Formula
\[
P_{S\sim D^{m}}[err(h^{*})-\widehat{err}(h^{*})>\sqrt{\frac{1}{2m}\log \frac{1}{\delta _{h^{*}}}}]\leq \delta \]
\end_inset
with
\begin_inset Formula \( \delta _{h}=\frac{\delta }{|H_{I1}|*|H_{I2}|*...*|H_{IM}|} \)
\end_inset
.
\layout Standard
This theorem can be made slightly more general by moving to a setting where
a 'candidate hypothesis' is not required.
Instead, identify every choice node in the algorithm: choice nodes are
locations where choices that could affect the final hypothesis are made.
For example, one choice node might rest on the output of a statistical
query.
Denote by
\begin_inset Formula \( C_{i} \)
\end_inset
these choice nodes.
Then we can divide our confidence according to
\begin_inset Formula \( \delta _{h}=\frac{\delta }{|C_{1}|*|C_{2}|*...*|C_{d}|} \)
\end_inset
where
\begin_inset Formula \( d \)
\end_inset
is the index of the last choice made in arriving at the hypothesis,
\begin_inset Formula \( h \)
\end_inset
.
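\layout Standard
As with the query-tree version, plugging this
\begin_inset Formula \( \delta _{h} \)
\end_inset
into the deviation bound gives (a sketch under the notation above):
\begin_inset Formula
\[
err(h^{*})-\widehat{err}(h^{*})\leq \sqrt{\frac{1}{2m}\left( \log \frac{1}{\delta }+\sum _{i=1}^{d}\log |C_{i}|\right) }\]
\end_inset
which holds with probability at least
\begin_inset Formula \( 1-\delta \)
\end_inset
, so the cost of each choice node is logarithmic in its branching factor.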
\layout Section
Comparison with Yoav
\layout Standard
The more general microchoice theorem is similar to Yoav's work because it
includes the choice nodes, which allows application to algorithms that
use statistical queries.
The more general microchoice theorem is tighter for cases where
\begin_inset Formula \( T_{A}(D,\alpha ) \)
\end_inset
includes all possible choices along the path of the algorithm.
Yoav's theorem is tighter for cases where
\layout Itemize
The algorithm happens to choose a final hypothesis with an above-average
inverse measure.
(This does not hold for the first generalization above.)
\layout Itemize
The branching factor of the tree is smaller than the branching factor of
the choice points.
\layout Standard
The second case can be removed by changing the measure to reflect which
choices are 'low-probability' choices, using a 'luckiness function' across
a set of choices.
\the_end