\documentstyle[12pt]{article}
\pagestyle{empty}
\marginparwidth 0pt
\oddsidemargin  0pt
\evensidemargin  0pt
\marginparsep 0pt
\topmargin   0pt
\textwidth   6.5in
\textheight  8.5 in

\def\hb {\hfil\break}
\def\implies {\Rightarrow}

\begin{document}

\noindent
Title: {\bf Preventing ``Overfitting'' of Test Data in Hypothesis Selection} 		\hb
\noindent
Author: Andrew Y. Ng 									\hb
\noindent
email: $<$Andrew.Ng@cs.cmu.edu$>$							\hb
\noindent
Postal address: Carnegie Mellon University, 1000 Morewood Ave, Pittsburgh PA 15213    	\hb
\hb
\noindent
Keywords: Overfitting, Cross Validation, Hypothesis selection \hb
\hb
\noindent
Summary:

When selecting a hypothesis from a set of candidate hypotheses for a
learning task, it is common practice to evaluate each of them on
a test set and then to select the hypothesis with the
lowest test error. But when the test set is partially corrupted,
for example by noise, and the set of hypotheses we are selecting
from is large, ``folklore'' warns that this procedure overfits
the test data. In this paper, we explain how picking the hypothesis
with the lowest test error causes this overfitting phenomenon:

\begin{itemize}
  \item Although hypotheses with lower generalization
		error do have lower expected test error,
		the converse does not hold:
		lower test error $\not\implies$
		lower expected generalization error.
		This is caused by the
		differing {\em variances} of the
		test errors of hypotheses with different
		generalization errors.
		(The full argument is given in the paper.)
  \item If the errors of the hypotheses are uncorrelated, then some
		``mean'' hypothesis, one that agrees with most of the
		other hypotheses, can be expected to do well. By at least
		one metric (explained in the paper), the lowest-test-error
		hypothesis is maximally far from being such a hypothesis.
\end{itemize}
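The variance point in the first bullet can be made concrete under standard assumptions that are ours for illustration (0/1 loss and $m$ i.i.d.\ test examples), not taken from the paper:

```latex
% Under 0/1 loss with m i.i.d. test examples, the test error of a
% fixed hypothesis h with generalization error \varepsilon(h) is a
% scaled binomial random variable:
\[
  \hat{\varepsilon}(h) \;\sim\; \frac{1}{m}\,
      \mathrm{Binomial}\bigl(m,\,\varepsilon(h)\bigr),
  \qquad
  \mathrm{E}\bigl[\hat{\varepsilon}(h)\bigr] = \varepsilon(h),
  \qquad
  \mathrm{Var}\bigl[\hat{\varepsilon}(h)\bigr]
      = \frac{\varepsilon(h)\bigl(1-\varepsilon(h)\bigr)}{m}.
\]
```

Because each $\hat{\varepsilon}(h)$ fluctuates around its mean, the minimum over many hypotheses tends to be attained by one whose test error happened to fall well below its own generalization error, so the observed minimum is an optimistically biased estimate.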

\noindent
We also show the surprising result that this problem can be overcome
by selecting a hypothesis with a {\em higher} test error over others
with lower test errors,
and propose a new
algorithm, LOOCVCV, which uses a computationally efficient form
of leave-one-out cross-validation to select such a hypothesis.
Finally, we present experimental results for one domain
showing that LOOCVCV
consistently beats picking the hypothesis with the lowest test
error, even with reasonably large test sets.
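The LOOCVCV algorithm itself is not specified in this summary. As background only, a plain leave-one-out cross-validation estimate of the error of the lowest-test-error selection rule, the kind of estimate such an algorithm could build on, might be sketched as follows (all function names and the data layout are illustrative, not from the paper):

```python
# Hedged sketch: leave-one-out cross validation of the
# "pick the lowest-test-error hypothesis" selection rule.
# Hypotheses are callables mapping an input to a predicted label;
# the test set is a list of (input, label) pairs with 0/1 loss.

def select_lowest_error(hypotheses, examples):
    """Return the hypothesis making the fewest mistakes on `examples`."""
    def mistakes(h):
        return sum(1 for x, y in examples if h(x) != y)
    return min(hypotheses, key=mistakes)

def loocv_error(hypotheses, test_set):
    """Leave-one-out estimate of the selection rule's error:
    hold out each test example in turn, select a hypothesis on the
    remaining examples, and score the selection on the held-out one."""
    errors = 0
    for i, (x, y) in enumerate(test_set):
        rest = test_set[:i] + test_set[i + 1:]
        chosen = select_lowest_error(hypotheses, rest)
        errors += (chosen(x) != y)
    return errors / len(test_set)
```

This estimates how the selection rule generalizes, rather than how any single hypothesis does; a naive implementation re-selects a hypothesis for each of the $m$ folds, which is what motivates a computationally efficient formulation.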

\end{document}

