Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!newsflash.concordia.ca!news.nstn.ca!ott.istar!istar.net!van.istar!west.istar!n1van.istar!van-bc!unixg.ubc.ca!news.bc.net!arclight.uoregon.edu!usenet.eel.ufl.edu!news.mathworks.com!newsfeed.internetmci.com!news.sprintlink.net!news-stk-200.sprintlink.net!interpath!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Cross Validation - What for?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DxJDvM.9CI@unx.sas.com>
Date: Tue, 10 Sep 1996 21:47:46 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <32318926.261F@guest.arnes.si> <3232bcf1.552002@news>
Organization: SAS Institute Inc.
Lines: 105


In article <3232bcf1.552002@news>, steve@tropheus.demon.co.uk (Stephen Wolstenholme) writes:
|> On Sat, 07 Sep 1996 16:39:34 +0200, Ales Brglez
|> <Ales.Brglez@guest.arnes.si> wrote:
|> >I am training some data with a back-propagation ANN. I heard that Cross 
|> >Validation should be applied to test the robustness of the model. How does 
|> >that work? I also heard something about the Jack-Knife test. What is its 
|> >purpose?
|> 
|> Numerous methods of cross validation exist. I'm not familiar with
|> "Jack-Knife" so I won't try to guess what it's for. I use a simple
|> approach to cross validation: divide the data into three parts,
|> training, testing and validating. Train the ANN using the
|> training set but periodically test the ANN with the test set. Use the
|> performance measures on the test set to determine when to stop the
|> training process. Finally, use the validation set to see how well the
|> ANN really performs. 

The above description is not of cross-validation, but of what is most
often called "early stopping" or "stopped training" based on
split-sample validation. Cross-validation and split-sample validation
are both methods of estimating generalization error but have quite
different computational and statistical properties.
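For concreteness, here is a minimal sketch of the early-stopping logic
described above. The error curve is a synthetic stand-in of my own; in
practice each value would be the validation-set error measured after one
more epoch of backprop training.

```python
# Sketch (not from the original post) of early stopping ("stopped
# training") with a split-sample validation set.  The error curve below
# is synthetic; in a real run each value would be the validation-set
# error after one more epoch of training.

def early_stop(val_errors, patience=3):
    """Return the epoch with the best validation error, halting once the
    error has failed to improve for `patience` consecutive epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break                  # validation error stopped improving
    return best_epoch

# Synthetic validation-error curve: improves, then overfits.
val_errors = [0.9, 0.7, 0.5, 0.45, 0.44, 0.46, 0.50, 0.55, 0.60]
stop = early_stop(val_errors, patience=3)
```

In the conventional terminology above, the test set would then be used
exactly once, with the weights from the chosen epoch, to report the
final estimate of generalization error.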

Also note that the conventional use of the terms "test set" and
"validation set" is that a validation set is used to stop training or to
select a model, while a test set is used to obtain a final, unbiased
estimate of the generalization error. See, for example, Lutz Prechelt's
technical report that is packaged with the Proben1 collection of
benchmark data; ftp instructions are in the Neural Network FAQ, part 4
of 7: Books, data, etc. at ftp://ftp.sas.com/pub/neural/FAQ4.html

Since the term "cross-validation" is so widely abused, I will repeat the
discussion from the Neural Network FAQ, part 3 of 7: Generalization at
ftp://ftp.sas.com/pub/neural/FAQ3.html:

Subject: What are cross-validation and bootstrapping?

Cross-validation and bootstrapping are both methods for estimating 
generalization error based on "resampling".

In k-fold cross-validation, you divide the data into k subsets of equal
size.  You train the net k times, each time leaving out one of the
subsets from training, but using only the omitted subset to compute
whatever error criterion interests you. If k equals the sample size,
this is called leave-one-out cross-validation. A more elaborate and
expensive version of cross-validation involves leaving out all possible
subsets of a given size. 
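The k-fold procedure can be sketched in a few lines. To keep the example
self-contained, the "model" is a deliberately trivial predictor of my own
(predict the mean of the training targets); any fit/predict pair could be
substituted for a real network.

```python
# Sketch of k-fold cross-validation for a generic estimator.  The
# trivial mean-predictor "model" is a placeholder so the example runs
# on its own; substitute your own fit/predict functions.

def k_fold_cv(xs, ys, k, fit, predict, loss):
    """Average held-out loss over k folds; each case is omitted from
    training exactly once and used only for evaluation."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    total, count = 0.0, 0
    for fold in folds:
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        for i in fold:
            total += loss(predict(model, xs[i]), ys[i])
            count += 1
    return total / count

# Trivial estimator: ignore x, predict the mean of the training ys.
fit = lambda xs, ys: sum(ys) / len(ys)
predict = lambda model, x: model
sq_loss = lambda yhat, y: (yhat - y) ** 2

ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
err = k_fold_cv(ys, ys, k=3, fit=fit, predict=predict, loss=sq_loss)
```

Setting k equal to the sample size turns this into leave-one-out
cross-validation.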

Note that cross-validation is quite different from the "split-sample" or
"hold-out" method that is commonly used for early stopping in neural
nets. In the split-sample method, only a single subset (the validation
set) is used to estimate the error function, instead of k different
subsets; i.e., there is no "crossing".  While various people have
suggested that cross-validation be applied to early stopping, the
proper way of doing that is not obvious. 

Cross-validation is also easily confused with jackknifing.  Both involve
omitting each training case in turn and retraining the network on the
remaining subset. But cross-validation is used to estimate
generalization error, while the jackknife is used to estimate the bias
of a statistic.  In the jackknife, you compute some statistic of
interest in each subset of the data. The average of these subset
statistics is compared with the corresponding statistic computed from
the entire sample in order to estimate the bias of the latter.  You can
also get a jackknife estimate of the standard error of a statistic. 
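The jackknife bias estimate can be sketched with the textbook example:
the maximum-likelihood variance (dividing by n), which is biased
downward. The function names are my own; the formula is the standard
(n-1) * (mean of leave-one-out statistics - full-sample statistic).

```python
# Sketch of the jackknife bias estimate for an arbitrary statistic.
# The statistic chosen here is the maximum-likelihood variance, whose
# downward bias the jackknife corrects exactly.

def jackknife_bias(data, stat):
    """Estimate the bias of stat(data) as
    (n-1) * (mean of leave-one-out statistics - full-sample statistic)."""
    n = len(data)
    full = stat(data)
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    return (n - 1) * (sum(loo) / n - full)

def var_mle(xs):              # divides by n, hence biased downward
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
bias = jackknife_bias(data, var_mle)        # negative: MLE underestimates
corrected = var_mle(data) - bias            # equals the unbiased variance
```

For the variance this correction is exact: subtracting the jackknife
bias estimate from the MLE recovers the usual unbiased (divide by n-1)
sample variance.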

Leave-one-out cross-validation often works well for continuous error
functions such as the mean squared error, but it may perform poorly
for noncontinuous error functions such as the number of misclassified
cases. In the latter case, k-fold cross-validation is preferred. But
if k gets too small, the error estimate is pessimistically biased
because of the difference in sample size between the full-sample
analysis and the cross-validation analyses. A value of 10 for k is
popular. 

Bootstrapping seems to work better than cross-validation in many cases.
In the simplest form of bootstrapping, instead of repeatedly analyzing
subsets of the data, you repeatedly analyze subsamples of the data.
Each subsample is a random sample with replacement from the full sample.
Depending on what you want to do, anywhere from 200 to 2000 subsamples
might be used. There are many more sophisticated bootstrap methods that
can be used not only for estimating generalization error but also for
estimating confidence bounds for network outputs. 
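The simplest form of bootstrapping can be sketched as follows, here used
to estimate the standard error of the sample mean. The resample count and
seed are arbitrary choices of mine, not prescriptions.

```python
import random

# Sketch of the simplest bootstrap: draw many random samples *with
# replacement* from the full sample and use the spread of the resampled
# statistics to estimate the standard error of the original statistic.

def bootstrap_se(data, stat, n_boot=1000, seed=0):
    """Bootstrap estimate of the standard error of stat(data)."""
    rng = random.Random(seed)        # fixed seed for reproducibility
    n = len(data)
    stats = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(stat(resample))
    m = sum(stats) / n_boot
    return (sum((s - m) ** 2 for s in stats) / (n_boot - 1)) ** 0.5

mean = lambda xs: sum(xs) / len(xs)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = bootstrap_se(data, mean, n_boot=500)
```

For the mean the result should be close to the textbook formula
s / sqrt(n), which is one way to sanity-check a bootstrap
implementation before applying it to network outputs.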

References:

   Efron, B. and Tibshirani, R.J. (1993), An Introduction to the
   Bootstrap, London: Chapman & Hall.

   Hjorth, J.S.U. (1994), Computer Intensive Statistical Methods:
   Validation, Model Selection, and Bootstrap, London: Chapman & Hall.

   Masters, T. (1995), Advanced Algorithms for Neural Networks:
   A C++ Sourcebook, NY: John Wiley and Sons, ISBN 0-471-10588-0.

   Weiss, S.M. and Kulikowski, C.A. (1991), Computer Systems That
   Learn, Morgan Kaufmann.


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
