Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!news.duke.edu!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: TR available on the convergence properties of backpropagation
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DqwACp.783@unx.sas.com>
Date: Sat, 4 May 1996 19:05:13 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <317F9481.59E2@research.nj.nec.com>
Organization: SAS Institute Inc.
Lines: 48


In article <317F9481.59E2@research.nj.nec.com>, Lee Giles <giles@research.nj.nec.com> writes:
|> 
|>       What Size Neural Network Gives Optimal Generalization? 
|>          Convergence Properties of Backpropagation
|> 
|>    Steve Lawrence (1,3), C. Lee Giles (1,2), Ah Chung Tsoi (3)
|> ... 
|>                           ABSTRACT
|> 
|> One of the most important aspects of any machine learning paradigm is
|> how it scales according to problem size and complexity. Using a task
|> with known optimal training error, and a pre-specified maximum number
|> of training updates, we investigate the convergence of the
|> backpropagation algorithm with respect to a) the complexity of the
|> required function approximation, b) the size of the network in
|> relation to the size required for an optimal solution, and c) the
|> degree of noise in the training data.  In general, for a) the solution
|> found is worse when the function to be approximated is more complex,
|> for b) oversize networks can result in lower training and
|> generalization error, and for c) the use of committee or ensemble
|> techniques can be more beneficial as the amount of noise in the
|> training data is increased. For the experiments we performed, we do
|> not obtain the optimal solution in any case. 

Note that these conclusions apply only to inept training methods
such as standard backprop. If you use more sophisticated training
methods such as Levenberg-Marquardt, quasi-Newton, or conjugate
gradients (see "What are conjugate gradients, Levenberg-Marquardt, etc.?"
in ftp://ftp.sas.com/pub/neural/FAQ2.html), you can find the global
optimum reliably with minimal networks in these examples, and you can
easily get overfitting with larger networks. And if you use effective
committee or ensemble methods (e.g. Bayesian), you will get improved
prediction in low-noise cases as well as high-noise ones.
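A minimal modern sketch of the comparison above (the toy problem, names, and
settings are my own illustration, not from the paper or the post): fit a
one-hidden-unit tanh network to noiseless data, comparing fixed-step gradient
descent ("standard backprop") against the conjugate-gradient, quasi-Newton
(BFGS), and Levenberg-Marquardt solvers in SciPy.

```python
# Sketch only: a 4-parameter network y = tanh(w1*x + b1)*w2 + b2,
# trained on data it can fit exactly (known optimal training error 0).
import numpy as np
from scipy.optimize import minimize, least_squares

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 50)
true = np.array([1.5, 0.3, -2.0, 0.1])              # w1, b1, w2, b2
y = np.tanh(true[0]*x + true[1])*true[2] + true[3]

def loss(p):
    w1, b1, w2, b2 = p
    return np.mean((np.tanh(w1*x + b1)*w2 + b2 - y)**2)

def grad(p):
    # Analytic gradient of the mean squared error (i.e. "backprop"
    # for this tiny network).
    w1, b1, w2, b2 = p
    h = np.tanh(w1*x + b1)
    r = h*w2 + b2 - y                               # residuals
    dh = 1.0 - h**2                                 # tanh'
    return 2.0*np.array([np.mean(r*w2*dh*x),
                         np.mean(r*w2*dh),
                         np.mean(r*h),
                         np.mean(r)])

def resid(p):
    w1, b1, w2, b2 = p
    return np.tanh(w1*x + b1)*w2 + b2 - y

p0 = rng.normal(scale=0.5, size=4)

# "Standard backprop": plain fixed-step gradient descent.
p = p0.copy()
for _ in range(5000):
    p -= 0.05*grad(p)
loss_gd = loss(p)

# The more sophisticated methods named above, from the same start.
loss_cg   = minimize(loss, p0, jac=grad, method="CG").fun
loss_bfgs = minimize(loss, p0, jac=grad, method="BFGS").fun
loss_lm   = np.mean(least_squares(resid, p0, method="lm").fun**2)

print("gradient descent:", loss_gd)
print("conjugate grad.: ", loss_cg)
print("BFGS:            ", loss_bfgs)
print("Levenberg-Marq.: ", loss_lm)
```

On this realizable problem the second-order methods typically reach the
global optimum (training error near zero) in far fewer function evaluations
than fixed-step gradient descent, which is the point of the paragraph above.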

|> Is now available from the following sites:
|> 
|> http://www.neci.nj.nec.com/homepages/giles.html      - USA
|> http://www.neci.nj.nec.com/homepages/lawrence        - USA
|> http://www.cs.umd.edu/TRs/TR-no-abs.html     - USA
|> http://www.elec.uq.edu.au/~lawrence          - Australia
|> ftp://ftp.nj.nec.com/pub/giles/papers/UMD-CS-TR-3617.what.size.neural.net.to.use.ps.Z

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
