Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!purdue!lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!howland.reston.ans.net!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Fast NN learning algorithms
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DEEI1q.G72@unx.sas.com>
Date: Mon, 4 Sep 1995 21:51:26 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <Pine.SOL.3.91.950829091658.4012A-100000@suma3.reading.ac.uk> <e64_9509031803@tbus.muc.de>
Organization: SAS Institute Inc.
Lines: 91


In article <e64_9509031803@tbus.muc.de>, rst@tbus.muc.de (Rudolf Stricker) writes:
|> shrhanan@reading.ac.uk wrote:
|>
|> sau> I am doing some work with Multilayer Perceptrons.  I am
|> sau> looking for a fast learning algorithm since backpropagation
|> sau> is so slow.
|>
|> sau> Having looked in the major journals, the algorithms
|> sau> with the best performance appear to be the
|> sau> second-order methods, such as conjugate gradient.
|>
|> IMO, there is no best or fastest learning algorithm, because performance
|> depends heavily on the problem that has to be modeled / solved.

This is a very important point, especially since it is common in the
NN literature to make claims about the relative speed of various
algorithms based just on XOR and one or two other examples.

|> If we have the standard case with (much) more ("independent") examples
|> than free parameters (as many people here seem to prefer, in order to
|> avoid "overfitting", whatever this rather subjective concept may tell
|> us), "learning" is something like an identification job, which may be
|> defined by an over-determined set of (nonlinear) equations. So we may
|> use all of these more or less sophisticated methods from "nonlinear
|> optimization" for "training" neural nets (or other "models"), including
|> "back propagation" (which is one of the weakest of the nonlinear
|> optimization methods).

Of the numerous algorithms from the numerical analysis literature, the
following classes are particularly useful:

 * Levenberg-Marquardt algorithms for nets with 10s of weights.
 * Quasi-Newton algorithms for nets with 100s of weights.
 * Conjugate gradient algorithms for nets with 1000s of weights.
 * Simulated annealing algorithms for data sets with severe local-optima
   problems.

Note that all of the above algorithms come in numerous varieties, so
there is no such thing as "the" Levenberg-Marquardt algorithm, etc.
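To give the flavor of the Levenberg-Marquardt idea, here is a minimal sketch on a made-up one-weight least-squares problem (the data, damping constants, and update schedule are all invented for illustration and do not represent any particular package's implementation):

```python
import math

# Toy problem (made up for illustration): fit y = exp(a*x) to data by
# least squares with a bare-bones Levenberg-Marquardt step on the
# single weight `a`.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]          # data generated with a = 0.5

def residuals(a):
    return [math.exp(a * x) - y for x, y in zip(xs, ys)]

def cost(a):
    return sum(r * r for r in residuals(a))

a, lam = 2.0, 1e-3                            # poor initial guess, small damping
for _ in range(100):
    r = residuals(a)
    J = [x * math.exp(a * x) for x in xs]     # d(residual)/da
    g = sum(Ji * ri for Ji, ri in zip(J, r))  # gradient J'r
    H = sum(Ji * Ji for Ji in J)              # Gauss-Newton Hessian J'J
    step = -g / (H + lam)                     # damped Newton-type step
    if cost(a + step) < cost(a):
        a, lam = a + step, lam / 10           # success: act more like Gauss-Newton
    else:
        lam *= 10                             # failure: act more like gradient descent

print(a)                                      # should approach the generating value 0.5
```

The damping parameter lam is what makes this a Levenberg-Marquardt variety rather than plain Gauss-Newton: small lam gives fast Newton-like steps, large lam gives safe, short gradient-like steps. The Hessian approximation J'J is also why this class is practical only for nets with 10s of weights--it is a full matrix in the number of weights.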

Of the methods developed in the NN literature, Quickprop and RPROP
are useful, especially when the number of weights exceeds the number
of training cases.
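The core of RPROP fits in a few lines. This is a bare sketch of the sign-based update (the RPROP- variant, which zeroes the gradient after a sign change) on a toy quadratic; the curvatures, step bounds, and iteration count are made up for illustration:

```python
# RPROP- sketch on f(w) = w[0]**2 + 100*w[1]**2, a toy quadratic with
# badly mismatched curvatures per dimension.
def grad(w):
    return [2 * w[0], 200 * w[1]]

w = [4.0, -3.0]
delta = [0.1, 0.1]                     # per-weight step sizes
prev_g = [0.0, 0.0]
ETA_PLUS, ETA_MINUS = 1.2, 0.5         # typical growth/shrink factors
D_MAX, D_MIN = 50.0, 1e-8              # bounds on the step sizes

for _ in range(200):
    g = grad(w)
    for i in range(len(w)):
        if g[i] * prev_g[i] > 0:       # same sign as last time: accelerate
            delta[i] = min(delta[i] * ETA_PLUS, D_MAX)
        elif g[i] * prev_g[i] < 0:     # sign flip: we overshot, back off
            delta[i] = max(delta[i] * ETA_MINUS, D_MIN)
            g[i] = 0.0                 # RPROP-: skip the update this step
        if g[i] > 0:                   # move against the gradient SIGN only;
            w[i] -= delta[i]           # the magnitude of g is never used
        elif g[i] < 0:
            w[i] += delta[i]
    prev_g = g
```

Because only the sign of each partial derivative is used, the badly scaled second dimension gets its own adapted step size, which is why RPROP tolerates the ill-conditioned error surfaces that make standard backprop crawl.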

And for people using stopped training, keep in mind that speed can be
excessive--a fast enough algorithm will shoot past the point of minimum
validation error before stopping can do any good--so you should never
use any Newton-type algorithm (such as Levenberg-Marquardt or
quasi-Newton). As far as I know, it is an open question how fast an
algorithm can be for use with stopped training. I usually use conjugate
gradients with a limit on the step size, but setting that limit can be
a bit tricky (although less tricky than setting the learning rate in
standard backprop).
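Schematically, stopped training with a limited step size looks like the following sketch. Note the hedges: plain gradient descent stands in for conjugate gradients, the "net" is a one-weight linear model, and the data, learning rate, step cap, and patience constants are all invented for illustration:

```python
# Toy stopped-training sketch: fit y = 2*x by gradient descent on a
# training set, cap the step length, and stop when validation error
# stops improving.  Data has deterministic alternating "noise".
train = [(0.1 * i, 0.2 * i + (0.3 if i % 2 else -0.3)) for i in range(20)]
valid = [(0.1 * i + 0.05, 0.2 * i + 0.1 + (-0.3 if i % 2 else 0.3))
         for i in range(20)]

def mse(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    return 2.0 * sum((w * x - y) * x for x, y in data) / len(data)

w, lr, step_cap = 0.0, 0.3, 0.5
best_err, best_w, patience = float("inf"), 0.0, 0
for _ in range(500):
    step = -lr * grad(w, train)
    step = max(-step_cap, min(step_cap, step))  # limit the step size
    w += step
    err = mse(w, valid)
    if err < best_err:
        best_err, best_w, patience = err, w, 0  # validation still improving
    else:
        patience += 1
        if patience >= 10:                      # validation error turned up:
            break                               # stop and keep the best weight
w = best_w
```

The run halts shortly after the validation error turns back up, and the weight kept is the one from the validation minimum--which is the whole point of stopped training, and why an algorithm that leaps straight to the training optimum defeats it.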

Some references:

   Fahlman, S.E. (1988), "An empirical study of learning speed in
   back-propagation networks", CMU-CS-88-162, School of Computer Science,
   Carnegie Mellon University.

   Fahlman, S.E. (1989), "Faster-Learning Variations on
   Back-Propagation: An Empirical Study", in Touretzky, D., Hinton, G, and
   Sejnowski, T., eds., _Proceedings of the 1988 Connectionist Models
   Summer School_, Morgan Kaufmann, 38-51.

   Fletcher, R. (1987), _Practical Methods of Optimization_, Wiley: NY.

   Gill, P.E., Murray, W. and Wright, M.H. (1981), _Practical
   Optimization_, Academic Press: London.

   Riedmiller, M. and Braun, H. (1993), "A Direct Adaptive Method for
   Faster Backpropagation Learning: The RPROP Algorithm", Proceedings
   of the IEEE International Conference on Neural Networks 1993, San
   Francisco: IEEE.

|> To get a vivid overview of the methods available, it might be helpful
|> to join ???'s (sorry, do not remember the inventor's name) metaphoric
|> kangaroo as it searches for the top of Mt. Everest (= best solution).

The kangaroo posts are available by anonymous ftp from ftp.sas.com
(Internet gateway IP 192.35.83.8) in the directory /pub/sugi19/neural :

 README         This document.
 ...
 kangaroos      Nontechnical explanation of training methods and
                nonlinear optimization

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
