Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.mathworks.com!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Moller's Scaled conjugate gradient (Re: Questions about my new NN program)
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DoBo3w.A55@unx.sas.com>
Date: Fri, 15 Mar 1996 18:45:32 GMT
Distribution: inet
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4gr0p7$iqc@news2.deltanet.com> <21c7cc$11223.178@www.qed.com> <ORSIER.96Mar14094757@cuisun38.unige.ch> <4iba5m$khh@dfw-ixnews4.ix.netcom.com>
Organization: SAS Institute Inc.
Lines: 39


In article <4iba5m$khh@dfw-ixnews4.ix.netcom.com>, jdadson@ix.netcom.com (Jive Dadson) writes:
|> B.D. Ripley's new book _Pattern Recognition and Neural Networks_
|> appears to be exceedingly well researched, and mentions Moller only
|> in passing. Page 345:
|> 
|>    Moller (1993) has developed one particular version of
|>    conjugate gradients which seems well known in the neural
|>    network field; it uses the out-dated Hestenes-Stiefel
|>    formula for [beta-sub-i] with a particular line-search
|>    algorithm.
|> 
|> He does not give the Hestenes-Stiefel formula, but lists
|> two others, Polak-Ribiere and Fletcher-Reeves, saying Polak-
|> Ribiere is generally preferred. I have used the Polak-Ribiere
|> method from Numerical Recipes in C with very poor results. The
|> Davidon-Fletcher-Powell algorithm, which builds up an approximation
|> to the Hessian, is much faster on the little problems I have
|> tested on.

By default, we use the Powell-Beale automatic restart method for
conjugate gradients. It works well for large networks. I have never
bothered to compare it with the other variants in PROC NLP
(Polak-Ribiere, Fletcher-Reeves, and Fletcher's conjugate descent).
Hestenes-Stiefel is not listed in the PROC NLP documentation, so
Wolfgang Hartmann (the author of PROC NLP) must have thought it wasn't
useful.
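
Since the quoted passage notes that Ripley lists the Polak-Ribiere and
Fletcher-Reeves formulas but not Hestenes-Stiefel, here is a minimal
sketch of all three in Python (illustrative names only, not the PROC NLP
or Numerical Recipes code). The variants differ only in how beta is
computed from successive gradients and the previous search direction:

```python
import numpy as np

# Hypothetical helper names for illustration.  Each formula computes
# beta from the new gradient g_new, the old gradient g_old, and the
# previous search direction d.
def beta_fletcher_reeves(g_new, g_old, d):
    return (g_new @ g_new) / (g_old @ g_old)

def beta_polak_ribiere(g_new, g_old, d):
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_hestenes_stiefel(g_new, g_old, d):
    return (g_new @ (g_new - g_old)) / (d @ (g_new - g_old))

def cg_quadratic(A, b, x0, beta_fn):
    """Conjugate gradients on f(x) = 0.5 x'Ax - b'x with an exact line
    search.  On a quadratic with exact line searches, all three beta
    formulas generate the same iterates and converge in at most n
    steps; the differences only show up on general nonlinear problems."""
    x = np.array(x0, dtype=float)
    g = A @ x - b                          # gradient of the quadratic
    d = -g
    for _ in range(len(b)):
        alpha = -(g @ d) / (d @ (A @ d))   # exact minimizer along d
        x = x + alpha * d
        g_new = A @ x - b
        d = -g_new + beta_fn(g_new, g, d) * d
        g = g_new
    return x
```

Away from quadratics, with inexact line searches, the variants do
behave differently, which is why restart strategies such as
Powell-Beale matter in practice.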

Quasi-Newton methods will certainly work better for medium-sized
networks, and full Newton or Gauss-Newton methods such as
Levenberg-Marquardt or various trust-region algorithms will work better
for small networks. The trade-off is mainly memory and cost per
iteration: for n weights, quasi-Newton methods maintain an order-n^2
approximation to the Hessian (or its inverse) and Newton-type methods
must factor such a matrix, while conjugate gradients need only a few
vectors of length n. This is all common knowledge in the numerical
analysis literature.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
