Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!cis.ohio-state.edu!math.ohio-state.edu!howland.reston.ans.net!Germany.EU.net!news.dfn.de!news.dlr.de!smagt
From: smagt@donau.df.op.dlr.de (Patrick van der Smagt)
Subject: Re: Looking for conjugate gradient
X-Nntp-Posting-Host: dv.op.dlr.de
Message-ID: <smagt.819291389@dv>
Sender: news@news.dlr.de
Organization: DLR
References: <4a736b$ajv@goya.eunet.es> <4agrgb$eho@fstgal00.tu-graz.ac.at> <cem14.39.0009D599@cornell.edu> <hruschka.2.30D1A6A5@alf3.ngate.uni-regensburg.de>
Date: Mon, 18 Dec 1995 12:56:29 GMT
Lines: 50

hruschka@alf3.ngate.uni-regensburg.de (Harald HRUSCHKA LST. MARKETING) writes:

>In article <cem14.39.0009D599@cornell.edu> cem14@cornell.edu (Carlos Murillo-Sanchez) writes:
>>From: cem14@cornell.edu (Carlos Murillo-Sanchez)
>>Subject: Re: Looking for conjugate gradient
>>Date: Mon, 11 Dec 1995 09:49:59

>>In article <4agrgb$eho@fstgal00.tu-graz.ac.at> GILLETTE@JOANNEUM.ADA.AT (Karine Gillette) writes:

>>>In article <4a736b$ajv@goya.eunet.es>, bolsamad@dial.eunet.es says...

>>>>Does anybody know where can I find a well explained conjugate gradient
>>>>algorithm (by D.F. Shanno, inexact search, BFGS...) for improve the 
>>>>convergence of my backpropagation NN ?

>>>I read that the conjugate gradient descent was a synonyme of Backpropagation 
>>>with momentum term.

>>No, they definitely aren't the same.  Conjugate gradient methods
>>search along n succesive orthogonal directions (n=# of parameters).
>>Under some assumptions (i.e., you are in a quadratic basin and
>>you do line searches on each direction to find the minimum in
>>that particular direction) the conjugate gradient methods have
>>the quadratic convergence property: they reach the precise minimum
>>in n steps.  Hence they are very useful for the 'end game',
>>that is, if you are really interested in finding an exact local
>>minimum (many people aren't interested, because of the early
>>stopping issue).

It is, however, true that conjugate gradient can be cast as a special
case of back-propagation with a momentum term.  What the conjugate
gradient algorithm does is compute, at every step, values for the
learning rate and the momentum such that subsequent search directions
are conjugate to each other (conjugate, in this case, means orthogonal
with respect to the Hessian of the error surface, not merely
perpendicular).  Along each such search direction a line minimisation
is performed.  In an n-dimensional error space (e.g., a neural network
with n weights) there can be at most n mutually conjugate non-zero
directions, so the (n+1)st of these directions must be a null vector!
Otherwise it could not be conjugate to the n previous search
directions.  Thus, on a quadratic error surface, the minimum is
reached in at most n steps.
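The argument above can be sketched numerically.  This is a minimal
illustration (not code from the thesis referenced below): a made-up
4-dimensional quadratic "error surface" standing in for a network with
n = 4 weights.  Each update is a gradient step (with computed learning
rate alpha) plus a momentum-like term (with computed coefficient beta,
here the Fletcher-Reeves choice), and with exact line minimisation the
exact minimum is reached after n steps.

```python
import numpy as np

# Hypothetical quadratic error E(w) = 0.5 w^T A w - b^T w with an
# SPD "Hessian" A; this stands in for a network's error surface.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)           # symmetric positive definite
b = rng.standard_normal(4)

w = np.zeros(4)
g = A @ w - b                          # gradient of E at w
d = -g                                 # first direction: steepest descent
for step in range(4):                  # n = 4 weights -> at most n steps
    alpha = -(g @ d) / (d @ (A @ d))   # exact line minimisation along d
    w = w + alpha * d                  # "learning rate" alpha
    g_new = A @ w - b
    beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves "momentum" beta
    d = -g_new + beta * d              # next direction, conjugate to the old
    g = g_new

w_star = np.linalg.solve(A, b)         # true minimiser, for comparison
print(np.allclose(w, w_star))
```

With fixed alpha and beta this loop would be exactly back-propagation
with momentum; conjugate gradient just chooses both per step.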

Caveat lector: in general the error surfaces are *not* quadratic.  Therefore,
after these n steps a restart (resetting the search direction to the plain
gradient) is required.  Nevertheless, with such restarts these methods work
well on general optimisation problems, typically no worse than plain error
back-propagation.
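On a non-quadratic surface the recipe becomes: approximate line search
plus a periodic restart.  A sketch, using a made-up smooth non-quadratic
error (not from the thesis below), backtracking line search, and the
Polak-Ribiere choice of beta with a restart every n steps:

```python
import numpy as np

def E(w):                              # toy non-quadratic error (assumed)
    return np.sum(w**2) + 0.1 * np.sum(w**4)

def grad(w):                           # its gradient
    return 2 * w + 0.4 * w**3

n = 3                                  # dimension of the "weight" space
w = np.array([2.0, -1.5, 1.0])
g = grad(w)
d = -g
for step in range(60):
    # backtracking (Armijo) line search along d
    alpha, f0, slope = 1.0, E(w), g @ d
    while E(w + alpha * d) > f0 + 1e-4 * alpha * slope:
        alpha *= 0.5
    w = w + alpha * d
    g_new = grad(w)
    if (step + 1) % n == 0:
        d = -g_new                     # restart: plain gradient direction
    else:
        beta = max(0.0, (g_new @ (g_new - g)) / (g @ g))  # Polak-Ribiere
        d = -g_new + beta * d
    g = g_new

print(E(w))                            # close to the minimum E = 0
```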

In case you want to read more, consult
	http://www.op.dlr.de/~smagt/thesis/chapter2.ps.gz
It compares a few methods for neural network learning.

Patrick van der Smagt
