Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!gatech!EU.net!sun4nl!sci.kun.nl!news
From: "Pierre v.d. Laar" <pierre>
Subject: Re: Question about the momentum in a BP neural net.
Content-Type: text/plain; charset=us-ascii
Message-ID: <D9FpGq.86@sci.kun.nl>
To: leducf@ERE.UMontreal.CA
Sender: news@sci.kun.nl (News owner)
Nntp-Posting-Host: anthemius.mbfys.kun.nl
Content-Transfer-Encoding: 7bit
Organization: KUN
References: <leducf.801408622@tornade.ERE.UMontreal.CA>
Mime-Version: 1.0
Date: Wed, 31 May 1995 08:41:13 GMT
X-Mailer: Mozilla 1.1N (X11; I; SunOS 4.1.3_U1 sun4m)
X-Url: news:leducf.801408622@tornade.ERE.UMontreal.CA
Lines: 52

Hello Francois

This might be interesting to read:
@ARTICLE{Wiegerinck_Komoda_Heskes94,
  AUTHOR = "Wiegerinck, W. and Komoda, A. and Heskes, T.",
  TITLE  = "Stochastic dynamics of learning with momentum in neural networks",
  YEAR   = 1994,
  JOURNAL = "Journal of Physics A",
  VOLUME  = 27,
  PAGES   = "4425--4437",
  ABSTRACT = "We study on-line learning with momentum term for 
              nonlinear learning rules. Through introduction of 
              auxiliary variables, we show that the
              learning process can be described by a Markov process.

              For small learning parameters $\eta$ and momentum 
              parameters $\alpha$ close to 1, such that 
              $\gamma = \eta/(1-\alpha)^2$ is finite,
              the time scales for the evolution of the weights
              and the auxiliary variables are the same. In this case
              Van Kampen's expansion can be applied in a 
              straightforward manner.
              We obtain evolution equations for the average 
              network state and the fluctuations around this average. 
              These evolution equations depend
              (after rescaling of time and fluctuations) only on $\gamma$:
              all combinations $(\eta,\alpha)$ with the same value
              of $\gamma$ give rise to similar behaviour.

              The case $\alpha$ constant and $\eta$ small requires 
              a completely different analysis. 
              There are two different time scales: a fast time
              scale on which the auxiliary variables equilibrate 
              and a slow time scale for the change of the weights. 
              By projection on the space of slow variables the fast 
              variables can be eliminated. We find that for
              small learning parameters $\eta$ and finite momentum 
              parameters $\alpha$, learning with momentum is equivalent 
              to learning without a momentum term with rescaled 
              learning parameter $\tilde{\eta} = \eta/(1-\alpha)$.

              Simulations with the nonlinear Oja learning rule
              confirm the theoretical results.",
  URL      = "ftp://ftp.mbfys.kun.nl/snn/pub/reports/Wiegerinck.2.ps.Z"
}
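
To illustrate the last result numerically, here is a minimal sketch
(mine, not from the paper): it compares on-line gradient descent with
a momentum term against plain gradient descent with the rescaled
learning parameter $\tilde{\eta} = \eta/(1-\alpha)$. It assumes a
simple one-dimensional quadratic cost instead of the paper's nonlinear
Oja rule, and the particular values of $\eta$ and $\alpha$ are
arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)

def grad(w, x):
    # Stochastic gradient of the per-sample cost 0.5*(w - x)^2;
    # with x ~ N(1, 0.1^2) the expected cost has its minimum at w = 1.
    return w - x

eta, alpha = 0.001, 0.5          # small eta, finite momentum alpha
eta_tilde = eta / (1.0 - alpha)  # rescaled learning parameter

w_mom, v = 2.0, 0.0              # weight and momentum (auxiliary) variable
w_plain = 2.0

for t in range(20000):
    x = 1.0 + 0.1 * rng.standard_normal()
    # With momentum: v <- alpha*v - eta*grad, w <- w + v
    v = alpha * v - eta * grad(w_mom, x)
    w_mom += v
    # Without momentum, rescaled: w <- w - (eta/(1-alpha))*grad
    w_plain -= eta_tilde * grad(w_plain, x)

print("with momentum:         w =", round(w_mom, 4))
print("rescaled, no momentum: w =", round(w_plain, 4))

Both runs should end up fluctuating around the minimum at w = 1; the
correspondence holds only for small $\eta$ and finite $\alpha$, as
the abstract states.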
Greetings

	Pi\"erre van de Laar

