Newsgroups: comp.ai.neural-nets,comp.ai.fuzzy
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!godot.cc.duq.edu!newsgate.duke.edu!news.mathworks.com!newsfeed.internetmci.com!news.sprintlink.net!news-stk-200.sprintlink.net!news.sprintlink.net!news-chi-13.sprintlink.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Who invented "backprop"?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DunL3n.IJq@unx.sas.com>
Date: Tue, 16 Jul 1996 20:33:23 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <DuLoLF.6C4@bcstec.ca.boeing.com> <4sg715$2ub@Venus.mcs.com>
Organization: SAS Institute Inc.
Followup-To: comp.ai.neural-nets
Lines: 43
Xref: glinda.oz.cs.cmu.edu comp.ai.neural-nets:32586 comp.ai.fuzzy:7833


In article <4sg715$2ub@Venus.mcs.com>, drt@MCS.COM (Donald Tveter) writes:
|> ...
|> The earliest claim I've heard of is this one:
|> 
|> Robbins 1951, Robbins, Herbert and Monro, Sutton, ``A Stochastic
|> Approximation Method", in {\it Annals of Mathematical Statistics\/}, 22
|> (1951): 400-407.
|> 
|> of course, like Leif Erickson their discovery had little impact.  For
|> that matter Werbos had little impact.

This all depends on what you mean by "backprop". What Robbins and Monro
invented was what NN people call on-line or incremental training, but
their work had nothing to do with how to compute derivatives. Their
paper had a large impact in fields other than NNs.
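For anyone who hasn't seen the Robbins-Monro procedure, here is a toy sketch (the streaming-mean example, function names, and constants are mine, purely for illustration -- not from their paper). The step sizes a_k = 1/k satisfy their conditions: sum a_k diverges while sum a_k**2 converges.

```python
import random

def robbins_monro_mean(stream, steps=10000, seed=0):
    """Estimate the mean of a noisy stream by stochastic approximation.

    Each observation x is a noisy measurement; we seek the root of
    E[x - theta] = 0.  The step sizes a_k = 1/k satisfy the
    Robbins-Monro conditions: sum a_k = inf, sum a_k**2 < inf.
    """
    rng = random.Random(seed)
    theta = 0.0
    for k in range(1, steps + 1):
        x = stream(rng)                   # one noisy observation
        theta += (1.0 / k) * (x - theta)  # incremental ("on-line") update
    return theta

# Noisy observations with true mean 3.0
est = robbins_monro_mean(lambda rng: 3.0 + rng.gauss(0.0, 1.0))
```

Note that each update touches only the current observation -- exactly what NN people mean by on-line or incremental training.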

Werbos developed a convenient way to compute derivatives of complicated
formulas, but I don't think his thesis had anything to do with NNs
specifically (I haven't read the whole thing, so I could be wrong); it
seemed to be mostly about ARMA models. For NNs, this is just a simple
application of the chain rule.
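To make "a simple application of the chain rule" concrete, here is a toy check for a single logistic unit (the names and numbers are mine, chosen only for illustration): the analytic chain-rule gradient agrees with a finite-difference estimate.

```python
import math

def forward(w, x):
    # One logistic unit: y = sigmoid(w * x)
    return 1.0 / (1.0 + math.exp(-w * x))

def loss(w, x, t):
    # Squared error against target t
    y = forward(w, x)
    return 0.5 * (y - t) ** 2

def grad_chain_rule(w, x, t):
    # Chain rule: dL/dw = dL/dy * dy/ds * ds/dw
    #           = (y - t) * y * (1 - y) * x
    y = forward(w, x)
    return (y - t) * y * (1.0 - y) * x

# Check against a central finite-difference approximation
w, x, t, h = 0.7, 1.5, 1.0, 1e-6
numeric = (loss(w + h, x, t) - loss(w - h, x, t)) / (2 * h)
analytic = grad_chain_rule(w, x, t)
```

Backprop in a multilayer net is the same bookkeeping, applied layer by layer from the output back to the input.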

Rumelhart, Hinton, and Williams (1986) applied the ideas of a
differentiable activation function, the chain rule, and gradient
descent--all rather obvious things taken by themselves--to NNs and
discovered you could do really cool, non-obvious things with the
combination. They overlooked the crucial detail of a decaying learning
rate, which Robbins and Monro proved was necessary for convergence.
Rumelhart, Hinton, and Williams also added the idea of momentum, which
goes back to Poljak (1964).
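Putting those last two ingredients together, here is a minimal sketch of gradient descent with a decaying learning rate and a Poljak-style momentum term, on a toy quadratic (the schedule and all constants are mine, chosen only for illustration):

```python
def descend(grad, w0, steps=200, lr0=0.05, beta=0.9):
    """Gradient descent with momentum and a decaying learning rate.

    The schedule lr_k = lr0 / (1 + k/100) eventually behaves like c/k,
    satisfying the Robbins-Monro conditions (sum lr_k diverges,
    sum lr_k**2 converges); v is a "heavy ball" momentum term in the
    style of Poljak (1964).
    """
    w, v = w0, 0.0
    for k in range(steps):
        lr = lr0 / (1.0 + k / 100.0)
        v = beta * v - lr * grad(w)  # momentum smooths successive gradients
        w += v
    return w

# Minimize f(w) = (w - 2)**2, whose gradient is 2*(w - 2)
w_star = descend(lambda w: 2.0 * (w - 2.0), w0=0.0)
```

On a noiseless quadratic like this the decay is not strictly needed, but with noisy (incremental) gradients it is what makes convergence possible.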

See the comp.ai.neural-nets FAQ for references ("What is backprop?" in
ftp://ftp.sas.com/pub/neural/FAQ2.html).

I have removed comp.ai.fuzzy from the follow-ups.


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
