Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news.harvard.edu!news2.near.net!MathWorks.Com!news.duke.edu!concert!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: HELP ON BACKPROP NN
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Cx6BoD.LrC@unx.sas.com>
Date: Wed, 5 Oct 1994 00:21:49 GMT
References: <3600e3$1kgi@campus.mty.itesm.mx> <780436939snz@oxfordll.demon.co.uk> <grace.781063817@numbat> <36ppi8$20e@scapa.cs.ualberta.ca>
Nntp-Posting-Host: hotellng.unx.sas.com
Organization: SAS Institute Inc.
Keywords: backpropagation
Lines: 51


In article <36ppi8$20e@scapa.cs.ualberta.ca>, arms@cs.ualberta.ca (Bill Armstrong) writes:
|> ...
|> Suppose a weight w has gone to zero and a subtree S which inputs to
|> it always has a zero output because of zero weights at various places
|> in it. (It is sufficient to consider a node of fanout 1 and a subtree
|> form of subnet to get the idea.) Then the weight w will never change
|> by the backprop algorithm (because w's change is proportional to S's
|> output), nor will the subtree ever change (because the backpropagated
|> error will be 0 because of w=0).

If that is a concern and random initialization does not address it for
some reason, then a method that takes random steps, such as simulated
annealing, would be a reasonable thing to consider. See Lester Ingber's
frequent posts on various newsgroups.
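To make the deadlock concrete, here is a toy sketch (mine, not from any
of the papers cited below) of a one-hidden-unit network in Python. With
a tanh hidden unit standing in for the subtree S, setting both weights
to zero kills both gradients, and a small random perturbation revives
them:

```python
import math
import random

# Toy sketch of the deadlock: one tanh hidden unit h = tanh(v*x)
# (the "subtree" S) feeding an output y = w*h, trained on squared
# error.  If v = 0 (so S always outputs 0) and w = 0, plain backprop
# can never move either weight: dL/dw is proportional to S's output,
# and dL/dv is proportional to w.

def gradients(v, w, x, target):
    h = math.tanh(v * x)            # subtree output S
    y = w * h                       # network output
    err = y - target                # dL/dy for L = (y - target)**2 / 2
    dw = err * h                    # proportional to S's output
    dv = err * w * (1 - h * h) * x  # proportional to w
    return dw, dv

# Deadlocked configuration: both gradient components are 0.0
# for every input, so no amount of training changes anything.
print(gradients(v=0.0, w=0.0, x=1.5, target=1.0))

# A small random perturbation -- the random step mentioned above --
# breaks the deadlock and gradient descent can resume:
random.seed(0)
v, w = random.gauss(0, 0.01), random.gauss(0, 0.01)
dw, dv = gradients(v, w, x=1.5, target=1.0)
print(dw != 0.0 and dv != 0.0)      # True
```

Of course, the perturbed gradients are tiny, which is exactly the
problem discussed next.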

|> Of course, small random
|> perturbations get rid of the exact zeros, but the subtree, which even
|> after the perturbation makes a nearly zero contribution, will be very
|> handicapped in trying to ever make any useful contribution. ...

That is certainly a serious problem for algorithms that make the step
length proportional to the gradient, which is why numerical analysts
ignore such methods. To compute a sensible step size, you must do a
line search or use second-order information. In the NN literature,
Quickprop and RPROP are among the few methods that compute sensible
step sizes. See:

   Fahlman, S.E. (1989), "Faster-Learning Variations on
   Back-Propagation: An Empirical Study", in Touretzky, D., Hinton, G.,
   and Sejnowski, T., eds., _Proceedings of the 1988 Connectionist
   Models Summer School_, Morgan Kaufmann, 38-51.

   Riedmiller, M. and Braun, H. (1993), "A Direct Adaptive Method for
   Faster Backpropagation Learning: The RPROP Algorithm", in
   _Proceedings of the IEEE International Conference on Neural
   Networks 1993_, San Francisco: IEEE.
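For the curious, here is a minimal Python sketch of the RPROP idea (my
simplification of the Riedmiller & Braun scheme; it keeps their
eta+/eta- parameter names but omits refinements such as
weight-backtracking). The point is that the update uses only the
*sign* of the gradient, never its magnitude, so a nearly dead unit is
not handicapped by its tiny gradient:

```python
# Simplified RPROP-style update for one weight: each weight carries
# its own step size, grown when successive gradients agree in sign
# and shrunk when they disagree; the weight then moves against the
# gradient by that step, regardless of how small |grad| is.

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    if grad * prev_grad > 0:        # same sign: accelerate
        step = min(step * eta_plus, step_max)
    elif grad * prev_grad < 0:      # sign flip: overshot, slow down
        step = max(step * eta_minus, step_min)
    if grad > 0:                    # move against the gradient by the
        w -= step                   # adapted step, ignoring |grad|
    elif grad < 0:
        w += step
    return w, step

# Even a gradient of 1e-9 produces a full-sized step:
w, step = rprop_step(w=0.0, grad=1e-9, prev_grad=1e-9, step=0.1)
print(round(w, 3), round(step, 3))  # -0.12 0.12
```

Contrast this with plain backprop, where the same gradient would move
the weight by learning_rate * 1e-9, i.e. essentially not at all.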

|> Now that is initially.  Suppose we have the situation w = 0 , S=0 (on
|> the training set) by accident *during training*. Then the subtree
|> stays useless, even though it might contribute if it had the chance
|> to.

That seems too unlikely to worry about unless you are using very
low-precision arithmetic.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
