Newsgroups: sci.math.num-analysis,comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!nntp.sei.cmu.edu!news.cis.ohio-state.edu!math.ohio-state.edu!howland.reston.ans.net!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Atan activation function (Re: the error function)
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Dq4xwt.9Dr@unx.sas.com>
Date: Sat, 20 Apr 1996 00:41:17 GMT
Distribution: inet
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <315D8EB6.5C5B@wombat.eng.fsu.edu> <4jngp5$t29@st-james.comp.vuw.ac.nz> <4k5ae3$7vd@reader2.ix.netcom.com> <3167C7D8.41C67EA6@stats.ox.ac.uk> <316A1048.2DD7@speech.kth.se> <4khveg$322@dfw-ixnews7.ix.netcom.com>
Organization: SAS Institute Inc.
Lines: 36
Xref: glinda.oz.cs.cmu.edu sci.math.num-analysis:27880 comp.ai.neural-nets:31179


In article <4khveg$322@dfw-ixnews7.ix.netcom.com>, jdadson@ix.netcom.com(Jive Dadson ) writes:
|> ...
|> So... I used a real small weight decay penalty, 1e-9 if I remember
|> correctly, and tested both atan and tanh sigmoids regressing onto
|> one cycle of a sine wave using 100 evenly spaced samples. I used
|> three hidden neurons and linear output activation. The backpropagation
|> was done using the Broyden-Fletcher-Goldfarb-Shanno variation of
|> the Davidon-Fletcher-Powell method from Numerical Recipes in C,
|> (dfpmin). (I have hacked the "lnsrch" routine to make it robust.
|> My experience has been that as written, it fails far too often to be of
|> any use for training an NN.)
|> 
|> The results were a little surprising. This problem has two local
|> minima, where the NN boots it and settles for accurately approximating
|> only a half cycle of the sine wave. It usually finds one of the
|> local minima. The atan version did not get stuck in a local minimum
|> as often as the tanh version, needing 48 tries to get 20 global
|> minimum results, whereas the tanh version took 56 tries. However,
|> the atan version converged more slowly, and as a result was
|> about a third more expensive, net, in terms of objective-function and
|> gradient calculations.
|> 
|> Anyone care to speculate on why this might be? (I have a conjecture.)
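For reference, the setup Jive describes can be sketched roughly as follows.
This is a minimal sketch only: the least-squares objective, the parameter
layout, and the random starting weights are my assumptions; the BFGS
optimizer (dfpmin) itself is not reproduced here.

```python
# Sketch of the experiment described above (assumptions: least-squares
# objective; atan used un-scaled, as in the original post; parameter
# packing and random initialization are illustrative only).
import math
import random

def net(x, w, act):
    """3-hidden-unit, 1-input, 1-output net with linear output.
    w packs [w_in(3), b_in(3), w_out(3), b_out] = 10 parameters."""
    h = [act(w[i] * x + w[3 + i]) for i in range(3)]
    return sum(w[6 + i] * h[i] for i in range(3)) + w[9]

# One cycle of a sine wave, 100 evenly spaced samples.
xs = [2.0 * math.pi * i / 99 for i in range(100)]
ys = [math.sin(x) for x in xs]

def sse(w, act, decay=1e-9):
    """Sum-of-squared-errors objective plus the tiny weight-decay
    penalty mentioned in the post (1e-9)."""
    err = sum((net(x, w, act) - y) ** 2 for x, y in zip(xs, ys))
    return err + decay * sum(wi * wi for wi in w)

random.seed(0)
w0 = [random.uniform(-1, 1) for _ in range(10)]
print("tanh objective:", sse(w0, math.tanh))
print("atan objective:", sse(w0, math.atan))
```

Feeding sse and its gradient to any quasi-Newton minimizer would complete
the experiment; the half-cycle local minima Jive mentions correspond to
solutions where the hidden units only capture half the sine wave.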

See Owen, A. B. (1994), "Overfitting in neural networks," in Sall, J. and
Lehman, A. (eds.), Computing Science and Statistics, 26, 57-62. This
publication is also known as the "Proceedings of the 26th Symposium
on the Interface".
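For what it's worth, one mathematical fact that may bear on the slower
convergence Jive reports: tanh saturates exponentially fast, while atan
saturates only polynomially, so atan's derivative stays much larger away
from zero. A quick check (the 2/pi scaling is my assumption, added only
to put atan on tanh's output range for a fair comparison):

```python
# Compare how fast the derivatives of the two sigmoids decay.
# tanh'(x) = 1 - tanh(x)^2 decays like 4*exp(-2x); the derivative of
# (2/pi)*atan(x) decays only like (2/pi)/x^2.
import math

def dtanh(x):
    return 1.0 - math.tanh(x) ** 2          # derivative of tanh

def datan_scaled(x):
    return (2.0 / math.pi) / (1.0 + x * x)  # derivative of (2/pi)*atan

for x in (1.0, 3.0, 5.0):
    print(x, dtanh(x), datan_scaled(x))
```

At x = 5 the scaled-atan derivative is over a hundred times larger than
the tanh derivative, so gradients through saturated atan units die off
far more slowly.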


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
