Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.mathworks.com!news.alpha.net!uwm.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!news.sprintlink.net!siemens!flake
From: flake@scr.siemens.com (Gary William Flake)
Subject: Re: No. of hidden neurons ...
Message-ID: <D4GJoD.43u@scr.siemens.com>
Sender: news@scr.siemens.com (NeTnEwS)
Nntp-Posting-Host: gull.scr.siemens.com
Organization: Siemens Corporate Research, Princeton NJ
References: <D44LrC.IM2@hkuxb.hku.hk> <BS3bMO-.ruckj@delphi.com> <3ifgjl$dvn@oslo.uni-paderborn.de> <3ig6ji$ado@cantaloupe.srv.cs.cmu.edu>
Date: Thu, 23 Feb 1995 14:45:48 GMT
Lines: 58

In article <3ig6ji$ado@cantaloupe.srv.cs.cmu.edu>,
Scott Fahlman <sef@CS.CMU.EDU> wrote:
>
>In article <3ifgjl$dvn@oslo.uni-paderborn.de> mac@oslo.uni-paderborn.de (Hubert Mackenberg) writes:
>
   >A network with one hidden layer can model any continuous function. (Beale &
>   >Jackson (1990) _Neural Computing: An Introduction_.)
>
   All right. But you should think of this statement as theoretical. You can
>   model any continuous function with ONE hidden layer, but nothing is
>   said about the number of neurons in this layer. Depending on how well
>   you want to approximate your function you may need tens, hundreds,
>   thousands, or more neurons. 
>
>This is true.
>
   In practice you don't use one hidden layer with a thousand neurons; you
>   prefer more hidden layers with a hundred neurons in total doing the same job.
>
>This is an overstatement.  There are often situations in which a
>single hidden layer would be huge, but in which you get a smaller,
>better generalizing, and faster training net with additional hidden
>layers.  On the other hand, there are some problems in which
>additional layers don't help at all, and the single hidden layer
>solution is the optimal one.  If you have to model a bunch of
>independent bumps in the space defined by the raw inputs, higher-order
>features won't help.
>
>-- Scott

Moreover, Andrew Barron showed that for a large class of functions you
can bound the (integrated squared) approximation error by O(1/n), where
n is the number of hidden units in the single hidden layer.  Note that
this bound says only that there exists a set of weights which achieves
it, not that BP or any other learning algorithm will find it.
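(To get a feel for this kind of existence result -- this is NOT Barron's
construction, just a toy sketch -- here is a pure-Python single hidden
layer of n sigmoid units whose weights are chosen by hand, staircase
style, rather than learned.  The worst-case error on a smooth 1-D target
shrinks as n grows.  All function names here are made up for the
illustration.)

```python
import math

def sigmoid(x):
    # Numerically stable logistic sigmoid.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def build_net(f, n):
    # One hidden layer of n sigmoid units approximating f on [0, 1].
    # Unit i switches on near x_i and contributes the increment of f
    # over the i-th subinterval, so the net traces a smoothed staircase.
    # Steepness grows with n so each "step" stays sharp.
    k = 50.0 * n
    xs = [i / n for i in range(n + 1)]
    weights = [f(xs[i + 1]) - f(xs[i]) for i in range(n)]
    def g(x):
        return f(0.0) + sum(w * sigmoid(k * (x - xs[i]))
                            for i, w in enumerate(weights))
    return g

def max_error(f, g, m=1000):
    # Worst-case error over a fine grid on [0, 1].
    return max(abs(f(i / m) - g(i / m)) for i in range(m + 1))

f = lambda x: math.sin(2 * math.pi * x)
for n in (5, 20, 80):
    print(n, max_error(f, build_net(f, n)))
```

Note that the weights are written down directly from f -- no training
happens -- which is exactly the gap between "a good set of weights
exists" and "BP will find it".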

Darken, Sontag, and Gurvits have a similar result but for a larger
(different?) class of functions.  I don't have the full citation but I
know that it has appeared as a Siemens Corporate Research TR.

The Barron reference is:

@ARTICLE{BAR93,
        AUTHOR = {A.~R. Barron},
        TITLE = {Universal Approximation Bounds for Superpositions of
                 a Sigmoidal Function},
        JOURNAL = {IEEE Transactions on Information Theory},
        YEAR = {1993},
        VOLUME = {39},
        NUMBER = {3},
        PAGES = {930--945},
        MONTH = {May}
}

-- Gary
-- 
Gary W. Flake,  flake@scr.siemens.com,  Phone: 609-734-3676,  Fax: 609-734-6565
Siemens Corporate Research,  755 College Road East,  Princeton, NJ  08540,  USA
