Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel-eecis!gatech!csulb.edu!hammer.uoregon.edu!hunter.premier.net!news.sprintlink.net!news-peer.sprintlink.net!news-pull.sprintlink.net!news.sprintlink.net!news-pen-16.sprintlink.net!interpath!news.interpath.net!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Why more than one hidden layer ?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <E44L2u.9Bt@unx.sas.com>
Date: Thu, 16 Jan 1997 23:54:30 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <5bhqge$evf$1@mark.ucdavis.edu> <5big06$5s2@news.ox.ac.uk> <5bilev$hts@camel2.mindspring.com> <5biq07$b9o@news.ox.ac.uk>
Organization: SAS Institute Inc.
Lines: 53


In article <5biq07$b9o@news.ox.ac.uk>, patrick@gryphon.psych.ox.ac.uk (Patrick Juola) writes:
|> In article <5bilev$hts@camel2.mindspring.com> EricG <no.junk.em@il.thanks> writes:
|> >
|> >>>  0 hidden layers - linear functions
|> >>>  1 hidden layer - nonlinear continuous functions
|> >>>  2 hidden layers - nonlinear discontinuous functions
|> >>>  3+ hidden layers offer no advantages
|> >>
|> >>It's not true as stated.
|> >>
|> >>The actual hierarchy is
|> >>   0 hidden layers -- linearly separable functions
|> >>   1 hidden layer -- continuous functions
|> >>   2+ hidden layers -- no advantages over 1 layer
|> >
|> >From experience, I'd have to side with the first example above.  This is 
|> >also basically what Tim Masters says in Practical NN Recipes.
|> 
|> I seriously doubt it, bluntly.  I think you either misinterpreted
|> Mr. Masters or he made a tremendous blunder.  Think of it this way --
|> a 2 hidden layer neural network is simply the composition of a 1 hidden
|> layer network with a 0 hidden layer network -- there's no way that
|> two continuous functions can be composed to produce a discontinuous
|> one (elementary calculus).  There is simply no way that a neural network
|> can learn (or produce) a discontinuous function without infinite weights.
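
The finite-weights point above can be illustrated numerically. A minimal
sketch with a single tanh hidden unit (the weights and the test offset
are arbitrary choices for illustration): for any finite weight w the
output tanh(w*x) is continuous, and its error against the step sign(x)
at a fixed offset only shrinks as w grows without bound.

```python
import math

# One-hidden-unit "network": out(x) = tanh(w * x).
# For finite w this is continuous; it only approaches the
# discontinuous step sign(x) in the limit w -> infinity.
def unit(w, x):
    return math.tanh(w * x)

eps = 0.1  # fixed offset from the discontinuity at x = 0
for w in (1.0, 10.0, 100.0):
    err = abs(1.0 - unit(w, eps))   # distance to sign(eps) = 1
    print(f"w={w:6.1f}  error at x={eps}: {err:.2e}")
```

The error never reaches zero for any finite w, which is exactly the
"infinite weights" obstruction.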

This discussion would be more constructive if people would specify
whether they are talking about continuous activation functions or
step functions. I can't find a copy of Sontag (1992) at the moment,
but he was using step functions, and showed that two hidden layers
were required for uniform (?) approximation of certain types of
discontinuous functions.

   Sontag, E.D. (1992), "Feedback stabilization using two-hidden-layer 
   nets", IEEE Transactions on Neural Networks, 3, 981-990.
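
To see why the choice of activation function matters: with step
(Heaviside) activations, even a single hidden layer produces genuinely
discontinuous outputs. A one-dimensional sketch (interval endpoints
chosen arbitrarily; Sontag's result concerns harder, multivariate
approximation questions):

```python
# Two threshold hidden units compute the indicator of [a, b):
# step(x - a) - step(x - b) is 1 on [a, b) and 0 elsewhere --
# a discontinuous function, with finite weights, from one hidden layer.
def step(z):
    return 1.0 if z >= 0.0 else 0.0

def indicator(x, a=0.0, b=1.0):
    # hidden layer: step(x - a), step(x - b); output layer: difference
    return step(x - a) - step(x - b)

for x in (-0.5, 0.0, 0.5, 1.0, 1.5):
    print(x, indicator(x))
```

So the composition argument quoted above applies only to continuous
activation functions.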

|> It's been widely accepted since 1986 that a single layer suffices;

A single layer of _what_ suffices for _what_? Here's a fairly recent
reference for anyone who wants details:

   Hornik, K. (1993), "Some new results on neural network 
   approximation," Neural Networks, 6, 1069-1072.
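
As a miniature of the single-hidden-layer story: with hinge units
(an illustrative choice of activation, not the one Hornik analyzes in
general), the continuous function |x| is represented exactly by two
hidden units, since |x| = max(0, x) + max(0, -x).

```python
# One hidden layer, two hinge units, identity output layer.
def relu(z):
    return max(0.0, z)

def net(x):
    return relu(x) + relu(-x)

for x in (-2.0, -0.5, 0.0, 1.5):
    assert net(x) == abs(x)
```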


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
 *** Do not send me unsolicited commercial or political email! ***

