Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: On computing # nodes in hidden layer...
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D4vq06.BJq@unx.sas.com>
Date: Fri, 3 Mar 1995 19:26:30 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <3iir9f$j3m@wizard.uark.edu> <1995Mar1.180342.5060@investor.pgh.pa.us>
Organization: SAS Institute Inc.
Keywords: ANN Hidden layer
Lines: 50


In article <1995Mar1.180342.5060@investor.pgh.pa.us>, rbp@investor.pgh.pa.us (Bob Peirce #305) writes:
|> >>...
|> >>Input layer nodes = M
|> >>Hidden layer nodes = N := Sqrt (M*P)
|> >>Output layer nodes = P
|> ...
|> I asked a question like this a while back and was told there is no
|> widely accepted theory on the ideal number of neurons to have in one or
|> more hidden layers, except most folks seem to think going much beyond
|> N = M risks memorizing instead of learning.

There is no problem with having more hidden nodes than inputs. What
matters is the number of weights relative to the number of training
cases: you can get memorization when the number of weights is >= the
number of training cases.
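
To make the comparison concrete, here is a rough sketch in Python that
counts the weights in a net with one hidden layer and checks the count
against the number of training cases. It assumes a fully connected net
with a bias on every hidden and output node; the function names and the
example sizes are made up for illustration.

```python
def count_weights(n_inputs, n_hidden, n_outputs, bias=True):
    """Count adjustable weights in a fully connected one-hidden-layer net.

    Assumes every hidden and output node has a bias term when bias=True.
    """
    b = 1 if bias else 0
    input_to_hidden = (n_inputs + b) * n_hidden
    hidden_to_output = (n_hidden + b) * n_outputs
    return input_to_hidden + hidden_to_output


def can_memorize(n_inputs, n_hidden, n_outputs, n_cases):
    """True when the number of weights is >= the number of training cases."""
    return count_weights(n_inputs, n_hidden, n_outputs) >= n_cases


# Illustrative sizes only: M = 10 inputs, P = 2 outputs, and the quoted
# rule of thumb N = sqrt(M*P), rounded to a whole number of nodes.
M, P = 10, 2
N = round((M * P) ** 0.5)          # sqrt(20) is about 4.47, so N = 4

print(count_weights(M, N, P))      # 11*4 + 5*2 = 54
print(can_memorize(M, N, P, 50))   # True:  54 weights >= 50 cases
print(can_memorize(M, N, P, 500))  # False: 54 weights <  500 cases
```

Note that the same N gives opposite answers depending on the size of
the training set, which a rule based only on M and P cannot reflect.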

|> I am a novice, and nobody seems to have strong theories on why or
|> why not to do any of these things, so I just grope along and hope.

You want theories? Here's a good paper from the NN literature:

  Moody, J.E. (1992), "The Effective Number of Parameters: An Analysis
  of Generalization and Regularization in Nonlinear Learning Systems,"
  NIPS 4, 847-854.

And some statistical references:

  Akaike, H. (1974), "A New Look at the Statistical Model
  Identification," IEEE Trans. Automatic Control, AC-19, 716-723.

  Judge, G.G., Griffiths, W.E., Hill, R.C., Lutkepohl, H. and Lee, T.
  (1985), The Theory and Practice of Econometrics, 2nd ed., New York:
  John Wiley & Sons.

  Miller, A.J. (1990), Subset Selection in Regression, Chapman & Hall.

  Sawa, T. (1978), "Information Criteria for Discriminating Among
  Alternative Regression Models," Econometrica, 46, 1273-1282.

  Schwarz, G. (1978), "Estimating the Dimension of a Model," Annals of
  Statistics, 6, 461-464.

  Wahba, G. (1990), Spline Models for Observational Data, Philadelphia:
  SIAM.
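
As a rough illustration of how the criteria of Akaike and Schwarz are
used to compare models of different size, here is a sketch in Python.
It uses the common least-squares forms with additive constants dropped,
and it treats every weight as one free parameter, which is only an
assumption for a neural net. The function names and the numbers in the
example are hypothetical.

```python
import math


def aic(sse, n_cases, n_weights):
    """Akaike's criterion, least-squares form, constants dropped."""
    return n_cases * math.log(sse / n_cases) + 2 * n_weights


def sbc(sse, n_cases, n_weights):
    """Schwarz's criterion, least-squares form, constants dropped."""
    return n_cases * math.log(sse / n_cases) + n_weights * math.log(n_cases)


# Hypothetical fits on 200 training cases: (weights, training SSE).
# Smaller criterion values are better.
n_cases = 200
candidates = [(28, 40.0), (54, 31.0), (106, 29.5)]

for n_weights, sse in candidates:
    print(n_weights,
          round(aic(sse, n_cases, n_weights), 1),
          round(sbc(sse, n_cases, n_weights), 1))
```

Both criteria add a penalty for the number of weights to a term that
measures fit. Schwarz's penalty is the heavier one once there are 8 or
more cases, since log(n) then exceeds 2.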

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
