Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.kei.com!nntp.et.byu.edu!netline-fddi.jpl.nasa.gov!hudson.lm.com!godot.cc.duq.edu!ddsw1!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: patterns to input ratio
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D9z0st.MFz@unx.sas.com>
Date: Sat, 10 Jun 1995 19:00:29 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <3r6qhl$mm4@mirv.unsw.edu.au>
Organization: SAS Institute Inc.
Lines: 54


In article <3r6qhl$mm4@mirv.unsw.edu.au>, s9390168@acsusun (Tony Florio) writes:
|> In a recently published article:
|>
|>    Cicchetti et al, "Diagnosing Autism Using ICD-10 Criteria: A comparison
|>            of Neural Networks and Standard Multivariate Procedures" in
|>            Child Neuropsychology, 1995, 1 (1), 26-37
|>
|> The authors raise the point that empirical studies of both multivariate
|> techniques such as Linear Discriminant Analysis and Neural Networks have
|> shown that the Subject to Variable ratios (training patterns to inputs)
|> typically used in neural network studies are low (5:1) and that for valid
|> generalisation a ratio approaching 25:1 is required.

Yes and no. The relevant ratio is the number of training cases to the
number of weights, not to the number of inputs. In a linear model, if this ratio is high
(say 5:1 or 10:1), then the training error is a good estimate of the
generalization error. And for most practical applications, it is indeed
advisable to have a ratio of at least 5:1. But you can get good
generalization with a smaller ratio if the noise in the training set is
sufficiently small.

In nonlinear models such as feedforward nets or even linear logistic
regression, the theory from linear models applies only approximately,
so to be safe, all those ratios should be raised for nonlinear models.
Frank Harrell, an expert on logistic regression, recommends at least
20:1.  So 25:1 for a feedforward net with a hidden layer is probably a
lower limit.
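A quick way to see what these rules of thumb imply in practice is to count
the weights (including biases) and multiply by the ratio. A minimal sketch,
not from the article under discussion; the network sizes are made up:

```python
# Count weights + biases in a fully connected net with one hidden
# layer, and the training-set sizes implied by a cases-to-weights
# ratio.  (A linear model is the special case n_hidden = 0, where the
# count is just n_inputs + 1 per output.)

def n_weights(n_inputs, n_hidden, n_outputs):
    """Weights and biases in a one-hidden-layer feedforward net."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def cases_needed(weights, ratio):
    """Training cases implied by a given cases-to-weights ratio."""
    return weights * ratio

if __name__ == "__main__":
    w = n_weights(10, 5, 1)            # (10+1)*5 + (5+1)*1 = 61 weights
    print(w, cases_needed(w, 5), cases_needed(w, 25))
```

Even this modest 10-5-1 net needs over 1500 cases at 25:1, which is why
subject-to-*input* ratios understate the data requirement so badly.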

All of the above assumes no regularization. If you use stopped training,
you should have _lots_ of hidden units in the network.  It is not clear
whether it is possible to have too many hidden units. For Bayesian
estimation, it also seems to be fine to have lots of hidden units. As
for inputs, the more relevant inputs, the better, but irrelevant inputs
will degrade generalization unless you have enough data for the net to
pick out the relevant inputs.
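The stopped-training rule above amounts to: track the error on a held-out
validation set each epoch, keep the best epoch, and stop once it hasn't
improved for a while. A minimal sketch of that logic only; the validation
errors here are a synthetic sequence standing in for whatever your training
program reports each epoch:

```python
# Stopped training as a patience rule: remember the epoch with the
# lowest validation error, and quit after `patience` epochs with no
# improvement.

def early_stop(val_errors, patience=3):
    """Return (best_epoch, best_error) under a simple patience rule."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break                      # no improvement for `patience` epochs
    return best_epoch, best_err

if __name__ == "__main__":
    # typical U-shaped validation curve: improves, then overfits
    errs = [1.0, 0.6, 0.4, 0.35, 0.37, 0.41, 0.5, 0.6]
    print(early_stop(errs))            # stops at epoch 3, well before the end
```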

|> Does anyone have any comment on this ? Also I thought the neural network
|> literature based on VC dimensions indicated that you need to aim for a ratio
|> of 5.2:1 patterns:weights to get good generalisation of training set accuracy.

5.2?! An amazingly precise number!

|> Does anybody know how to calculate the number of subjects (patterns) required?

You can't calculate it unless you have a lot more prior information than
most people do. The best you can do in most practical cases is to estimate
the generalization error and see if it's good enough.
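One standard way to estimate the generalization error without any extra
prior information is k-fold cross-validation: fit on k-1 folds, score on
the held-out fold, and average. A minimal sketch using a one-input linear
model fit by closed-form least squares (pure Python, no libraries; the data
are made up):

```python
# Estimate generalization error by k-fold cross-validation for a
# one-input linear model y = a + b*x fit by least squares.

def fit_line(xs, ys):
    """Closed-form least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return b, my - b * mx

def cv_mse(xs, ys, k=5):
    """Average held-out squared error over k folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))          # every k-th case
        tr_x = [x for i, x in enumerate(xs) if i not in held_out]
        tr_y = [y for i, y in enumerate(ys) if i not in held_out]
        b, a = fit_line(tr_x, tr_y)
        total += sum((ys[i] - (a + b * xs[i])) ** 2 for i in held_out)
    return total / n

if __name__ == "__main__":
    xs = list(range(20))
    ys = [2.0 * x + 1.0 for x in xs]               # noise-free line
    print(cv_mse(xs, ys))                          # essentially zero
```

If the cross-validated error is good enough for your application, the
exact cases-to-weights ratio matters less than any rule of thumb suggests.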

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
