Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!udel!gatech!howland.reston.ans.net!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: linear separable boolean functions -- lists?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D7BE5E.DDz@unx.sas.com>
Date: Thu, 20 Apr 1995 03:39:14 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <3makuv$jng@agate.berkeley.edu> <797756360snz@longley.demon.co.uk> <D73L99.EK9@unx.sas.com> <798218862snz@longley.demon.co.uk>
Organization: SAS Institute Inc.
Lines: 87


In article <798218862snz@longley.demon.co.uk>, David@longley.demon.co.uk (David Longley) writes:
|> In article <D73L99.EK9@unx.sas.com>
|>            saswss@hotellng.unx.sas.com "Warren Sarle" writes:
|> >
|> > ... Polynomial models
|> > with interactions (i.e. products of inputs and powers thereof) are
|> > universal approximators, as are multilayer perceptrons. Polynomial
|> > models are easier to train, being linear in the weights, but the number
|> > of weights increases exponentially with the number of inputs.  Thus,
|> > MLPs tend to be more convenient and flexible when you have many inputs,
|> > especially when some of the inputs are not really useful predictors.
|>
|> Thank you. I would welcome further elaboration on this. As written,
|> the parallel seems to be hierarchical loglinear modelling.

If you mean the closest parallel in the statistical literature to an
MLP, I would say projection pursuit regression, which is an MLP with one
hidden layer in which the hidden-unit activation functions are
nonparametric. See Friedman, J.H. and Stuetzle, W. (1981) "Projection
pursuit regression," J. of the American Statistical Association, 76,
817-823.
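
To make the parallel concrete, here is an illustrative sketch (mine, not from
Friedman & Stuetzle): both a one-hidden-layer MLP and projection pursuit
regression compute an additive sum of "ridge functions" f(x) = sum_k g_k(a_k . x);
the MLP fixes each g_k to a scaled sigmoid/tanh, while PPR estimates each g_k
nonparametrically (e.g. by smoothing). All names and numbers below are made up
for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 cases, 3 inputs

# Projection directions a_k, shared by both models (2 "hidden units"/terms)
A = rng.normal(size=(3, 2))
Z = X @ A                            # projected inputs, shape (5, 2)

# MLP: g_k is a fixed parametric ridge function, here beta_k * tanh(z + b_k)
beta, b = np.array([1.5, -0.7]), np.array([0.1, 0.4])
f_mlp = (beta * np.tanh(Z + b)).sum(axis=1)

# PPR: each g_k would be a smooth curve estimated from the data; stand in
# with arbitrary smooth functions just to show the shared additive structure
g = [np.sin, lambda z: z**2]
f_ppr = sum(gk(Z[:, k]) for k, gk in enumerate(g))

print(f_mlp.shape, f_ppr.shape)      # same functional form, same output shape
```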

|> I had thought that the closest parallel was just logistic regression.

That is indeed an MLP with no hidden layer.
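
The equivalence is exact, not just an analogy: a "network" with no hidden
layer and a logistic output activation computes precisely the
logistic-regression model p = 1/(1 + exp(-(w . x + b))). A minimal sketch
(values are illustrative):

```python
import numpy as np

def logistic_regression(x, w, b):
    # the statistician's model: logit(p) = w . x + b
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def mlp_no_hidden(x, w, b):
    # the neural-net view: one affine layer, then the sigmoid "activation"
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.3, -0.2])
p1 = logistic_regression(x, w, b=0.1)
p2 = mlp_no_hidden(x, w, b=0.1)
assert np.isclose(p1, p2)   # identical models; only fitting methods differ
print(float(p1))
```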

|> However, isn't there something to be
|> said for the *statistical* basis of statistical models whilst neural nets just
|> fit function approximators and are therefore prone to over-fit?

The issue of overfitting is essentially the same for both statistical
and neural models because the models are essentially the same. You can
overfit in linear regression by using too many predictors, in polynomial
regression by using too many terms, in autoregressive models by using
too many lags, in kernel regression by using too small a bandwidth, or
in an MLP by using too many hidden units. Many of the same methods are
used to deal with overfitting in both the statistical and neural net
literature, such as weight decay (aka ridge regression) and pruning (aka
pre-test estimation or stepwise regression), although to my knowledge,
stopped training appears only in the neural net literature and Stein
estimators appear only in the statistical literature.
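
The weight decay / ridge correspondence is easy to see in the linear case,
where both minimize ||y - Xw||^2 + lambda ||w||^2 and the solution has the
closed form w = (X'X + lambda I)^{-1} X'y. A toy sketch (data and lambda
values are made up):

```python
import numpy as np

def ridge(X, y, lam):
    # closed-form penalized least squares: (X'X + lam*I) w = X'y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=20)

w_loose = ridge(X, y, lam=0.0)    # ordinary least squares, no decay
w_decay = ridge(X, y, lam=10.0)   # heavy weight decay / ridge penalty
print(np.linalg.norm(w_decay) < np.linalg.norm(w_loose))  # shrinkage
```

Larger lambda shrinks the weights toward zero, which is exactly how weight
decay limits an MLP's effective complexity.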

|> Isn't building
|> a neural net model like doing a 'direct' method in regression with all one's
|> main variables and every combination possible thrown in on top.

To some extent, yes, but an MLP gives you more flexible control over the
complexity of the model than you have in polynomial regression with
interactions (which I think is what you mean by a 'direct' method;
correct me if I'm wrong).
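
The difference in weight counts can be tallied directly: a full degree-d
polynomial with interactions in n inputs has C(n + d, d) terms (including the
intercept), while an MLP with h hidden units needs only h(n + 1) + (h + 1)
weights, linear in n. A quick illustrative count (formulas are standard;
the specific sizes are arbitrary):

```python
from math import comb

def poly_terms(n_inputs, degree):
    # number of monomials of total degree <= degree in n_inputs variables
    return comb(n_inputs + degree, degree)

def mlp_weights(n_inputs, n_hidden):
    # weights + biases for one hidden layer plus a single linear output
    return n_hidden * (n_inputs + 1) + (n_hidden + 1)

for n in (2, 10, 50):
    print(n, poly_terms(n, degree=3), mlp_weights(n, n_hidden=10))
```

At 50 inputs the cubic polynomial already needs tens of thousands of weights,
while the 10-hidden-unit MLP needs a few hundred.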

|> At least in a stepwise regression variables which meet some 
|> statistical level of significance are entered.

There are pruning methods in the neural net literature that are closely
related to stepwise regression, such as "Optimal Brain Surgeon" in Hassibi,
B. & Stork, D.G. (1993) "Second order derivatives for network pruning:
Optimal Brain Surgeon" in Hanson, S.J., Cowan, J.D. & Giles, C.L.,
eds., _Advances in Neural Information Processing Systems 5_, 164-171,
San Mateo, CA: Morgan Kaufmann. However, such approaches have not been
found to be very useful in the statistical literature, even though
stepwise regression is very popular among non-statisticians. See 
Miller, A.J. (1990), _Subset Selection in Regression_, Chapman & Hall,
or mention stepwise regression on sci.stat.consult and see what Frank
Harrell says. :-)
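
For flavor, a sketch of the saliency computation at the heart of Optimal
Brain Surgeon (my toy illustration, not Hassibi & Stork's code): each weight
w_i is ranked by s_i = w_i^2 / (2 [H^-1]_ii), where H is the Hessian of the
training error, and the smallest-saliency weight is the pruning candidate,
much as a stepwise procedure drops the least significant term.

```python
import numpy as np

def obs_saliencies(w, H):
    # saliency s_i = w_i^2 / (2 * [H^-1]_ii); small s_i means the weight
    # can be removed with little increase in training error
    H_inv = np.linalg.inv(H)
    return w**2 / (2.0 * np.diag(H_inv))

w = np.array([0.05, 1.2, -0.8])       # toy trained weights
H = np.diag([4.0, 2.0, 3.0])          # toy (diagonal) error Hessian
s = obs_saliencies(w, H)
prune = int(np.argmin(s))             # index of the weight to prune first
print(prune)
```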

|> I'm very interested in the parallels between classical statistics and neural
|> network modelling, so any light you can throw, directions you can point me
|> in, I'd be very grateful.

Try the following article available by anonymous ftp from ftp.sas.com
(Internet gateway IP 192.35.83.8) in the directory /pub/sugi19/neural,
file name neural1.ps:

   Sarle, W.S. (1994), "Neural Networks and Statistical Models,"
   Proceedings of the Nineteenth Annual SAS Users Group International
   Conference, Cary, NC: SAS Institute, pp 1538-1550. (Postscript file)

Additional references on the relationship between statistics and
neural nets are in the comp.ai.neural-nets FAQ.

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
