Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news4.ner.bbnplanet.net!news.ner.bbnplanet.net!news.mathworks.com!newsfeed.internetmci.com!news.sprintlink.net!barney.gvi.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Integrity of Neural Networks
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DKC06J.HCL@unx.sas.com>
Date: Fri, 29 Dec 1995 04:51:55 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4broph$jd9@guardian.j-sainsbury.co.uk> <4bv0rm$19s@scapa.cs.ualberta.ca>
Organization: SAS Institute Inc.
Lines: 83


In article <4bv0rm$19s@scapa.cs.ualberta.ca>, arms@cs.ualberta.ca (Bill Armstrong) writes:
|> ...
|> What a trained neural net actually does is an extremely important
|> question, and must be answered satisfactorily before one can use
|> neural nets in safety-critical applications.  Several years ago, I put
|> something on the net which showed a possible pitfall of using a
|> multilayer perceptron to learn a function.  The idea was that even
|> though the function fitted the training data perfectly, and a large
|> number of test points also showed no problems, there could be an input
|> for which the network output differed vastly from what was expected --
|> ie there was a sharp spike in the output.  The example was for one
|> dimensional input, but the idea applies to any dimension, and is
|> extremely important for high dimensions where it is impossible to
|> check network behavior exhaustively by test points.  To create an
|> example for a single variable x, just take two sigmoidal elements and
|> add their outputs in an output element.  One sigmoid jumps way up and
|> the other one jumps way down -- but just a little bit later.  Hence
|> the sum is pretty flat except for a very large peak.  The effect of
|> training points off the peak can cause the optimal least-squares fit
|> to have a large unwanted peak.
|>
|> After a debate with Scott Fahlman, the state of affairs seemed to be
|> that, yes, this phenomenon can occur, and that maybe it can be made
|> less likely if one uses some form of limitation on the magnitude
|> of weights which would prevent the large jumps up and down.

Yes, this is an important issue that has received little attention.  I
encountered these spikes frequently while running simulations for Sarle
(1995; see below for ftp address). There are some modest spikes shown
in the plots for the "wiggle" functions using Schwarz's Bayesian
criterion to choose the number of hidden units, and some severe spikes
in the corresponding plot for the "saw" function. So just having a good
estimate of the number of hidden units is not sufficient to avoid
spikes, although it certainly helps.
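Bill's two-sigmoid construction is easy to reproduce by hand. Here is a
minimal sketch (Python with NumPy, my own weight values, not code from
any of the simulations) that builds such a network directly and shows an
output that looks flat at every point of a test grid while hiding a
large spike between grid points:

```python
import numpy as np

def sigmoid(x):
    # clip to keep np.exp from overflowing at extreme arguments
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500.0, 500.0)))

# Two hidden sigmoids with huge weights: one jumps up at x = 0.504,
# the other jumps down at x = 0.506.  Every grid point below misses
# the narrow gap between the two jumps.
def net(x, w=10000.0):
    up   = sigmoid(w * (x - 0.504))   # 0 -> 1 jump near x = 0.504
    down = sigmoid(-w * (x - 0.506))  # 1 -> 0 jump near x = 0.506
    return up + down                  # output element just adds them

x = np.linspace(0.0, 1.0, 101)        # 101 evenly spaced test points
y = net(x)
print(round(y.min(), 3), round(y.max(), 3))   # looks flat: 1.0 1.0
print(round(net(np.array([0.505]))[0], 3))    # hidden spike: 2.0
```

No number of grid points settles the question in general: halving the
jump separation hides the spike from a grid twice as fine.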

It is true that restricting the size of the weights helps to avoid
spikes. Obviously, if you enforce simple bounds on the absolute values
of the weights, you can derive corresponding bounds on the network
outputs, but you are left with the question of what bounds will give
the best generalization.
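To illustrate the bound-propagation point: for a single-hidden-layer net
whose hidden sigmoids lie in (0,1), the output is confined to an
interval determined by the output-layer weights and bias alone. A small
sketch (Python, my own notation, hypothetical weights):

```python
import numpy as np

def output_bounds(v, b):
    """Range of  y = b + sum_i v[i] * h_i  when each hidden sigmoid
    h_i lies in (0, 1).  Each extreme is approached by pushing h_i
    toward 1 when v[i] has the matching sign and toward 0 otherwise."""
    v = np.asarray(v, dtype=float)
    lo = b + v[v < 0.0].sum()   # every negative weight at h_i = 1
    hi = b + v[v > 0.0].sum()   # every positive weight at h_i = 1
    return float(lo), float(hi)

# Hypothetical output-layer weights and bias:
print(output_bounds([3.0, -2.0, 0.5], b=1.0))   # -> (-1.0, 4.5)
```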

Stopped training dramatically reduces the chance of getting serious
spikes, but you can still get spikes if the validation set is too small
or ill chosen. In the simulations reported in Sarle (1995), there was
only one severe spike with stopped training. It happened with the "line"
function, 5 hidden units, and 33% validation data--a seemingly innocuous
situation.
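For readers unfamiliar with stopped training, the basic loop is easy to
sketch. The following (Python, a toy linear model and made-up data,
nothing from the actual simulations; the 33% validation split echoes
the case above) trains on one subset and stops when the error on a
held-out validation set stops improving:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a noisy line, split into training data and a 33%
# validation set.
x = rng.uniform(-1.0, 1.0, 60)
y = 2.0 * x + rng.normal(0.0, 0.1, 60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

w = 0.0                        # single weight, prediction = w * x
best_w, best_err, patience = w, np.inf, 0
for epoch in range(1000):
    grad = -2.0 * np.mean((y_tr - w * x_tr) * x_tr)
    w -= 0.05 * grad           # one gradient step on training data
    val_err = np.mean((y_va - w * x_va) ** 2)
    if val_err < best_err:     # keep the weights with the best
        best_w, best_err, patience = w, val_err, 0
    else:
        patience += 1
        if patience >= 10:     # validation error stopped improving
            break
print(round(best_w, 2))        # close to the true slope 2.0
```

If the validation set is small or unrepresentative, its error can keep
falling while a spike grows elsewhere in the input space, which is how
the severe spike above slipped through.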

Based on my simulations, Bayesian estimation seems to be more effective
than stopped training at avoiding spikes. I have never seen any severe
spikes when using even remotely reasonable prior distributions.
The worst Bayesian spike in Sarle (1995) is for the "step" function.
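The connection between priors on the weights and spike suppression can
be seen already in the linear case: a zero-mean Gaussian prior turns
the MAP estimate into ridge regression, which shrinks large weights and
so discourages the huge weights that produce sharp jumps. A minimal
sketch (Python, my own notation and made-up data):

```python
import numpy as np

def map_weights(X, y, prior_var, noise_var=1.0):
    """MAP estimate of linear weights under a zero-mean Gaussian
    prior N(0, prior_var * I): equivalent to ridge regression with
    penalty lam = noise_var / prior_var."""
    lam = noise_var / prior_var
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
print(map_weights(X, y, prior_var=100.0))  # near the least-squares fit
print(map_weights(X, y, prior_var=0.01))   # strongly shrunk toward 0
```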

|> I feel today that ALNs as in Atree 3.0 are the only form of neural net
|> which will allow proofs of conformity to a specification (for large
|> networks -- solutions to toy problems are not at issue here.)

What sort of specifications do you mean?

Kernel regression (aka GRNN) with a nonnegative kernel guarantees that
the output is never outside the range of the data.
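That guarantee falls out of the form of the estimator: the
Nadaraya-Watson prediction is a convex combination of the training
targets, since the kernel weights are nonnegative and normalized to sum
to one. A minimal sketch (Python, Gaussian kernel, my own parameter
names):

```python
import numpy as np

def grnn_predict(x, x_train, y_train, bandwidth=0.1):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.
    The weights are nonnegative and sum to 1, so the prediction is
    a convex combination of y_train -- it can never leave the range
    [y_train.min(), y_train.max()].  Spike-free by construction."""
    w = np.exp(-0.5 * ((x - x_train) / bandwidth) ** 2)
    w /= w.sum()
    return np.dot(w, y_train)

x_train = np.linspace(0.0, 1.0, 21)
y_train = np.sin(2.0 * np.pi * x_train)
preds = [grnn_predict(x, x_train, y_train)
         for x in np.linspace(-0.5, 1.5, 201)]
in_range = (y_train.min() - 1e-9 <= min(preds)
            and max(preds) <= y_train.max() + 1e-9)
print(in_range)   # True, even well outside the training inputs
```

The price, of course, is that the estimate cannot extrapolate beyond
the observed targets even where extrapolation would be correct.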

The following files are available by anonymous ftp from ftp.sas.com
(Internet gateway IP 192.35.83.8) in the directory /pub/neural :

 interface95.ps Sarle, W.S. (1995), "Stopped Training and Other
 3926K          Remedies for Overfitting," to appear in Proceedings of
                the 27th Symposium on the Interface. (Postscript file,
                10 pages)

                Compressed versions of the above paper and the
                corresponding command to decompress the file
                (NOTE: Compressed files must be ftped in binary mode):
 inter95.ps.gz  (615K) gunzip inter95.ps.gz
 inter95.ps.Z   (747K) uncompress inter95.ps.Z
 inter95.zip    (614K) unzip inter95.zip

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
