Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!news.duke.edu!concert!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: He who knows what he does not know is wise
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <Cz48IG.JIJ@unx.sas.com>
Date: Fri, 11 Nov 1994 18:25:28 GMT
Distribution: usa
References: <parkCyxFB0.5Ko@netcom.com> <Cyz0MF.Jx3@unx.sas.com> <x05XzZ-.predictor@delphi.com> <Cz0yJq.GFJ@unx.sas.com> <GJOHN.94Nov10192217@elaine43.Stanford.EDU>
Nntp-Posting-Host: hotellng.unx.sas.com
Organization: SAS Institute Inc.
Lines: 67


In article <GJOHN.94Nov10192217@elaine43.Stanford.EDU>,
gjohn@elaine43.Stanford.EDU (George John) writes:
|>
|> The question was how to get a good measure of a network's confidence
|> in its output.

Not quite: in article <parkCyxFB0.5Ko@netcom.com>, park@netcom.com 
(Bill Park) wrote:
|> What are some good ways to get a neural network to report that the inputs
|> you gave it are too different from its training set to permit it to
|> give you an accurate answer?

If a pattern is far from all the training and validation cases, you
can't have _any_ confidence in the network's output.
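For concreteness, one crude way to check this (a toy Python sketch of my own, not anything from the original posts): compare a new input's distance to its nearest training case against the typical nearest-neighbor distance within the training set itself.

```python
import numpy as np

def far_from_training(x_new, X_train, factor=3.0):
    """Flag an input as suspect if its distance to the nearest training
    case exceeds `factor` times the typical nearest-neighbor distance
    within the training set.  `factor` is an arbitrary knob."""
    # Distance from the new point to its nearest training case.
    d_new = np.sqrt(((X_train - x_new) ** 2).sum(axis=1)).min()
    # Typical nearest-neighbor distance inside the training set.
    D = np.sqrt(((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)
    typical = np.median(D.min(axis=1))
    return d_new > factor * typical

X = np.random.RandomState(0).rand(100, 2)          # training inputs in [0,1]^2
print(far_from_training(np.array([0.5, 0.5]), X))  # inside the cloud
print(far_from_training(np.array([5.0, 5.0]), X))  # far outside it
```

This says nothing about the net's accuracy; it only flags inputs where no statement about accuracy is justified.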

But back to the new question of confidence intervals for the net
output. George John continued:
|> I'd like to point out a couple of relevant papers by Andreas Weigend,
|> who takes two approaches to putting error bars on the output of a
|> neural net.
|>
|> APPROACH 1) Learn many neural nets.  Use all of them to predict the
|> output y for a new input x.  You can view the many different predicted
|> y's as samples from a distribution.  If they all predict roughly the
|> same value of y, then you can be fairly confident in the predicted y.
|> If they're all in disagreement, you should have little confidence.
|> The many nets are trained using early stopping, and each net uses a
|> different holdout set so that the nets do tend to learn different
|> functions.
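In outline, the committee scheme quoted above looks like this (a toy Python sketch; a cubic polynomial fit stands in for "train a net with early stopping", and none of Weigend's actual details are reproduced):

```python
import numpy as np

rng = np.random.RandomState(1)
x = np.linspace(0, 1, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)  # noisy targets

# Train a committee of models, each on a different random holdout split.
preds = []
for seed in range(10):
    idx = np.random.RandomState(seed).permutation(x.size)
    train = idx[:45]                        # ~75% train, rest held out
    coef = np.polyfit(x[train], y[train], 3)
    preds.append(np.polyval(coef, 0.5))     # each member's prediction at x=0.5

preds = np.array(preds)
# Agreement across members -> high "confidence" in this scheme.
print("mean prediction:", preds.mean())
print("spread (std):   ", preds.std())
```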

This approach is flat-out wrong and is a prime example of the mistakes
people are likely to make when they are ignorant of statistics. The
method above takes into account the variability due to choosing the
network architecture and choosing initial values, but it ignores
variability due to choice of training cases and to noise in the targets.

|> APPROACH 2) Learn a single net, but teach it to know when it's wrong.
|> They do this by first training a network to predict the output y
|> given the inputs x.  Then they get the squared error for each
|> training pattern, and train another output unit to match the
|> squared error.  (Note: since the new unit is matching the squared
|> error and not the error, its job is easier.  It only tells you
|> how far off the current estimate probably is, but not in which
|> direction.)
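A toy version of that two-stage idea (again Python, with straight-line fits standing in for the two networks; this is my illustration, not Weigend's architecture):

```python
import numpy as np

rng = np.random.RandomState(2)
x = np.linspace(0, 1, 200)
# Heteroscedastic data: noise grows with x, so the error bars should too.
y = 2 * x + rng.normal(0, 0.05 + 0.4 * x, x.size)

# Stage 1: fit the main model for y (stands in for the first net).
coef_y = np.polyfit(x, y, 1)
resid_sq = (y - np.polyval(coef_y, x)) ** 2

# Stage 2: fit a second model to the squared errors (the extra output unit).
coef_e = np.polyfit(x, resid_sq, 1)

# Predicted error magnitude (not direction) at two query points:
for xq in (0.1, 0.9):
    est_var = max(np.polyval(coef_e, xq), 0.0)
    print(f"x={xq}: y_hat={np.polyval(coef_y, xq):.2f}, "
          f"predicted |error| ~ {np.sqrt(est_var):.2f}")
```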

This approach is a slight improvement over the preceding method. It does
take into account noise in the targets. It would work if you never
wanted to make predictions for inputs that did not appear in the
training set and if you had a huge amount of data. But it does not take
into account variability due to choice of training cases and completely
ignores the problem that prompted the original question from Bill Park.

Confidence intervals and prediction intervals are among the things
that statisticians have studied intensely. If you train a net to
convergence and have substantially more training cases than weights,
then the usual asymptotic theory for nonlinear models applies. The
problem is deciding how much more counts as "substantially more". For funkier
situations such as stopped training, you can use bootstrapping if
you're not in a hurry.
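The bootstrap idea, in skeleton form (a toy Python sketch; a straight-line fit stands in for "retrain the net", which is where all the time goes in practice): resample the training cases with replacement, refit, and collect the prediction at the query point each time.

```python
import numpy as np

rng = np.random.RandomState(3)
x = np.linspace(0, 1, 80)
y = 1 + 3 * x + rng.normal(0, 0.2, x.size)

# Bootstrap: resample cases with replacement, refit, record the prediction.
boot_preds = []
for b in range(500):
    idx = rng.randint(0, x.size, x.size)    # sample training cases w/ replacement
    coef = np.polyfit(x[idx], y[idx], 1)
    boot_preds.append(np.polyval(coef, 0.5))

lo, hi = np.percentile(boot_preds, [2.5, 97.5])
print(f"~95% interval for the fit at x=0.5: [{lo:.2f}, {hi:.2f}]")
```

Unlike the two quoted approaches, this captures the variability due to which training cases you happened to draw; for a prediction interval on a new observation you would also have to add back the noise term.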


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
