Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!rochester!cornellcs!travelers.mail.cornell.edu!news.kei.com!simtel!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Thoughts on noisy data/rules of thumb
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DF318p.Fn0@unx.sas.com>
Date: Mon, 18 Sep 1995 03:48:25 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <f90pp-1709951457250001@perper.dial.matfys.lth.se>
Organization: SAS Institute Inc.
Lines: 62


In article <f90pp-1709951457250001@perper.dial.matfys.lth.se>, f90pp@matfys.lth.se (Per Persson) writes:
|> ...
|> In real life however, you are always dealing with noisy data-sets and
|> an input signal that is i'(n)=i(n)+G(0,sigma), where G represents some
|> noise with zero mean and a variance sigma. The output from the net then
|> becomes o'=o(i(1)+G(0,sigma)...i(n)+G(0,sigma)). The target of course has
|> the same behaviour and t'=t+G(0,sigma). Then the error becomes
|> e'=(t'-o').

That is called an "errors-in-variables" or "measurement error" model.
See Fuller, W.A. (1987), Measurement Error Models, New York: John Wiley
& Sons, for methods to estimate the noise-free relationship.

If, as is usually the case, you are going to want to make predictions
from noisy inputs, as well as train on noisy inputs, then the input
noise is ignorable, and you need not worry about errors-in-(input)variables.
You can simply apply the usual statistical regression models.
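To make this concrete, here is a small simulation (my own illustration, not from Fuller; the numbers are made up). Ordinary least squares on noisy inputs gives a slope that is attenuated toward zero relative to the noise-free relationship, but for predicting from equally noisy inputs, that attenuated slope is exactly the one that minimizes squared prediction error:

```python
import numpy as np

# Hypothetical illustration: true relation y = 2*x, with inputs
# observed through additive Gaussian noise of standard deviation sigma.
rng = np.random.default_rng(0)
n, sigma = 100_000, 1.0
x = rng.normal(0.0, 2.0, n)            # true inputs, variance 4
x_noisy = x + rng.normal(0.0, sigma, n)
y = 2.0 * x                            # noise-free targets, for clarity

# Ordinary least squares (no intercept) on the noisy inputs:
slope = np.sum(x_noisy * y) / np.sum(x_noisy**2)

# The OLS slope is attenuated by var(x)/(var(x)+sigma^2) = 4/5,
# so slope comes out near 1.6 rather than the noise-free 2.0.
# Yet for predicting y from a NEW noisy input, 1.6 is the right
# coefficient; recovering 2.0 is what errors-in-variables methods
# are for, and it only matters if future inputs will be noise-free.
print(slope)
```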

|> To me this indicates that knowledge of the function G could provide
|> "rules of thumb" as to what kind of error-function to choose and also when
|> to stop training a net.

Knowledge of G does not simply provide rules of thumb, it provides
specific estimators with various optimality properties, such as
maximum likelihood and Bayesian estimators. This is what statistics
is about (rather than baseball scores). See, for example:

   Cramer, J.S. (1986), Econometric Applications of Maximum Likelihood
   Methods, Cambridge: Cambridge University Press.

   Edwards, A.W.F. (1972), Likelihood, Cambridge: Cambridge University
   Press.

   Gallant, A.R. (1987), Nonlinear Statistical Models, NY: Wiley.

   McCullagh, P. and Nelder, J.A. (1989), Generalized Linear Models,
   2nd ed., London: Chapman & Hall.

   Ross, G.J.S. (1990), Nonlinear Estimation, NY: Springer-Verlag.
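As a minimal sketch of the point (mine, not taken from any of the books above): the noise distribution G fixes the likelihood, and the likelihood fixes the error function. Gaussian target noise makes the negative log-likelihood a sum of squared errors; Laplace (double-exponential) noise makes it a sum of absolute errors:

```python
import numpy as np

# Hypothetical sketch: negative log-likelihood of the residuals
# under two candidate noise distributions G, dropping additive
# constants that do not affect the minimizer.
def neg_log_lik(residuals, noise="gaussian", scale=1.0):
    r = np.asarray(residuals, dtype=float)
    if noise == "gaussian":
        return 0.5 * np.sum((r / scale) ** 2)   # squared-error loss
    if noise == "laplace":
        return np.sum(np.abs(r) / scale)        # absolute-error loss
    raise ValueError(noise)

r = [0.5, -1.0, 2.0]
print(neg_log_lik(r, "gaussian"))   # 0.5*(0.25 + 1 + 4) = 2.625
print(neg_log_lik(r, "laplace"))    # 0.5 + 1 + 2 = 3.5
```

Minimizing the appropriate negative log-likelihood is what gives the maximum likelihood estimator its optimality properties.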

One can also devise methods that are efficient for a wide class of
noise distributions. See, for example:

   Hoaglin, D.C., Mosteller, F., and Tukey, J.W., eds. (1983),
   Understanding Robust and Exploratory Data Analysis, NY: Wiley.

   Huber, P.J. (1981), Robust Statistics, NY: Wiley.
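A standard example of such a method is Huber's loss (this sketch is mine, but the loss is from the robust statistics literature): quadratic for small residuals, linear for large ones, so a few gross errors cannot dominate the fit the way they do under squared error:

```python
import numpy as np

# Huber's loss: behaves like squared error for |r| <= delta and
# like absolute error beyond delta, giving good efficiency across
# a wide class of noise distributions.
def huber(r, delta=1.0):
    r = np.asarray(r, dtype=float)
    small = np.abs(r) <= delta
    return np.where(small, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

print(huber([0.5, 3.0], delta=1.0))   # 0.5*0.5^2 = 0.125; 1*(3-0.5) = 2.5
```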


|> 1) the update function should take the shape of the noise into account,
|> not updating if the error falls within bounds given (indirectly) by G.
|>
|> 2) training should stop when performance falls within bounds given by G

Here, I'm afraid, your intuition has led you seriously astray. To see
why, take an introductory class in mathematical statistics.
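One way to see the trouble (my own sketch, not Sarle's argument): even when every single residual falls within the bounds suggested by G, the residuals collectively still carry information, because averaging over n samples pins down parameters with a standard error of sigma/sqrt(n), far below sigma itself. A rule that refuses to update within the noise bounds throws that information away:

```python
import numpy as np

# Hypothetical illustration: estimate a mean mu from n noisy samples.
# Every individual deviation is on the order of sigma = 1, yet the
# sample mean lands within about sigma/sqrt(n) = 0.01 of the truth.
rng = np.random.default_rng(1)
sigma, n = 1.0, 10_000
samples = 5.0 + rng.normal(0.0, sigma, n)
mu_hat = samples.mean()
print(abs(mu_hat - 5.0))   # roughly 0.01, two orders below sigma
```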

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
