Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!gatech!howland.reston.ans.net!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: introduction of random noise - refs?
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D7x6uu.Hpr@unx.sas.com>
Date: Mon, 1 May 1995 22:08:54 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <3matvl$d76@wabe.csir.co.za> <1995Apr28.194319.7366@cm.cf.ac.uk> <D7t2uM.Dw9@unx.sas.com> <3o0bki$pvi@scapa.cs.ualberta.ca>
Organization: SAS Institute Inc.
Keywords: Neural networks, noise injection
Lines: 59


In article <3o0bki$pvi@scapa.cs.ualberta.ca>, arms@cs.ualberta.ca (Bill Armstrong) writes:
|> ...
|> Here is the way I look at adding jitter: If you have a certain amount
|> of training data, and that is not enough to get a good result from
|> your NN, then you might suppose that the problem is in your net -- ie
|> that it can't generalize well without a denser set of training points.
|> So you generate more training points by using a
|>
|>  *RESAMPLING*

Note that "resampling" means something rather different to statisticians,
e.g. bootstrapping and so forth.
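For concreteness, here is a toy sketch of resampling in the statisticians'
sense -- a bootstrap estimate of the standard error of a sample mean. The
data values and names are made up for illustration only:

```python
import random

random.seed(0)
data = [2.1, 3.4, 1.8, 4.0, 2.9, 3.3]  # illustrative sample

def bootstrap_means(sample, n_boot=1000):
    """Draw n_boot resamples WITH replacement and record each mean."""
    means = []
    for _ in range(n_boot):
        resample = [random.choice(sample) for _ in sample]
        means.append(sum(resample) / len(resample))
    return means

means = bootstrap_means(data)
grand = sum(means) / len(means)
# Standard deviation of the bootstrap means = bootstrap SE of the mean.
se = (sum((m - grand) ** 2 for m in means) / len(means)) ** 0.5
print("bootstrap SE of the mean: %.3f" % se)
```

The point is that statistical resampling re-draws from the observed data
itself, rather than generating new points from a smoothed surface.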

|> technique (as in image processing), whereby you pick a random point in the
|> domain of the NN's function and you define its output value by taking
|> a weighted average of all training points. This could be a weighting
|> function that has support (ie is non-zero) in a neighborhood that is
|> just large enough to capture a few training points no matter where it
|> is centered in the (bounded) input space.  In other words, you are
|> smoothing your training set using a convolution with a smooth kernel,
|> and then you are sampling that function. (In image processing one
|> can use a piecewise cubic to approximate the kernel one *should* use
|> according to Shannon's sampling theorem for band-limited functions.)

Bill has saved me the bother of explaining how jittering is related
to kernel regression.
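To spell the connection out, here is a rough sketch of the smooth-then-
resample scheme Bill describes, using a Gaussian kernel in place of the
piecewise cubic. All names, data, and the bandwidth are illustrative
assumptions, not anyone's actual implementation:

```python
import math
import random

random.seed(0)
xs = [0.0, 0.25, 0.5, 0.75, 1.0]   # training inputs
ys = [0.1, 0.9, 1.1, 0.8, 0.2]     # noisy training targets
h = 0.2                            # kernel bandwidth (assumed)

def kernel_smooth(x):
    """Kernel regression: weighted average of ALL training targets,
    with weights falling off smoothly with distance from x."""
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

def resample(n):
    """Pick random points in the (bounded) domain and assign each the
    smoothed output value, generating n new training points."""
    pts = []
    for _ in range(n):
        x = random.uniform(0.0, 1.0)
        pts.append((x, kernel_smooth(x)))
    return pts

new_points = resample(100)
```

The generated pairs can then be fed to the NN exactly as if they were
extra training data.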

|> It seems that the resampled function would have noise removed (from
|> the output Y values) by the smoothing and would in addition have the
|> possibility of being used for generating many more training points
|> than were in the original sample. This would make it easier for the
|> (less-well-understood, and perhaps deficient) generalization of the
|> NN to operate.

The noise in the targets would be reduced by the smoothing but not
removed entirely. And the noise would not be reduced any more than
it would be through the usual process of training the NN--no free lunch.
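A small numerical check of that point: local averaging shrinks the target
noise by roughly a factor of 1/sqrt(k) for an average over k points, but
never to zero. The numbers below are made up, and simple block averaging
stands in for a proper kernel smooth:

```python
import random
import statistics

random.seed(1)
true_value = 1.0
noise_sd = 0.5
targets = [true_value + random.gauss(0, noise_sd) for _ in range(200)]

# "Smooth" by averaging blocks of k neighbouring targets; the residual
# noise standard deviation shrinks to roughly noise_sd / sqrt(k).
k = 10
smoothed = [statistics.mean(targets[i:i + k])
            for i in range(0, len(targets), k)]
print(statistics.pstdev(targets), statistics.pstdev(smoothed))
```

The smoothed targets are less noisy but still noisy -- the same reduction
the NN's own least-squares fitting would buy you.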

|> Of course, with dense enough sampling, the NN has no need to do
|> any of the generalization at all.  It is all done by resampling.
|>
|> The jitter method, though it uses just one training point at a time,
|> is close to this, but depends on the NN to achieve the effect of the
|> weighted combinations of resampling by its least-squares fitting and
|> generalization procedure.

Jittering is just a Monte-Carlo algorithm for resampling. Given
enough training, there should be no difference in the results.
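One way to see the equivalence numerically: jitter a randomly chosen
training input, keep its target, and look at the average target among
draws landing near some query point. That conditional average converges
to the kernel-regression value whose kernel is the jitter distribution.
A tiny illustrative setup (all numbers assumed):

```python
import math
import random

random.seed(2)
xs = [0.0, 0.5, 1.0]   # training inputs
ys = [0.0, 1.0, 0.0]   # training targets
h = 0.25               # jitter width = kernel bandwidth

def kernel_smooth(x):
    """Kernel-regression value implied by Gaussian jitter of width h."""
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# Monte-Carlo version: jittered draws, conditioned near x0.
x0 = 0.4
hits = []
for _ in range(200000):
    i = random.randrange(len(xs))
    x_jittered = xs[i] + random.gauss(0, h)
    if abs(x_jittered - x0) < 0.01:
        hits.append(ys[i])
mc_estimate = sum(hits) / len(hits)
print(mc_estimate, kernel_smooth(x0))
```

The two printed values agree to within Monte-Carlo error, which is the
sense in which jittering and smoothed resampling coincide in the limit
of enough training.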

|> Question: Has anyone compared the jitter technique to bona fide
|> well-understood resampling techniques?


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
