Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!news.sprintlink.net!redstone.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: [Q] Normality of Training Data
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <D67Hz9.1Ax@unx.sas.com>
Date: Wed, 29 Mar 1995 14:37:57 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <3latp6$9sn@wraith.cs.uow.edu.au>
Organization: SAS Institute Inc.
Keywords: Normalisation, Training
Lines: 50


In article <3latp6$9sn@wraith.cs.uow.edu.au>, neilh@wraith.cs.uow.edu.au (Neil Harper) writes:
|>
|> I am using a NN to classify objects using CTFM Ultrasonic data. I noticed
|> that lots of NN papers normalise the input data so that the training
|> and testing data does not form a "bell" shape. I have tried that with
|> my data and get significantly better results but I dont know why.

Here we have another example of the problems created by vague NN
terminology. When Neil says "input data", does he mean "net inputs"
as opposed to outputs and targets, or does he mean "the data that I
input to the net" including targets?

As for net inputs, normality is irrelevant. In linear models, all that
counts about the inputs is that they be linearly related to the targets.
For nonlinear models like feedforward neural nets, all that counts is
that the inputs be related to the targets in some way the net can learn,
but I don't know any nice, simple characterization of that property.

Another terminology problem: "normalise" does not mean "transform to
have a normal distribution", but "adjust to have some specified norm",
where "norm" refers to various measures of size or dispersion of a
vector. It is often useful to normalize each net input (not each case)
to a range of [-1,1], or to have a mean of 0 and a standard deviation of
1, etc. This is just a matter of computational convenience and
efficiency. Normalizing the targets is far more important. If the output
activation function has a limited range, the target values obviously
need to lie within, or be scaled into, that range. Masters (see below)
discusses some other issues related to normalizing targets, and there is
a discussion in the documentation for my TNN macro available by
anonymous ftp from ftp.sas.com in the directory /pub/sugi19/neural in
the file tnn2.doc.
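The two per-input normalizations mentioned above can be sketched in a
few lines of NumPy (the function names are my own, and constant inputs
would need special handling to avoid division by zero):

```python
import numpy as np

def normalize_inputs(X):
    """Rescale each net input (each column of X) to the range [-1, 1].

    Note: applied per input (column), not per case (row).
    """
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    return 2.0 * (X - lo) / (hi - lo) - 1.0

def standardize_inputs(X):
    """Shift and scale each net input to mean 0, standard deviation 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

Either transformation leaves the information in the inputs unchanged;
as noted above, the choice is a matter of computational convenience,
not of making anything "more normal".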

|> Masters, Tim, 1993, Practical Neural Network Recipes in C++ notes
|> that "In fact there is some evidence that flat distributions are
|> learned most easily" on page 267.

More vague terminology (I am not trying to pick on Neil; Masters isn't
always clear, either): distributions of what? Least squares training
works best when the noise distribution is normal. But if the noise
distribution is normal, it doesn't follow that the target distribution
is normal; the target distribution might well be almost flat. Other
training criteria work best for other noise distributions.
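A toy numerical illustration of the point above (my own construction,
assuming a linear signal with uniformly distributed inputs): the noise
is exactly normal, yet the targets come out nearly flat.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)      # flat input distribution
noise = rng.normal(0.0, 0.01, size=x.shape)   # normal noise, small variance
t = x + noise                                  # target = signal + normal noise

# Excess kurtosis distinguishes the two shapes: about -1.2 for a
# uniform (flat) distribution, 0 for a normal one. Here the targets
# are close to -1.2, i.e. nearly flat, despite the normal noise.
kurt = np.mean((t - t.mean()) ** 4) / t.var() ** 2 - 3.0
```

So least-squares training is entirely appropriate for such data: it is
the distribution of the noise, not of the targets, that matters.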


-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
