Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!newsfeed.internetmci.com!in2.uu.net!news.interpath.net!sas!mozart.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Transformation/Scaling of Inputs
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DoCKBw.6rt@unx.sas.com>
Date: Sat, 16 Mar 1996 06:21:32 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References:  <4id3nc$nke@newsbf02.news.aol.com>
Organization: SAS Institute Inc.
Lines: 52


In article <4id3nc$nke@newsbf02.news.aol.com>, rhughesmd@aol.com (RHughesMD) writes:
|> I am working my way through a neural network package, and would like some
|> input on the best way to transform or scale input data.  For example, if I
|> have variable X, and I know the following information:
|> 
|>            mean x   =   119.94
|>            stddev x =   301.74
|>            min x    =     0
|>            Q1 x     =    16.45
|>            mid x    =    40.64
|>            Q3 x     =   108.23
|>            max x    = 27840
|> 
|> What is the best way to scale this input to a range of 0 - 1, and still
|> represent the data accurately to the neural network ?

Rats! Here I've just added an answer in the FAQ to "Should I 
normalize/standardize/rescale the data?" and I forgot to address
this scaling-the-input-to-[0,1] thing. Oh, well. The following
paragraphs will appear in the FAQ on Sunday or Monday:

   There is a common misconception that the inputs to a multilayer
   perceptron must be in the interval [0,1]. There is in fact no such
   requirement, although there often are benefits to standardizing the
   inputs as discussed below. [in the FAQ, not this post]
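To make the contrast concrete, here is a minimal Python sketch (mine, not part
of the FAQ text) of [0,1] rescaling versus standardizing, treating the five
quoted order statistics as a stand-in sample:

```python
import statistics

def minmax_scale(xs):
    # Rescale to [0,1]: a single extreme value compresses everything else.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    # Zero mean, unit standard deviation: no bounded range, and none needed.
    mu = statistics.mean(xs)
    sd = statistics.stdev(xs)
    return [(x - mu) / sd for x in xs]

x = [0.0, 16.45, 40.64, 108.23, 27840.0]   # min, Q1, median, Q3, max as quoted
scaled = minmax_scale(x)
zs = standardize(x)
```

Neither form is required by the network; which helps depends on the
initialization and training algorithm, as the FAQ discusses.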

   If your output activation function has a range of [0,1], then
   obviously you must ensure that the target values lie within that
   range. But it is generally better to choose an output activation
   function suited to the distribution of the targets. See "Why use
   activation functions?"

   When using an output activation with a range of [0,1], some people
   prefer to rescale the targets to a range of [.1,.9]. I suspect that
   the popularity of this gimmick is due to the slowness of standard
   backprop.  But using a target range of [.1,.9] for a classification
   task gives you incorrect posterior probability estimates, and is
   quite unnecessary if you use an efficient training algorithm (see
   "What are conjugate gradients, Levenberg-Marquardt, etc.?")
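To see where the incorrect posteriors come from: a least-squares fit to
targets squashed into [.1,.9] will, at best, estimate .1 + .8*p rather than
the posterior p itself, so the output must be inverted to get p back. A small
Python sketch (the function names are mine, for illustration only):

```python
def squash_targets(t, lo=0.1, hi=0.9):
    # Map 0/1 class labels into [.1,.9] (the gimmick in question).
    return lo + (hi - lo) * t

def recover_posterior(y, lo=0.1, hi=0.9):
    # A least-squares fit to squashed targets estimates lo + (hi - lo)*p,
    # not the posterior p itself, so invert the squashing map.
    return (y - lo) / (hi - lo)
```

If you take the raw outputs of such a net as posterior probabilities, every
estimate is biased toward the middle of the range.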

As for Rob's specific case, I should add that X is _very_ skewed, which
is a hint that it might help to apply a nonlinear transformation (such
as a square or other root) to make the distribution of X more
symmetric. But whether such transformations are advisable depends very
much on the particular problem.
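To illustrate with the quoted summary statistics: after plain [0,1] scaling,
Q3 of X lands near 0.004, so three quarters of the data are crushed into less
than half a percent of the input range, while a root or log transformation
spreads the bulk of the data out. A Python sketch (using log(1+x) because
min x = 0; that choice is mine, based only on the five quoted values):

```python
import math

x = [0.0, 16.45, 40.64, 108.23, 27840.0]   # min, Q1, median, Q3, max as quoted

def scale01(xs):
    lo, hi = min(xs), max(xs)
    return [(v - lo) / (hi - lo) for v in xs]

raw    = scale01(x)                             # Q3 lands near 0.004
rooted = scale01([math.sqrt(v) for v in x])     # Q3 lands near 0.06
logged = scale01([math.log1p(v) for v in x])    # Q3 lands near 0.46
```

Whether the transformed variable actually predicts the target better is, as
noted above, entirely problem-dependent.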

-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
