Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!bb3.andrew.cmu.edu!newsfeed.pitt.edu!gatech!news.mathworks.com!nntp.primenet.com!newspump.sol.net!spool.mu.edu!agate!newsgate.duke.edu!interpath!news.interpath.net!sas!newshost.unx.sas.com!saswss
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Apply sigmoid activation fun
Originator: saswss@hotellng.unx.sas.com
Sender: news@unx.sas.com (Noter of Newsworthy Events)
Message-ID: <DwKIz0.KnJ@unx.sas.com>
Date: Fri, 23 Aug 1996 02:01:48 GMT
X-Nntp-Posting-Host: hotellng.unx.sas.com
References: <4unt8m$31m@hecate.umd.edu> <Pine.SOL.3.91.960815170437.24262A-100000@suma3.reading.ac.uk> <4vcmgv$9r2@hecate.umd.edu> <DwI9wL.73I@unx.sas.com> <4vget1$297@delphi.cs.ucla.edu>
Organization: SAS Institute Inc.
Lines: 54


In article <4vget1$297@delphi.cs.ucla.edu>, edwin@cs.ucla.edu (E. Robert Tisdale) writes:
|> saswss@hotellng.unx.sas.com (Warren Sarle) writes:
|> 
|> >In article <4vcmgv$9r2@hecate.umd.edu>,
|> >coral@csc.umd.edu (Jian-Zheng Zhou) writes:
|> >|> ... 
|> >|> My main question is that how do you test a NN after training. 
|> >|> How can we compare the real output values which can be ranged beyond 0 to 1
|> >|> with the calculated values which are only between 0 and 1 due to sigmoid 
|> >|> transformation.
|> 
|> >No, no, no. You do NOT use an output activation function
|> >with a range of (0,1) if the correct outputs can be outside that range.
|> >There are three things you can do:
|> 
|> > 1. Use an identity ("linear") output activation function
|> >    (as Jeff Hannan said) or some other unbounded activation function.
|> > 2. If you know upper and lower bounds for the correct outputs,
|> >    scale the target values as (Target-Lowerbound)/(Upperbound-Lowerbound)
|> >    so the correct outputs will be in [0,1].
|> > 3. If you know upper and lower bounds for the correct outputs,
|> >    scale the result of the output activation function as
|> >    Activation*(Upperbound-Lowerbound)+Lowerbound so the range of
|> >    the network output will include the correct outputs.
|> 
|> Only option #1 is valid.  Sigmoidal activation functions should never
|> be applied to outputs unless they are supposed to be binary -- in {0, 1}.

Oh? If my target values are probabilities or proportions or other
continuous values in [0,1], why shouldn't I use a sigmoidal output
activation function?
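The three options quoted above can be sketched in a few lines. This is
only an illustration -- the bounds, target values, and net inputs below
are made-up numbers, not anything from the thread:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical bounds on the correct outputs (illustrative only).
lo, hi = -5.0, 15.0
targets = np.array([-2.0, 3.5, 12.0])   # raw target values
net = np.array([0.2, -1.0, 2.3])        # net input to the output unit

# Option 1: identity ("linear") output activation -- unbounded range.
out1 = net

# Option 2: rescale the targets into [0,1] and train a sigmoid
# output against them.
scaled_targets = (targets - lo) / (hi - lo)

# Option 3: rescale the sigmoid's output so the network's range
# [lo, hi] includes the correct outputs.
out3 = sigmoid(net) * (hi - lo) + lo

assert np.all((0.0 <= scaled_targets) & (scaled_targets <= 1.0))
assert np.all((lo < out3) & (out3 < hi))
```

Note that option 2 and option 3 use the same affine map, just applied
on opposite sides of the activation function.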

|> The only valid reason for scaling outputs is to [effectively] adjust
|> the "importance" of each output relative to the others.  If the objective
|> of the learning algorithm is to minimize the total mean squared error
|> of the outputs (standard backprop), option #2 effectively gives each
|> output the same importance and option #3 effectively gives each output
|> an importance proportional to the square of its dynamic range.

That is certainly an important consideration if you have more than one
output, but I hardly think it's the _only_ valid reason for scaling 
outputs. If you have only one output, then relative importance is
irrelevant. And note that I was talking about upper and lower bounds,
not maximum and minimum values, for the correct outputs.
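For what it's worth, the importance claim itself is easy to check
numerically. Under standard backprop's total-squared-error criterion,
option 2 divides the raw error by the dynamic range before squaring,
while option 3 feeds the raw error into the sum of squares directly, so
the two contributions differ by the square of the dynamic range (the
bounds and error below are made-up numbers):

```python
import math

# Hypothetical bounds; dynamic range = hi - lo = 100.
lo, hi = 0.0, 100.0
err = 2.0                    # the same raw error, in original units

# Option 2: the target is scaled into [0,1], so the network sees an
# error of err/(hi - lo).
sq_err_opt2 = (err / (hi - lo)) ** 2

# Option 3: the network output is rescaled back to [lo, hi], so the
# raw error enters the sum of squares as-is.
sq_err_opt3 = err ** 2

# The ratio of the two squared errors is the square of the dynamic
# range, which is the "importance" factor mentioned above.
ratio = sq_err_opt3 / sq_err_opt2
assert math.isclose(ratio, (hi - lo) ** 2)
```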

These issues are discussed in the FAQ under "Why use activation
functions?" and "Should I normalize/standardize/rescale the data?" in
ftp://ftp.sas.com/pub/neural/FAQ2.html.
-- 

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
