Newsgroups: sci.logic,sci.stat.math,comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!nntp.club.cc.cmu.edu!miner.usbm.gov!news.er.usgs.gov!stc06.ctd.ornl.gov!news.he.net!www.nntp.primenet.com!nntp.primenet.com!cs.utexas.edu!howland.erols.net!EU.net!usenet2.news.uk.psi.net!uknet!usenet1.news.uk.psi.net!uknet!uknet!lyra.csx.cam.ac.uk!warwick!news.nott.ac.uk!nott-cs!ebx
From: ebx@cs.nott.ac.uk (Edward A G Burcher)
Subject: Output unit scaling ?
Message-ID: <E1sDx4.CsE@cs.nott.ac.uk>
Keywords: Neural Networks Output Units
Sender: Ed Burcher<ebx@cs.nott.ac.uk>
Organization: Nottingham University
References: <32994424.136C@postoffice.worldnet.att.net> <57bt40$brh@gap.cco.caltech.edu> <32A0FD04.F96@postoffice.worldnet.att.net>
Date: Mon, 2 Dec 1996 12:41:27 GMT
Lines: 40
Xref: glinda.oz.cs.cmu.edu sci.logic:21119 sci.stat.math:13551 comp.ai.neural-nets:34906

Hi, I hope someone out there can help with this. I'm fairly new to
neural nets, so I guess this could be one of those obvious-once-you-know
questions, but nevertheless...

I am trying to build a 7-10-10-3 feedforward neural net, with full
connectivity between successive layers but no (direct) connectivity between
non-adjacent layers. I am currently using the standard sigmoid function
as my activation function. The problem is that I have training and test data
where all seven inputs typically vary in a small range, 0-0.3. I am quite happy
to normalise this data; however, my 3 output units are tricky to deal with.
One of them varies in the range 200-20000, the second from 100-300 and the
third 20-70. Clearly, with target values this large, the network will be
difficult (impossible?) to train on such data, as my activation function
only gives output in the [0,1] range. I have heard it is possible to adapt the
sigmoid function to give a nonlinear activation function with a larger range.
How is this done exactly, and would it be a suitable technique for solving the
problem?
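
For concreteness, here is my guess at what such an adapted sigmoid might look
like: the standard sigmoid stretched and shifted to cover an interval (lo, hi).
The function names and the lo/hi parameters are my own, purely for
illustration, and I have not confirmed that this is the adaptation people
usually mean.

```python
import math

def sigmoid(x):
    """Standard logistic function, output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def scaled_sigmoid(x, lo, hi):
    """Sigmoid stretched to the interval (lo, hi); lo and hi are assumed bounds."""
    return lo + (hi - lo) * sigmoid(x)

def scaled_sigmoid_deriv(x, lo, hi):
    """Derivative of scaled_sigmoid with respect to x, as needed by backprop."""
    s = sigmoid(x)
    return (hi - lo) * s * (1.0 - s)

# Example: an output unit meant to cover 20-70 sits at the midpoint for x = 0.
print(scaled_sigmoid(0.0, 20.0, 70.0))
```

Note that the derivative picks up the factor (hi - lo), so for the 200-20000
unit the error gradients would be far larger than for the other two units,
which I suspect might upset training.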

The network is being used to infer these values from measurements, so I need
to preserve the original values. One possibility I have considered is to apply
a function to each of the outputs to scale it into the [0,1] interval, and
to apply the inverse of that function when I require the values back again.
Is this a safe approach, if I have to apply a different scaling function to
each output unit?

I was thinking of something simple as a scaling function, such as

(unit value - min value that unit can take) / (max value that unit can take - min value that unit can take)

Is this suitable (assuming all of this is a valid approach) or are there better
ways to scale?
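
To make the min/max scaling idea concrete, here is a rough sketch in Python.
The ranges are the ones I quoted for my three outputs; the names
OUTPUT_RANGES, scale and unscale are just made up for this example, and it
assumes the true values never fall outside the stated ranges.

```python
# Assumed (min, max) for each of the three output units, taken from the
# ranges quoted in this post.
OUTPUT_RANGES = [(200.0, 20000.0), (100.0, 300.0), (20.0, 70.0)]

def scale(values, ranges=OUTPUT_RANGES):
    """Map each raw output value into [0, 1] using its own (min, max)."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(values, ranges)]

def unscale(scaled, ranges=OUTPUT_RANGES):
    """Inverse of scale: map network outputs in [0, 1] back to raw units."""
    return [lo + s * (hi - lo) for s, (lo, hi) in zip(scaled, ranges)]

targets = [10100.0, 200.0, 45.0]
net_targets = scale(targets)       # what the network would be trained on
recovered = unscale(net_targets)   # what I would report as the inferred values
print(net_targets, recovered)
```

Each unit gets its own pair of constants, so the inverse is applied unit by
unit with the same pair that was used for the forward scaling.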

Thanks for any help that anyone can offer.



Ed Burcher 
Joint Honours Maths / Computer Science Year 3 
University of Nottingham
England
