Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!uunet!wang!news
From: peter@swamp.indigo.co.il (Peter Gordon)
Subject: Gradient Techniques
Organization: Indigo Ltd
Date: Sun, 5 Feb 1995 10:43:28 GMT
Message-ID: <PETER.95Feb5124328@swamp.indigo.co.il>
Sender: news@wang.com
Lines: 53


I have read the excellent article written by
Martin Riedmiller (riedml@ira.uka.de), in which he points out
that there are problems with gradient descent.

He gives the following formula

       deltaW = -eps * E'


where deltaW is the change to the weight,
      eps    is the learning factor (learning rate),
  and E'     is the partial derivative of E with respect to w.

In areas of high gradient E' will be high, giving a high deltaW.
In areas of low  gradient E' will be low, giving a low deltaW.
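Just to make that rule concrete, here is a small Python sketch. The quadratic toy error surface E(w) = w**2, the value of eps, and the function name are my own illustrative choices, not anything from Riedmiller's article:

```python
def grad_descent_step(w, dE_dw, eps=0.1):
    """Standard rule: deltaW = -eps * E', so the step scales WITH the gradient."""
    return w - eps * dE_dw

# Toy error surface E(w) = w**2, so E'(w) = 2*w.
dE = lambda w: 2.0 * w

steep   = grad_descent_step(4.0, dE(4.0))  # large gradient -> large step
shallow = grad_descent_step(0.1, dE(0.1))  # small gradient -> tiny step
```

Starting from w = 4.0 (steep) the step is 0.8, while from w = 0.1 (shallow) it is only 0.02 -- exactly the high-gradient/low-gradient behaviour described above.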

It seems to me that in relatively flat areas deltaW
should be large and in areas of steep descent 
deltaW should be relatively low.

The conclusion to be drawn is that it might be
better to have 

      deltaW = -eps / E', with some numeric limit on E'

In areas that are shallow E' is small, so deltaW is large.
In areas that are steep E' is large, so deltaW is small.

Now I haven't tested it, but it would seem to give more intuitive
answers.
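Here is one way the proposed inverse rule could look in Python. The reading of the "numeric limit" is my own guess: clamp |E'| from below so that deltaW never exceeds eps * limit on nearly flat regions. The names and default values are illustrative only:

```python
import math

def inverse_grad_step(w, dE_dw, eps=0.1, limit=10.0):
    """Proposed rule: deltaW = -eps / E', with a numeric limit on E'.

    One reading of the limit: clamp |E'| from below at 1/limit, so the
    step magnitude |deltaW| is capped at eps * limit where the surface
    is nearly flat.
    """
    if dE_dw == 0.0:
        return w  # exactly flat: no downhill direction at all
    # Preserve the sign of E' while clamping its magnitude.
    g = math.copysign(max(abs(dE_dw), 1.0 / limit), dE_dw)
    return w - eps / g

steep   = inverse_grad_step(4.0, 8.0)   # steep: E' = 8, step is only eps/8
shallow = inverse_grad_step(0.0, 0.01)  # shallow: clamp caps step at eps*limit
```

In the steep case the step is 0.0125; in the shallow case the clamp kicks in (|E'| = 0.01 < 1/limit = 0.1) and the step is capped at eps * limit = 1.0, rather than the 10.0 the raw formula would give.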

Would anyone like to comment?

I am also concerned about the whole idea of trying to find a global minimum
of the energy. It may be that the global minimum is very steep, while there
are local minima which are not as steep but much wider. A steep global
minimum probably corresponds to "very particular expert knowledge". If input
is presented that differs a bit from the learned vectors, the network may not
give a good answer. On the other hand, local minima which are not as deep as
the global minimum may provide a "wider knowledge base", possibly giving
better results for a wider class of unlearned data.

This is only speculation, and again I invite comments.

Peter
