Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!howland.reston.ans.net!EU.net!CERN.ch!news
From: Owen.RANSEN@cern.ch (Owen RANSEN)
Subject: Re: backprop agony...
X-Nntp-Posting-Host: misf120.cern.ch
Message-ID: <D5sK9q.Dsx@news.cern.ch>
Sender: news@news.cern.ch (USENET News System)
Organization: CERN - European Organization for Nuclear Research
X-Newsreader: WinVN 0.93.11
References: <3kkdmc$b4e@zebedee.ingres.co.uk>
Mime-Version: 1.0
Date: Tue, 21 Mar 1995 13:03:26 GMT
Lines: 153

In article <3kkdmc$b4e@zebedee.ingres.co.uk>, jonm@nessie.be.ingres.com 
says...
>
>All,
>
>I have been trying hopelessly to implement backprop. Despite the heaps
>of material I have, I either have a problem with my programs or a
>problem with understanding the algorithm... I'll outline the algorithm
>as I see it and if no-one corrects me, I assume my code is seriously
>in error...  I assume the problem lies in the errors from the hidden
>to the input since I can still only store linearly separable problems.
>I.e. XOR does *not* work but any three of the four work fine..
>
>I would appreciate *any* time spent on this as I have spent so much time
>on this now that I am obviously missing something very obvious...
>Cutting and pasting someone else's code is not an option as I don't
>believe that will lead to me understanding it properly...
>
>So... on with the algorithm............  I have used pseudo code
>where possible to simplify the reading...
>
>This is a simple fully connected three layer structure..
>
>beta --> squashing value used in sigmoid function 1/(1+exp(2*beta*x))
>gain --> term used to increase size of jumps to weight space minima.
>
>1) set weights to random values (-0.5 < Wij < 0.5)
>
>2) present all patterns to the system and store in pattern array.
>
>   for each input pattern
>
>      // Feed forward first...
>
>      for each hidden node (h)
>          sum_hidden[h] = 0
>          for each input node (i)
>              sum_hidden[h] += weight input_node[i] to hidden_node[h]
>                                   * input_value[i]
>          op_from_hidden[h] = sigmoid(sum_hidden[h])
>
>      for each output node (o)
>          sum_output[o] = 0
>          for each hidden node (h)
>              sum_output[o] += weight hidden_node[h] to output_node[o]
>                                   * op_from_hidden[h]
>          op_value[o] = sigmoid(sum_hidden[o])
>
>      // Now feed error back..
>
>          // calculate error on O/P..
>
>          for each output node (o)
>              error_on_op[o] = op_value[o] * (1 - op_value[o]) *
>                                   (expected_op[o] - op_value[o])
>
>      // Calculate the error on hidden
>
>          for each hidden node (h)
>              sum_of_err[h] = 0
>              for each output node (o)
>                  sum_of_err[h] += weight hidden_node[h] to output_node[o]
>                                       * error_on_op[o]
>
>              error_on_hidden[h] = 2 * beta * op_from_hidden[h] *
>                                       (1 - op_from_hidden[h]) * sum_of_err[h]
>                
>        // Adjust all weights...
>
>          for each output unit (o)
>              for each hidden unit (h)
>                  weight hidden_node[h] to output_node[o] += gain *
>                      error_on_op[o] * op_from_hidden[h]
>
>          for each input unit (i)
>              for each hidden unit (h)
>                  weight input_node[i] to hidden_node[h] += gain *
>                      error_on_hidden[h] * input_pattern[i]
>
>
>    add 1 to epoch count... (ie we have presented each pattern once)
>
>        // calculate error...
>
>        wt_sqrd = 0
>        for each input pattern
>            for each output node (o)
>                wt_sqrd += (expected_op[o] - op_value[o]) *
>                               (expected_op[o] - op_value[o])
>
>        wt_sqrd /= 2
>        
>        if wt_sqrd < 0.001
>           function is minimised
>        otherwise
>           go back and re-present all patterns
>        endif
>
>
>That's it, as best as I can copy it from my code, but no matter what I
>change it still seems to fail.. In this instance, the network always
>minimises at 0.5... I tried adding 0.1 to the sigmoid derivative (as in
>Scott Fahlman's paper) to reduce flat spots but this did not seem to help
>at all.. I have not yet been able to make this work once so it must be a
>problem with my algorithm.
>
>
>Thanks for reading this far..
>
>Jon
>--
>#include <disclaimer.h>
>+--------------------Reply to jonm@ingres.com-------------------------+
>| Then when the number of dwarfs dropped from 50 to 8. The other      |
>| dwarfs looked *very* suspiciously at 'Hungry'                       |
>+---------------------------------------------------------------------+
>| Jon Machtynger(jonm@ingres.com)                                     |
>| Bvd de la Woluwe 34 Bte. 13                                         |
>| Brussels. Belgium. (+32)  Ph: 02-774 49 23 Fax: 02-773 28 09        |
>+---------------------------------------------------------------------+
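A few things in the pseudocode jumped out at me even on a quick skim (they
may just be typos from copying it into the post): op_value[o] is computed
from sigmoid(sum_hidden[o]) where you presumably mean sigmoid(sum_output[o]);
the squashing function is written 1/(1+exp(2*beta*x)), but the usual logistic
has a MINUS sign in the exponent, otherwise the function is decreasing and
the op*(1-op) derivative term gets the wrong sign; and the 2*beta derivative
factor appears in your hidden error but not in your output error. In Python
notation (my own names, not your code), the usual pair looks like:

```python
import math

def sigmoid(x, beta=1.0):
    # Usual logistic squashing function; note the minus sign in the
    # exponent. Written as 1/(1+exp(+2*beta*x)) the function is
    # decreasing, which breaks the usual error formulas.
    return 1.0 / (1.0 + math.exp(-2.0 * beta * x))

def sigmoid_prime(op, beta=1.0):
    # Derivative in terms of the unit's output, op = sigmoid(x).
    # The 2*beta factor belongs in the output error as well as in
    # the hidden error.
    return 2.0 * beta * op * (1.0 - op)
```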
Well, I must admit I have not got time to read all the code,
but when I started playing around with neural nets I started
on dead simple problems and dead simple nets. If you cannot get the
following problems to work you are in trouble; if you CAN, then maybe
you just need to twiddle the factors a bit:
1) Train a two-input, one-output net to be an OR gate
2) Train a two-input, one-output net to be an AND gate
3) Train a two-input, one-output net (with a hidden layer) to be an XOR gate

If you don't know what these gates look like, any book on
digital electronics or boolean logic will tell you in less
than 20 words each!
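For what it's worth, here is roughly the sort of minimal harness I mean, as
a Python sketch with my own names; it is a plain online backprop net, not a
fix of your actual code. One thing worth pointing out: it has bias
(threshold) weights on every unit, which I don't see in your pseudocode.
Without biases a small net finds XOR much harder, and the outputs like to
sit at 0.5, which is exactly the symptom you describe.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_net(patterns, n_hidden=2, gain=0.5, epochs=10000, seed=0):
    # 2-input / n_hidden / 1-output net, fully connected, with an extra
    # constant-1 "bias" input to each layer, trained by online backprop.
    rng = random.Random(seed)
    n_in = 3                                   # 2 real inputs + bias
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
            for _ in range(n_in)]              # input -> hidden weights
    w_ho = [rng.uniform(-0.5, 0.5)
            for _ in range(n_hidden + 1)]      # hidden (+ bias) -> output

    def forward(a, b):
        x = [a, b, 1.0]
        hid = [sigmoid(sum(w_ih[i][h] * x[i] for i in range(n_in)))
               for h in range(n_hidden)]
        out = sigmoid(sum(w * v for w, v in zip(w_ho, hid + [1.0])))
        return x, hid, out

    for _ in range(epochs):
        for (a, b), target in patterns:
            x, hid, out = forward(a, b)
            # output error, including the out*(1-out) derivative
            err_out = out * (1.0 - out) * (target - out)
            # hidden error: derivative times back-propagated output error,
            # using the hidden->output weights BEFORE they are updated
            err_hid = [hid[h] * (1.0 - hid[h]) * w_ho[h] * err_out
                       for h in range(n_hidden)]
            for h, v in enumerate(hid + [1.0]):
                w_ho[h] += gain * err_out * v
            for i in range(n_in):
                for h in range(n_hidden):
                    w_ih[i][h] += gain * err_hid[h] * x[i]

    return lambda a, b: forward(a, b)[2]

# Truth tables for the three test problems
OR_GATE  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

OR and AND should train almost every time; XOR can still land in a local
minimum, in which case re-randomising the weights (a new seed) is usually
all it takes.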

Good luck,

Owen

