Newsgroups: comp.ai.neural-nets
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!hookup!swrinde!howland.reston.ans.net!news.sprintlink.net!peernews.demon.co.uk!demon!cromer.demon.co.uk!louis
From: louis@cromer.demon.co.uk (Louis Gidney)
Subject: Re: Biasing
X-Nntp-Posting-Host: cromer.demon.co.uk
Message-ID: <19950210.001854.53@cromer.demon.co.uk>
Sender: news@demon.co.uk (Usenet Administration)
Reply-To: louis@cromer.demon.co.uk
X-Newsreader: Archimedes TTFN Version 0.36
Date: Fri, 10 Feb 1995 00:18:54 GMT
Lines: 113


This letter is intended both as a contribution to the recent "Biasing"
thread and as a plea for advice.

I would be grateful if someone in the neural net field could help me
with the following problem:

I wish to 'build' a simple threshold unit with two inputs that can be
'trained' to perform one of several logic functions.  (I appreciate that
it will not be possible to get just one such device to perform any of the
16 possible logical functions of two variables - the exclusive-OR function
or its complement would be impossible for example.  I estimate the total
number it can produce to be 9 - including always ON and always OFF).
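
The estimate can be checked by brute force.  The sketch below is only
a rough check, written in Python: it tries every (a, b, c) on a small
grid and collects the distinct truth tables the unit produces.  The
names INPUTS, GRID and truth_table are my own, and the assumption that
half-integer values between -2 and 2 are enough to reach every
realisable function is exactly that, an assumption.

```python
from itertools import product

# The four input combinations, in a fixed order.
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def truth_table(a, b, c):
    """Outputs of the unit (ON = 1 when a*x + b*y + c >= 0) for all four inputs."""
    return tuple(1 if a * x + b * y + c >= 0 else 0 for x, y in INPUTS)

# Assumption: half-integer values in [-2, 2] are a rich enough grid.
GRID = [k / 2 for k in range(-4, 5)]

reachable = {truth_table(a, b, c) for a, b, c in product(GRID, repeat=3)}
all_tables = set(product((0, 1), repeat=4))

print(len(reachable))                  # number of distinct functions found
print(sorted(all_tables - reachable))  # tables that no setting produced
```

On this grid the search finds 14 functions rather than 9.  The two it
never produces are the exclusive-OR and its complement, which are known
not to be separable by a single straight line.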

In the diagram below (which can also be regarded as a truth table) the
asterisks indicate all the possible 0/1 states of its two inputs x and y.
The straight line (ax+by+c=0) is set up to implement the AND function.
The device gives a 'true' or 'ON' output when (ax+by+c)>=0, and is 'OFF'
otherwise.  Since it is sometimes helpful to have specific numerical
values in an example, suppose that in the diagram below, a=2.5, b=1.25
and  c  (= -ab) = -3.125


                     \
                      \ <--- line (ax+by+c)=0
                   y   \
                        \
                   |     \
                   |      \
                 (0,1)     \ (1,1)
                   *        \  *
                   |         \
                   |          \       Output 'ON' when
                   |           \       (ax+by+c) >= 0
                   |            \
                   |             \
                   *-----------*--\---------- x
                 (0,0)       (1,0) \
                                    \
                                     \
               Output 'OFF' when     \
                 (ax+by+c) < 0         \
                                        \


In the situation as shown we may call a and b 'weights'.  We can
either call c the 'bias', if we test on (ax+by+c)>=0 as shown above, in
which case zero is the 'threshold'; or we can call (-c) the 'threshold',
if we test on (ax+by) >= (-c), in which case we have no 'bias'.  The two
cases are essentially the same; we have merely chosen to look at the
device differently.

The situation gets a little more interesting (though still remaining
essentially equivalent) when we 'split' c between a 'bias' d and a
'threshold' T, so that the equation of the line becomes:

   ax+by+(c+T)=T   ( where we call T 'threshold' and (c+T)=d 'bias' )

We then test on the condition:    (ax+by+d) >= T    where d = c+T

In our specific numerical example we have:
                    2.5*x + 1.25*y - 3.125  = 0

If we set the threshold at (say) 4, then to implement the same
function the equation becomes:
                    2.5*x + 1.25*y + 0.875  = 4

and we test on:     2.5*x + 1.25*y + 0.875 >= 4
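
As a check on the arithmetic, here is a small Python sketch that
evaluates the three ways of writing the test (bias only, threshold
only, and the split into d and T) on all four inputs.  The function
names are my own invention; the numbers are those of the example.

```python
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

a, b, c = 2.5, 1.25, -3.125   # weights and bias from the example
T = 4.0                       # threshold chosen in the example
d = c + T                     # bias after the split

def bias_form(x, y):
    """Test on a*x + b*y + c >= 0 (threshold of zero)."""
    return a * x + b * y + c >= 0

def threshold_form(x, y):
    """Test on a*x + b*y >= -c (no bias)."""
    return a * x + b * y >= -c

def split_form(x, y):
    """Test on a*x + b*y + d >= T, with d = c + T."""
    return a * x + b * y + d >= T

for x, y in INPUTS:
    print((x, y), bias_form(x, y), threshold_form(x, y), split_form(x, y))
```

All three tests agree on every input, and only (1,1) comes out 'ON',
so the split leaves the AND function unchanged.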


My problem is,
~~~~~~~~~~~~~
that basically I don't know how to design in the way a, b, d and T
shall be incremented or decremented by 'punishment' or 'reward' inputs,
so as to direct it towards producing one function rather than any other.
Or, if it is not possible to 'direct' it, at least to be confident that
the linkage of the punish/reward inputs to a, b, etc, makes it capable
of being trained to produce any of the 9 [or 14 ?] possible logical
functions eventually.

To be specific, suppose I now want to "retrain" the device above so that
instead of returning the 'and' function as shown, it will turn 'ON' when
the inputs are (say) x=0, y=1 and be 'OFF' otherwise.  How should
"punishment" and/or "reward" alter the 'weights' a and b, the 'bias' d,
and the 'threshold' T ?

One can see from simple algebra what sets of values of a, b, d and T
will cause it to produce what logical functions.  And by inspecting the
internals of the device we can know what the 'weights', etc  are at
any time, but the outside world (which is 'punishing' or 'rewarding' it)
cannot know whether its state (a,b,d,T) (i.e. the straight line in the
diagram), is moving in the desired way until the occasion arises when
a different output for one of the four possible combinations of inputs
is actually produced.

The main problem seems to be this:  In moving from one internal state to
another, one would like the 'weights', etc, to be inching in a different
combination of directions depending on what states it is being trained
to move from and to (in order to 'swing the line round' and displace it,
sometimes one way, sometimes another).  This seems to pose a dilemma.
Should/must 'punishment' introduce randomness into the ways a, b, d and T
get altered ?

Is 'reward' necessary ?  What if the AND function is the function I want,
and it is already performing it, then need I 'reward' it - and if so how ?
Surely that might impair its already 'correct' behaviour ?  (What might be
called the "if-it-aint-bust-don't-fix-it reinforcement rule" !).
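
For concreteness, one standard scheme that behaves in this
"if-it-aint-bust" way is the fixed-increment perceptron rule.  It is
not something worked out in this letter, and the Python sketch below
is only an illustration under assumptions of my own: T is held fixed,
the step size 'rate' is 1, and the names output and train are
invented.  A wrong 'OFF' nudges d and the weights of the active inputs
up, a wrong 'ON' nudges them down, and a correct answer changes
nothing, so no randomness is involved.

```python
INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def output(w, x, y):
    """1 ('ON') when a*x + b*y + d >= T, else 0 ('OFF')."""
    return 1 if w["a"] * x + w["b"] * y + w["d"] >= w["T"] else 0

def train(w, target, rate=1.0, max_passes=1000):
    """Fixed-increment perceptron rule; T is never altered (an assumption).

    Returns the number of passes through the four inputs up to and
    including the first error-free pass, or None if max_passes runs out.
    """
    for passes in range(1, max_passes + 1):
        mistakes = 0
        for x, y in INPUTS:
            err = target[(x, y)] - output(w, x, y)   # +1, 0 or -1
            if err == 0:
                continue            # correct answer: leave everything alone
            mistakes += 1
            w["a"] += rate * err * x
            w["b"] += rate * err * y
            w["d"] += rate * err
        if mistakes == 0:
            return passes
    return None

# Start from the AND unit of the example and retrain it to be 'ON'
# only for x=0, y=1.
w = {"a": 2.5, "b": 1.25, "d": 0.875, "T": 4.0}
target = {(0, 0): 0, (0, 1): 1, (1, 0): 0, (1, 1): 0}
passes = train(w, target)
print(passes, w)
print([output(w, x, y) for x, y in INPUTS])
```

If the target is the AND function the unit already computes, the first
pass finds no mistakes and a, b and d are left untouched.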

I would be grateful for help on this.

Apologies for being long-winded,
-- 
Louis Gidney
