[<--] Return to the list of AI and ANN lectures

In the last lecture, I gave an overview of the features common to most
neural network models. By clicking here, you
can see a diagram summarizing the way that the net input *u* to
a neuron is formed from any external inputs, plus the weighted output *V*
from other neurons. This is used to form an output *V = f(u)*, by one
of various input/output relationships (step function, sigmoid, etc.).
These usually involve a threshold parameter, theta. At the bottom of
the figure, there is a typical network, with input units receiving external
inputs, hidden units which communicate only with other neurons, and output
units whose outputs are visible to the outside world.

Today, we will start our examination of some specific models.

In 1943, the neurophysiologist Warren McCulloch and the logician Walter Pitts published the first paper describing what we would call a neural network. Their "neurons" operated under the following assumptions:

- They are binary devices (V_{i} = [0,1]).
- Each neuron has a fixed threshold, theta.
- The neuron receives inputs from excitatory synapses, all having identical weights. (However, it may receive multiple inputs from the same source, so the excitatory weights are effectively positive integers.)
- Inhibitory inputs have an absolute veto power over any excitatory inputs.
- At each time step the neurons are simultaneously (synchronously) updated by summing the weighted excitatory inputs and setting the output (V_{i}) to 1 iff the sum is greater than or equal to the threshold AND if the neuron receives no inhibitory input.

We can summarize these rules with the McCulloch-Pitts output rule

V_{i}(t+1) = 1 iff sum_{j} V_{j}(t) >= theta_{i} AND no inhibitory input is active; otherwise V_{i}(t+1) = 0,

and the diagram

Using this scheme we can figure out how to implement any Boolean logic function. As you probably know, with a NOT function and either an OR or an AND, you can build up XOR's, adders, shift registers, and anything you need to perform computation.
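To make the rule concrete, here is a minimal Python sketch of a McCulloch-Pitts unit (the code is mine, not part of the lecture). The NOT wiring, with one always-on excitatory input and the signal attached to an inhibitory synapse, is an assumption consistent with the W = 1, theta = 1 NOT gate discussed next:

```python
def mp_neuron(excitatory, inhibitory, theta):
    """McCulloch-Pitts unit: output 1 iff the sum of excitatory inputs
    reaches the threshold AND no inhibitory input is active (absolute veto)."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

def mp_not(v_in):
    # Assumed NOT circuit: an always-on excitatory input (weight 1), theta = 1,
    # with the signal v_in wired to an inhibitory synapse.
    return mp_neuron(excitatory=[1], inhibitory=[v_in], theta=1)
```

With this rule, `mp_not(1)` yields 0 and `mp_not(0)` yields 1.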

We represent the output for various inputs as a **truth table**, where
0 = FALSE, and 1 = TRUE. You should verify that when W = 1 and theta = 1,
we get the truth table for the logical NOT,

Vin | Vout
----+------
 1  |  0
 0  |  1

by using this circuit:

With two excitatory inputs V_{1} and V_{2}, and W = 1, we
can get either an OR or an AND, depending on the value of theta:

V_{out} = V_{1} OR V_{2} if theta = 1

V_{out} = V_{1} AND V_{2} if theta = 2

Can you verify that with these weights and thresholds, the various possible
inputs for V_{1} and V_{2} result in this table?

V1 | V2 | OR | AND
---+----+----+----
 0 |  0 |  0 |  0
 0 |  1 |  1 |  0
 1 |  0 |  1 |  0
 1 |  1 |  1 |  1
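The table can be checked mechanically. This short Python sketch (not from the lecture) assumes unit weights, with theta = 1 giving OR and theta = 2 giving AND:

```python
def mp_gate(inputs, theta):
    # Excitatory-only McCulloch-Pitts unit with all weights equal to 1.
    return 1 if sum(inputs) >= theta else 0

for v1 in (0, 1):
    for v2 in (0, 1):
        # OR uses theta = 1; AND uses theta = 2
        print(v1, v2, mp_gate([v1, v2], 1), mp_gate([v1, v2], 2))
```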

The **exclusive OR** (XOR) has the truth table:

V1 | V2 | XOR
---+----+-----
 0 |  0 |  0
 0 |  1 |  1
 1 |  0 |  1
 1 |  1 |  0

(Note that this is also a "1 bit adder".)

It cannot be represented with a single neuron, but the relationship

XOR = (V_{1} OR V_{2}) AND NOT (V_{1} AND V_{2})
suggests that it can be represented with the network

**Exercise:** Explain to your own satisfaction that this
generates the correct output for the four combinations of inputs.
What computation is being made by each of the three "neurons"?
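As a working sketch of the three-unit network (again Python, and my own illustration rather than the lecture's figure), using the same output rule as before:

```python
def mp_neuron(excitatory, inhibitory, theta):
    # McCulloch-Pitts output rule with absolute inhibitory veto
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

def xor(v1, v2):
    h_or  = mp_neuron([v1, v2], [], theta=1)   # hidden unit: V1 OR V2
    h_and = mp_neuron([v1, v2], [], theta=2)   # hidden unit: V1 AND V2
    # Output unit: passes the OR signal unless vetoed by the AND signal,
    # i.e. (V1 OR V2) AND NOT (V1 AND V2)
    return mp_neuron([h_or], [h_and], theta=1)
```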

These results were very encouraging, but these networks displayed no learning. They were essentially "hard-wired" logic devices. One had to figure out the weights and connect up the neurons in the appropriate manner to perform the desired computation. Thus there is no real advantage over any conventional digital logic circuit. Their main importance was that they showed that networks of simple neuron-like elements could compute.

The next major advance was the perceptron, introduced by Frank Rosenblatt in his 1958 paper. The perceptron had the following differences from the McCulloch-Pitts neuron:

- The weights and thresholds were not all identical.
- Weights can be positive or negative.
- There is no absolute inhibitory synapse.
- Although the neurons were still two-state, the output function f(u) goes from [-1,1], not [0,1]. (This is no big deal, as a suitable change in the threshold lets you transform from one convention to the other.)
- Most importantly, there was a learning rule.

Using slightly more modern and conventional notation (and with
V_{i} = [0,1]), we can describe the perceptron like this:

This shows a perceptron unit, *i*, receiving various inputs
I_{j}, weighted by a "synaptic weight" W_{ij}.

The *i*th perceptron receives its input from *n* input units,
which do nothing but pass on the input from the outside world. The output of
the perceptron is a step function:

V_{i} = f(u_{i}), with f(u) = 1 if u >= 0 and f(u) = 0 if u < 0,

and

u_{i} = sum_{j} W_{ij} I_{j} + theta_{i}

For the input units, V_{j} = I_{j}. There are various ways of
implementing the threshold, or bias, theta_{i}. Sometimes it is
subtracted, instead of added to the input *u*, and sometimes it is
included in the definition of *f(u)*.
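In code, the forward pass might look like the following Python sketch (an illustration of mine; it assumes theta is added to *u* and that f(0) = 1, the convention used in the worked example later on):

```python
def perceptron_output(weights, inputs, theta):
    # Total input: weighted sum of the inputs plus the bias term theta
    u = sum(w * x for w, x in zip(weights, inputs)) + theta
    # Step function: f(u) = 1 if u >= 0, else 0
    return 1 if u >= 0 else 0
```

For example, with weights [0.5, 0.5] and theta = -0.5 this unit computes a logical OR of two binary inputs.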

A network of two perceptrons with three inputs would look like:

Note that they don't interact with each other - they receive inputs only from the outside. We call this a "single layer perceptron network" because the input units don't really count. They exist just to provide an output that is equal to the external input to the net.

The learning scheme is very simple. Let t_{i} be the desired "target"
output for a given input pattern, and V_{i} be the actual output.
The error (called "delta") is the difference between the desired and
the actual output, and the change in the weight is chosen to be
proportional to delta.

Specifically,

delta_{i} = t_{i} - V_{i}  and  Delta W_{ij} = eta * delta_{i} * I_{j},

where eta is
the *learning rate*.

Can you see why this is reasonable? Note that if the output of the *i*th
neuron is too small, the weights of all its inputs are changed to increase its
total input. Likewise, if the output is too large, the weights are changed to
decrease the total input. We'll better understand the details of why this
works when we take up back propagation. First, an example.

Before we can start, we have to ask, "how can we use this rule to modify the threshold or bias term, theta?"

Answer: treat theta as the weight from an additional input which is always "on" (V = 1). Now, consider the net:

Unit 3 (the perceptron) receives inputs from the two input units 1 and 2,
weighted by W_{31} and W_{32}, and a constant input of 1,
weighted by theta_{3}.

Let eta = 0.5 and initially set all the weights (including the bias, theta_{3}) to 0.

Then, we have

u_{3} = W_{31} V_{1} + W_{32} V_{2} + theta_{3}

The error term is delta_{3} = t_{3} - V_{3}. This means that the change in weight will be Delta W_{3j} = eta * delta_{3} * V_{j}, and the change in the bias is Delta theta_{3} = eta * delta_{3} * 1.
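The arithmetic for the first pattern (V_1 = V_2 = 0, target 0) can be checked with a few lines of Python. The learning rate of 0.5 and the zero starting weights match the worked first row of the table; the code itself is only an illustration:

```python
eta = 0.5                           # learning rate (from the worked example)
w31, w32, theta3 = 0.0, 0.0, 0.0    # initial weights and bias

v1, v2, t3 = 0, 0, 0                # first training pattern and its target
u3 = w31 * v1 + w32 * v2 + theta3   # total input: u_3 = 0
v3 = 1 if u3 >= 0 else 0            # f(0) = 1, so V_3 = 1
delta3 = t3 - v3                    # delta_3 = 0 - 1 = -1

w31    += eta * delta3 * v1         # unchanged, since V_1 = 0
w32    += eta * delta3 * v2         # unchanged, since V_2 = 0
theta3 += eta * delta3 * 1          # bias input is always 1: theta_3 = -0.5
```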

Now fill in this table showing the results of each iteration, stopping when there is no further change through the presentation of all four patterns. We call each set of four patterns an "epoch". In this case, we are "training by pattern" because we adjust the weights after each pattern. Sometimes, nets are "trained by epoch", with the net change in weights applied after each epoch. (I'll do the first iteration.)

V_1 | V_2 | t_3 | u_3 | V_3 | delta_3 | new W_31 | new W_32 | new theta_3
----+-----+-----+-----+-----+---------+----------+----------+------------
 0  |  0  |  0  |  0  |  1  |   -1    |    0     |    0     |   -0.5
 0  |  1  |  1  |     |     |         |          |          |
 1  |  0  |  1  |     |     |         |          |          |
 1  |  1  |  1  |     |     |         |          |          |
----+-----+-----+-----+-----+---------+----------+----------+------------
 0  |  0  |  0  |     |     |         |          |          |
 0  |  1  |  1  |     |     |         |          |          |
 1  |  0  |  1  |     |     |         |          |          |
 1  |  1  |  1  |     |     |         |          |          |
----+-----+-----+-----+-----+---------+----------+----------+------------
 0  |  0  |  0  |     |     |         |          |          |
 0  |  1  |  1  |     |     |         |          |          |
 1  |  0  |  1  |     |     |         |          |          |
 1  |  1  |  1  |     |     |         |          |          |
----+-----+-----+-----+-----+---------+----------+----------+------------
 0  |  0  |  0  |     |     |         |          |          |
 0  |  1  |  1  |     |     |         |          |          |
 1  |  0  |  1  |     |     |         |          |          |
 1  |  1  |  1  |     |     |         |          |          |
----+-----+-----+-----+-----+---------+----------+----------+------------

How many epochs does it take until the perceptron has been trained to
generate the correct truth table for an OR? Note that, except for a scale
factor, this is the same result which McCulloch and Pitts deduced for the
weights and bias without letting the net do the learning. (Do you see why a
positive threshold for an M-P neuron is equivalent to adding a negative bias
term in the expression for the perceptron total input *u*?)
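If you want to check your completed table, the whole training-by-pattern procedure can be simulated. This Python sketch is my own; it stops, as described above, after a full epoch produces no change:

```python
def train_or(eta=0.5):
    # Training patterns for logical OR: (V_1, V_2) -> target t_3
    patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    w = [0.0, 0.0]                      # W_31, W_32
    theta = 0.0                         # bias: a weight on a constant input of 1
    for epoch in range(1, 100):
        changed = False
        for (x1, x2), t in patterns:    # training by pattern
            u = w[0] * x1 + w[1] * x2 + theta
            v = 1 if u >= 0 else 0      # step function with f(0) = 1
            delta = t - v
            if delta != 0:
                w[0]  += eta * delta * x1
                w[1]  += eta * delta * x2
                theta += eta * delta    # bias input is always 1
                changed = True
        if not changed:                 # a full epoch with no change: done
            return epoch, w, theta

print(train_or())
```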


dbeeman "at" dogstar "dot" colorado "dot" edu

Tue Oct 30 12:19:58 MST 2001