Artificial Neural Networks

                       An Example Program in PASCAL


                    		  Mark Watson, Copyright 1987



	  There are many reasons for doing neural modeling which range in
scope from research in cognitive pschycology to exploring techniques for
performing computationally expensive signal processing tasks using
massive numbers of simple hardware realizable components.  In the last
twenty years, several theories for self adapting systems have evolved; 
the program discussed here uses one of these approaches, the Generalized
Delta Rule for error propagation and connection weight modification (ie.
learning).  The best reference for the Generalized Delta Rule and other
popular learning paradigms is the book  "Parallel Distributed Processing,
Exploring the Microstructures of Cognition" by David Rumelhart and James
McClelland (editors).



	  What is a neural network?  We will answer this in the context of the
simulation program (listed at the end of this article) by first defining a
few terms:


	neuron		      A processing element with one memory cell to
			            record its energy or activation level.  A neuron
			            has one output and any number of inputs.


	connection	   A linkage between two neurons.  Each such linkage
			            is characterized by a the two connected neurons,
			            by a connection strength or weight, and by a
            			direction.


	weight		      A numeric factor which determines how easily
			            a connection propagates energy excitation
			            from one neuron to another.


	slab		        A collection of neurons which are manipulated
			            by this system as a group and which have no
			            connections between them.



	   In the attached PASCAL program, there are three slabs of neurons:
input, hidden and output.  Activation energy flows from the input slab
neurons through connecting weights to hidden slab neurons;  the larger a
weight, the faster it can propagate activation energy to hidden slab  
neurons.  Similarly, the hidden slab to the output slab weights carry
activation energy from the hidden slab neurons to the output slab neurons. 
It is interesting to note that the information processing capability
of neural networks is in the weights, not in the activation values.  The
activation values tend to change quickly while the weights only change
during 'learning', and then the are modified very slowly.


	   The first step in using the attached program is to modify the number
of neurons in each slab, if desired, by changing the constant declarations
at the beginning of the program.  The next step in using this model is to
"train" it by giving it a training set consisting of values for the input slab
neurons and the desired values for the output slab neurons.  The attached
program is already set up to learn four patterns;  the main program
consists of a training loop and a final test.  Notice that the network is
trained by repeating the complete set of training examples many times.  A
common mistake is made by completely training a Generalized Delta
Network for one pattern (i.e. so there is little error when
comparing the values of the output slab activations with the
desired system output) before learning the next pattern.


	   It takes a very long time to learn patterns in a neural network that
is being simulated on a conventional computer.  The learning process is
especially slow on a Macintosh with no hardware floating point support. 
The example program uses floating point calculations, although it is fairly
straight forward to use 32 bit integer arithmetic for implementing the
Generalized Delta Rule.  It is difficult to perform learning experiments in
8 or 16 bit integer arithmetic because calculated changes to the
connection weights must be very small compared to the magnitude of the
weights themselves.  This is very important since in general many
patterns will be learned by a system, and fast learning causes fast
'forgetting'.


	   I am training a Delta Rule Network (with six slabs, which is more 
complex than the attached program) to recognize digitized speech.  The
learning process is painfully slow.  This program has been optimized by
using 32 bit integer arithmetic (instead of floating point) and by storing
values for the procedure 'sigmoid' in the attached listing in a large table
and perfoming a lookup instead of evaluating a transcedental function for
every activation update.  Still the speech system takes DAYS to learn a
few phonemes (although recognition is fast since only one forward pass
through the network is required).  The company I work for (Science
Applications International Corporation) is building a very high speed
processor for peforming all the standard neural network learning
paradigms (Generalized Delta, Adaptive Resonance, Hopfield, etc.).  I can
perform about 5,000 weight updates (in learning) per second on a
Macintosh; our processor will perform 1 to 10 million updates per second. 
Several research groups are working on 'real' neural computers that have
many (thousands, perhaps millions of) individual processors with many
more (tens of millions) of hardware weight interconnects; it will probably
take several years before these systems can be built with enough neurons and
connecting weights to solve useful problems.



	Listing 1 - Generalized Delta Rule Simulator in Turbo PASCAL



PROGRAM testGeneralizedDeltaRuleEngine;

{------------- Copyright 1987, Mark L. Watson -----------------}
{-------- Program to test Generalized Delta Engine ------------}

{$R-}
{ This simulator is hardwired for 3 slabs: input, hidden and output }
 USES Memtypes, QuickDraw;

	CONST
		numInput  = 4;    { number of neurons in the input slab }
		numhidden = 20;   { number of neurons in the hidden slab }
		numOutput = 4;    { number of neurons in the output slab }

  numCompleteTrainingCycles = 50;

	TYPE
		INPUTARRAY = ARRAY[1..numInput] OF real;

	VAR
		inputA:INPUTARRAY;                                { activations }
		hiddenA:ARRAY[1..numhidden] OF real;              { activations }
		hiddenN:ARRAY[1..numhidden] OF real;              { sum of products }
		hiddenD:ARRAY[1..numhidden] OF real;              { output error }
		hiddenW:ARRAY[1..numhidden, 1..numInput] OF real; { connection weights }
		outputA:ARRAY[1..numOutput] OF real;              { activations }
		outputN:ARRAY[1..numOutput] OF real;              { sum of products }
		outputD:ARRAY[1..numOutput] OF real;              { output error }
		outputW:ARRAY[1..numOutput, 1..numhidden] OF real;{ connection weights }

		eida, theta:real; { learning rate and sigmoid threshold }
		waitChar:char;
		iter, numTrain:integer;

{--------------Start of the Generalized Delta Simulation Engine ------}

	FUNCTION xmax (x, y:real):real;
	BEGIN	IF x > y THEN	xmax:=x	ELSE	xmax:=y; END;

	FUNCTION xmin (x, y:real):real;
	BEGIN	IF x < y THEN	xmin:=x	ELSE	xmin:=y; END;

	FUNCTION sigmoid (x:real):real; { non-linear neuron response function }
	BEGIN
		sigmoid:=xmin(1.0,xmax(-1.0, 1.0 - exp(-(1.5*x) + theta)));
	END;

	PROCEDURE feedforward; { propagate neuron activation values from the   }
		VAR                   { input neurons to the hidden neurons, then     }
			i, j:integer;        { from the hidden neurons to the output neurons }
			sum2:real;
	BEGIN
    { do input to hidden slab:}
		FOR i:=1 TO numhidden DO
			BEGIN
				sum2:=0;
				FOR j:=1 TO numInput DO
					sum2:=sum2 + hiddenW[i, j] * inputA[j];
				hiddenN[i]:=sum2;
				hiddenA[i]:=sigmoid(sum2);
			END;
    { do hidden to output slab:}
		FOR i:=1 TO numOutput DO
			BEGIN
				sum2:=0;
				FOR j:=1 TO numhidden DO
					sum2:=sum2 + outputW[i, j] * hiddenA[j];
				outputN[i]:=sum2;
			END;
	END; { feedforward }

	PROCEDURE calcdeltas;
		VAR	i, n:integer;
	   		del, delw, temp:real;
	BEGIN
    { calculate deltas for the output slab }
		FOR n:=1 TO numOutput DO
			BEGIN
				del:=outputA[n] - sigmoid(outputN[n]);
				outputD[n]:=del;
				FOR i:=1 TO numhidden DO
					outputW[n, i]:=outputW[n, i] + del * hiddenA[i] * eida;
			END;
	END; { calcdeltas }

	PROCEDURE updateweights;
		VAR	i, n:integer;
		   	del, temp, sum2:real;
	BEGIN
    { now update the hidden slab neuron weights: }
		FOR n:=1 TO numhidden DO
			BEGIN
				sum2:=0;
				FOR i:=1 TO numOutput DO
					sum2:=sum2 + outputD[i] * outputW[i, n];
				del:=sum2;
				FOR i:=1 TO numInput DO
					hiddenW[n, i]:=hiddenW[n, i] + eida * del * inputA[i];
			END;
	END; { updateweights }

	PROCEDURE learn;  { main procedure for weight modification }
	BEGIN
		feedforward;
		calcdeltas;
		updateweights;
	END;

	PROCEDURE run;    { procedure for recalling network outputs for a }
		VAR              { given input }
			i, j:integer;
			sum2:real;
	BEGIN
    { do input to hidden slab:}
		FOR i:=1 TO numhidden DO
			BEGIN
				sum2:=0;
				FOR j:=1 TO numInput DO
					sum2:=sum2 + hiddenW[i, j] * inputA[j];
				hiddenA[i]:=sigmoid(sum2);
			END;
    { do hidden to output slab:}
		FOR i:=1 TO numOutput DO
			BEGIN
				sum2:=0;
				FOR j:=1 TO numhidden DO
					sum2:=sum2 + outputW[i, j] * hiddenA[j];
				outputA[i]:=sigmoid(sum2);
			END;
	END; { run }

{-----------------END of Generalized Delta Rule Simulation Engine ------}

	FUNCTION frandom (xmin, xmax:real):real;
		VAR	x:real;
	BEGIN
		x:=Random MOD 5000; x:=abs(x);
		frandom:=x * (xmax - xmin) / 5000.0 + xmin;
	END;

	PROCEDURE init;
		VAR	i, n:integer;
	BEGIN
		FOR i:=1 TO numhidden DO
			BEGIN
				hiddenA[i]:=frandom(0.005, 0.2);
				FOR n:=1 TO numInput DO
					hiddenW[i, n]:=frandom(-0.1, 0.15);
			END;
		FOR i:=1 TO numOutput DO
			BEGIN
				FOR n:=1 TO numhidden DO
					outputW[i, n]:=frandom(-0.1, 0.15);
			END;
	END;  { init }

 PROCEDURE TrickForLearning; { Special trick: add a little random noise  }
  VAR i:INTEGER;             { to the training input for faster          }
 BEGIN                       { learning. This procedure is not required. }
    FOR i:=1 to numInput DO inputA[i] := inputA[i] + frandom(0.0,0.15);
 END;
 
	PROCEDURE printKey (a:INPUTARRAY); { print output activations after }
		VAR	i:integer;                    { doing one recall cycle         }
	BEGIN
		FOR i:=1 TO numInput DO	inputA[i]:=a[i]; { load up the input neurons }
		run;                                     { calculate outputs }
		FOR i:=1 TO numOutput DO		write(outputA[i]:5:2, ' '); { print outputs }
		writeln;
	END;

{--- The following is a 'throw away' test main program.  To use this
     system for your own applications, set the size of the network by
     changing this constants: 'numInput', 'numHidden', and 'numOutput'
     at the beginning of this file.  Fill in the inputA array and call
     procedure 'learn' with the patterns to learn.  Calling 'run' with
     desired values in 'inputA' yeilds the system's response in the
     array 'outputA'.
--}
BEGIN { main program }

	eida:=0.7; { set learning rate to a moderately high value }
	theta:=-0.3;


	init;  { randomize weights and set up network to BEGIN }
 writeln('Starting to learn 4 simple patterns:');
 
	FOR numTrain:=1 TO numCompleteTrainingCycles DO
		BEGIN
   IF eida > 0.05 THEN eida:=eida * 0.8; { slow down the learning rate }

    writeln;
    writeln('Starting to learn all patterns with learning rate=',eida);
    writeln;
     
   { learn first pattern }
			inputA[1]:=0.0;	 inputA[2]:=0.5;  inputA[3]:=0.5;  inputA[4]:=0.0;
			outputA[1]:=0.5; outputA[2]:=0.0; outputA[3]:=0.0; outputA[4]:=0.0;
   TrickForLearning;
			FOR iter:=1 TO 2 DO		learn; { do 5 learning cycles }
			write(' Output activations should be 0.5 0.0 0.0 0.0, they are:');
  	printKey(inputA); { see error }
   
   { learn second pattern }
			inputA[1]:=0.5;	 inputA[2]:=0.0;  inputA[3]:=0.5;  inputA[4]:=0.0;
			outputA[1]:=0.0; outputA[2]:=0.5; outputA[3]:=0.0; outputA[4]:=0.0;
   TrickForLearning;
			FOR iter:=1 TO 2 DO		learn; { do 5 learning cycles }
			write(' Output activations should be 0.0 0.5 0.0 0.0, they are:');
   printKey(inputA); { see error }
   
   { learn third pattern }
			inputA[1]:=0.0;	 inputA[2]:=0.5;  inputA[3]:=0.0;  inputA[4]:=0.5;
			outputA[1]:=0.0; outputA[2]:=0.0; outputA[3]:=0.5; outputA[4]:=0.0;
   TrickForLearning;
			FOR iter:=1 TO 2 DO		learn; { do 5 learning cycles }
			write(' Output activations should be 0.0 0.0 0.5 0.0, they are:');
   printKey(inputA); { see error }
   
   { learn fourth pattern }
			inputA[1]:=0.5;	 inputA[2]:=0.0;  inputA[3]:=0.0;  inputA[4]:=0.5;
			outputA[1]:=0.0; outputA[2]:=0.0; outputA[3]:=0.0; outputA[4]:=0.5;
   TrickForLearning;
			FOR iter:=1 TO 2 DO		learn; { do 5 learning cycles }
			write(' Output activations should be 0.0 0.0 0.0 0.5, they are:');
   printKey(inputA); { see error }

		END;

{-- done learning patterns, test the network to see how well the
    patterns were learned.  Note: printKey calls run to print
    the values of the Output activations for a specified
    set of input neuron activations.
--}

  writeln; writeln('Done with 10 learning cycles: recall patterns:');
  
   
   { test first pattern }
			inputA[1]:=0.0;	inputA[2]:=0.5; inputA[3]:=0.5; inputA[4]:=0.0;
			write(' Output activations should be 0.5 0.0 0.0 0.0, they are:');
   printKey(inputA); { see error }
   
   { test second pattern }
			inputA[1]:=0.5;	inputA[2]:=0.0; inputA[3]:=0.5; inputA[4]:=0.0;
			write(' Output activations should be 0.0 0.5 0.0 0.0, they are:');
   printKey(inputA); { see error }
   
   { test third pattern }
			inputA[1]:=0.0;	inputA[2]:=0.5; inputA[3]:=0.0; inputA[4]:=0.5;
			write(' Output activations should be 0.0 0.0 0.5 0.0, they are:');
   printKey(inputA); { see error }
   
   { test fourth pattern }
			inputA[1]:=0.5;	inputA[2]:=0.0; inputA[3]:=0.0; inputA[4]:=0.5;
			write(' Output activations should be 0.0 0.0 0.0 0.5, they are:');
   printKey(inputA); { see error }
  
	  writeln('Hit a return to END...');  	 read(waitChar);

END.                        