Genetic Algorithms Digest    Monday, 14 March 1988    Volume 2 : Issue 7

 - Send submissions to GA-List@NRL-AIC.ARPA
 - Send administrative requests to GA-List-Request@NRL-AIC.ARPA

Today's Topics:
	- GAs and CONNECTIONISM
	- SIGH
	
------------------------------

Date: Sat, 12 Mar 88 16:54:19 EST
From: offutt@caen.engin.umich.edu (daniel m offutt)
Subject: GA and CONNECTIONISM

A number of people have indicated an interest in using genetic
algorithms to do connectionist learning.  Over the last couple of
years I have done some informal experiments applying a genetic
algorithm to searching connectionist weight spaces.  I will give a
qualitative summary of my results here.  One warning: I only have
records of some of these experiments, so some of the following is
from memory.


THE LEARNING METHOD:

All of the experiments were done with an otherwise traditional genetic
algorithm with one modification: The two-point crossover operation
always exchanged a segment of L/2 bits where L is the number of bits
in a bit ring.  The first crossover point was selected uniformly at
random, and the second crossover point was chosen half-way around the
ring.  I used John Grefenstette's "Genesis II" genetic algorithm
package modified to manipulate rings of more than 8000 bits.  The
crossover rate used in these experiments was 1.0, the mutation rate
0.001.
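For concreteness, the modified crossover could be sketched as follows.
This is my own illustration, not the Genesis code; it treats the
chromosome as a ring and always exchanges a contiguous segment of
exactly L/2 bits:

```python
import random

def half_ring_crossover(parent_a, parent_b, rng=random):
    """Two-point crossover on a bit ring: the first cut point is chosen
    uniformly at random, the second is exactly half-way around the ring,
    so each crossover exchanges precisely L/2 bits."""
    L = len(parent_a)
    cut = rng.randrange(L)                # first crossover point
    child_a, child_b = list(parent_a), list(parent_b)
    for offset in range(L // 2):          # second point is half-way round
        i = (cut + offset) % L            # index wraps around the ring
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b
```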

The connectionist networks trained were the same feed-forward networks
traditionally used in combination with back propagation by
connectionists.  The error measure defined over the weight space was
the same sum-squared error measure that back propagation has
traditionally been applied to minimizing.  See Rumelhart, Hinton, and
Williams (1986) in PDP Volume 1 for a precise description of these
networks.

The learning method was simple: Code network weight vectors as binary
strings and minimize the error measure defined over the resulting
binary vector space using the genetic algorithm.  Notice that this is
a reinforcement learning method; it does not require direct access to
the correct answer for each training instance.  (Unaugmented back
propagation requires access to the correct answers.) On the other
hand, if the correct answers are available, the genetic algorithm
cannot make efficient use of this information.
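The fitness function implied here might look like the sketch below.
The 8-bit field per weight matches the coding mentioned later in this
post, but the [-4, 4) weight range is my own assumption, not a detail
Dan gives:

```python
def decode_weights(bits, bits_per_weight=8, lo=-4.0, hi=4.0):
    """Map consecutive bit fields of the chromosome onto real weights.
    The [lo, hi) range is an assumed choice for illustration."""
    weights = []
    for i in range(0, len(bits), bits_per_weight):
        field = bits[i:i + bits_per_weight]
        value = int("".join(map(str, field)), 2)      # e.g. 0 .. 255
        weights.append(lo + (hi - lo) * value / (2 ** bits_per_weight))
    return weights

def sum_squared_error(outputs, targets):
    """The error measure the GA minimizes -- the same sum-squared error
    that back propagation is traditionally applied to."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))
```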


NETWORK TOPOLOGIES:

A wide variety (> 100) of feed-forward network topologies were tried,
including networks whose widest layer consisted of between two and ten
units.  I also tried networks ranging in depth from three to twelve
layers.  All topologies had fully-interconnected adjacent
layers.


LEARNING TASKS:

The tasks posed to this learning method were all relatively simple
tasks.  Many are described in the paper mentioned above.  For example:
Two to eight-bit parity functions.  Simple encoder/decoder tasks.
Symmetry detectors.  The posed task was always to learn a mapping from
a vector of 1s and 0s to a vector of 1s and 0s.  The criterion of
success used by connectionists for this type of task is to treat an
output >= 0.9 as correct iff the correct output is 1 and an output <=
0.1 as correct iff the correct output is 0.  I used this criterion.
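That criterion amounts to a small predicate; a sketch of the
convention just described:

```python
def output_correct(output, target):
    """Standard connectionist success criterion: an output counts as
    correct only if it is >= 0.9 when the target is 1, or <= 0.1 when
    the target is 0.  Anything in between counts as wrong."""
    if target == 1:
        return output >= 0.9
    return output <= 0.1
```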


SOME SUCCESSES:

The smallest network was a five-unit, nine-weight network applied to
the two-bit exclusive-or task.  A GA reliably trains this network to
perform the task in at most a few CPU minutes.  (Back propagation
takes CPU seconds.)  (As an aside, I tried Larry Eshelman's
implementation of SIGH on this weight-space search task, without
success or even progress after 8000 function evaluations.  It is,
however, possible that I did not select precisely the right parameter
settings for SIGH.)  The genetic algorithm was reliably able to train
deeper networks (at least as deep as five layers, if I recall
correctly) of the same width to do the same task, in about the same
amount of time.

A three-layer (8-5-8) network was taught an 8-bit encoder/decoder
task.  This task was learned in approximately 160,000 network
evaluations.  That is 3,200 generations for a population size of 50.
Progress on the analogous 8-3-8 encoder/decoder was much slower.

The largest network was a 12-layer network containing ten hidden
layers of ten units each.  This network had 1066 weights each coded in
an 8-bit field.  I set the GA to training this net to compute a five
bit parity function.  The task was learned in approximately 140 CPU
hours on a Microvax-II.  This is an interesting result.  I have never
run back propagation on a network this deep, but my impression from
the connectionists is that a twelve-layer network could not be trained
by back propagation to perform this simple task, even in 140 CPU hours.


NON-SUCCESSES:

The genetic algorithm did not always solve the error minimization
tasks it was posed, in the time I made available.  The time I made
available was usually at least 30 CPU minutes.  Perhaps one in ten
learning experiments was successful.  But the GA made *progress* much
more often.  Many runs were interrupted while progress was still being
made.  Narrow networks with many hidden layers were easier to
train with a GA than with back propagation.  Wider networks with few
hidden layers (e.g. one) were more easily trained with back propagation
than with a genetic algorithm.


SOME OBSERVATIONS AND REMARKS:

A network's weights were encoded on a bit string in what might be
called "layer-normal form": All the weights linking layer j with layer
j+1 were coded in a contiguous substring.  The weights on all
connections leading into a given unit were coded in a contiguous
substring.  There are (very many) other ways that the weight vector
could have been mapped onto the bit string that might have worked
better.  This is a situation in which the inversion operator might be
helpful.
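The layer-normal layout can be made precise with a few lines of code.
In this sketch (my own notation) `net` is a list of per-layer weight
matrices, with `net[j][k]` holding the incoming weights of unit k in
layer j+1:

```python
def layer_normal_order(net):
    """Flatten a list of per-layer weight matrices into one vector:
    layers are laid out in order, and within a layer all weights
    leading into a given unit are contiguous."""
    flat = []
    for layer in net:               # weights linking layer j to j+1
        for unit_weights in layer:  # all weights into one unit, together
            flat.extend(unit_weights)
    return flat
```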

Since adjacent layers are fully interconnected, changing the function
of just one unit in layer j affects the function of every unit in
layer j+1.  So, these networks are not hierarchically-structured,
modular systems.  This may make crossover a less than ideal plausible
generator in this context.  By choosing a network topology with a
hierarchical structure of network modules (and submodules...), and by
arranging for the bit-string encoding to permit traditional crossover
to exchange these modules, each crossover might yield a better
plausible guess.

The networks I experimented with were feed-forward networks
traditionally used with the back propagation algorithm.  I used these
networks in order to be able to make comparisons with back
propagation.  One could just as well use a GA to set the weights on a
Hopfield network, or on a network containing cycles,  or on a network
whose units all have different (and not necessarily continuous)
activation functions.

Another possibility is to combine a genetic algorithm with back
propagation: The genetic algorithm would "jump" (via crossover) to a
point W in the weight space, use back propagation to do gradient
descent from W for some number of iterations, yielding a new (and
improved) point W', recode W' into a binary string and replace W with
W' in the population of binary strings.
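The hybrid scheme might be sketched as below.  To keep the example
self-contained I stand in a simple quadratic error surface for the
network, so `error` and `gradient` are placeholders, not a real back
propagation pass:

```python
import random

def error(w):                        # stand-in for the network's error
    return sum(x * x for x in w)

def gradient(w):                     # analytic gradient of the stand-in
    return [2 * x for x in w]

def refine(w, steps=10, lr=0.1):
    """Gradient descent from W for some iterations, yielding W'."""
    for _ in range(steps):
        g = gradient(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

def hybrid_step(population, rng=random):
    """One GA jump (uniform crossover here, for brevity) to a point W,
    local refinement to W', then write W' back into the population."""
    a, b = rng.sample(population, 2)
    child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
    child = refine(child)            # W -> W'
    worst = max(range(len(population)), key=lambda i: error(population[i]))
    population[worst] = child        # replace the worst member with W'
    return population
```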

There are lots of possibilities.  If a GA package that can efficiently
manipulate large bit strings is available, it is easy to code a
simple network simulator and start doing experiments.


- Dan Offutt

------------------------------

Date: Mon, 14 Mar 88 10:38:09 GMT
From: Pat Prosser <pat%computer-science.strathclyde.ac.uk@NSS.Cs.Ucl.AC.UK>
Subject: Don't sigh for SIGH.

SIGH Scope
David Ackley has only looked at optimisation of black  box  func-
tions  (value assignment problems) that take as argument a binary
string (like GA). David has  not  looked  at  other  optimisation
problems  (sequencing/ordering as in Derek Smith's bin packer and
my pallet loader (sorry about the plug)) that require  non-binary
alphabets  to represent permutations.  I don't see how SIGH could
represent an "order based chromosome".

Ackley's thesis page 153 ".. aimed  at  searching  binary  vector
spaces".  The difficulty appears to be the election of a permutation,
to ensure that each dimension (using a non-binary alphabet)
is unique.


Cost
In the thesis David's tests all ran until a known  maximum  value
was  reached  or  1  million function evaluations were made. This
does not give a good feel for the real run time cost of SIGH.  In
chapter  5  page  128 (5.1.3) "with a population size of 50 and a
64-bit function the algorithm can require up to 3,200 link weight
updates  per  function  evaluation". Ouch! The graph partitioning
problem appears to zap SIGH on a sequential machine.

If we have a problem of size n (requires n units in the government e)
and a population of size m (m voting units in f) then:

   - an election and reaction requires n*m additions (summing of
     link weights) and n+m evaluations of the sigmoid function
     (an exponential function).
   - a consequence phase requires the calculation of the
     reinforcement signal (r, again an exponential function),
     and the update of m*n link weights (n*m multiplications
     and additions).

The above effort corresponds to one  iteration  of  SIGH  with  a
problem  of size n and a population of m voters. The above effort
must be expended for the evaluation of 1 point.
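A quick check of these counts against the chapter-5 figure quoted
above (population m = 50 voters, a 64-bit function, so n = 64):

```python
def sigh_cost(n, m):
    """Rough operation counts for one SIGH iteration, i.e. the cost of
    evaluating a single point in the search space."""
    election_adds = n * m        # summing of link weights
    sigmoid_calls = n + m        # one exponential per unit
    weight_updates = n * m       # consequence-phase link updates
    return election_adds, sigmoid_calls, weight_updates

adds, sigmoids, updates = sigh_cost(64, 50)
```

which reproduces the "3,200 link weight updates per function
evaluation" figure exactly.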

What am I getting at? Well Ackley is measuring effort in terms of
the  amount  of  function  evaluations  performed for each search
technique investigated; he is not looking at elapsed time or total
effort.  The overhead for one function evaluation (one point
in the search space) in SIGH is very high.  SIGH  looks  like  it
will only be useful in problems where the cost of evaluation of a
point is very high and the landscape is rugged.


A Typical Application for SIGH
The above comments  may  appear  negative.  Nevertheless  we  are
thinking  about  using  SIGH for our project, factory scheduling.
One aspect of the project is to simulate  the  factory  processes
and  derive rules for scheduling or settings (values) for machine
centres. One run of the simulation would amount to one evaluation
of a function (this may take many hours). SIGH would then suggest
the next simulation to be run etc. Any comments (Dave)?

------------------------------

End of Genetic Algorithms Digest
********************


