Genetic Algorithms Digest    Friday, 18 March 1988    Volume 2 : Issue 9

 - Send submissions to GA-List@NRL-AIC.ARPA
 - Send administrative requests to GA-List-Request@NRL-AIC.ARPA

Today's Topics:
	- More on the definition of GA
	- GA Research Survey Response (2)
	- GAs and connectionism

------------------------------

Date: Wed, 16 Mar 88 18:02:12 PST
From: rik%cs@ucsd.edu (Rik Belew)
Subject: A four-bit definition of GA

  Sorry for the delay in my response; it turns out my link to
GAList had been broken.  But everything seems better now. Anyway,
I'm pleased to see how the discussion has progressed but feel the
need to clarify a few points regarding my earlier comments.

  First and foremost, I want to "make one thing perfectly clear":
When I defined GA(1) in terms of "...  John Holland and his
students" I most certainly did not intend that only those that
had journeyed to Ann Arbor and been anointed by the man himself
were fully certified to open GA franchises!  Defining a scientific
approach in terms of the personalities involved is not adequate,
for GA or any other work.   I was attempting to distinguish a
particular approach from the broader set of techniques that I called
GA(2).  In my experience, John is the "root" of  all such
work and much of it has been done by "direct" students of
his at Michigan and "indirect" students of these students.  I
also know, however, that others --- notably Dave Ackley and Larry
Rendell --- have worked on these methods without direct contact
with this lineage.  But I very much consider them "students of
Holland," in that they are aware of and have benefited from John's
work.  (Again, I mean that as a compliment, not because I have been
charged with GA Club membership validation.)  I see absolutely
no benefit and potentially great harm in drawing lines between
closely related bodies of research.

  So let's move on to more meaningful attempts to define the GA.  My
two-bit definition focused on the cross-over operator:  GA(1)
depended on it, and GA(2) generally relied on the weaker (less
intelligent) mutation operator.  This led Dave Ackley to feel
that:
      ...  membership in GA(1) is restricted
    to a small and somewhat quirky "DNA-ish" subset of all possible
    combination rules.  [ackley@flash.bellcore.com, 3 Mar 88]

 I take Dave to mean that the algorithm presented by
Holland (let's say the R class of algorithms described in his
ANAS Chapter 6, to be specific) sacrifices some performance in order
to remain more biologically plausible.  But I'm with Dave on this
one: personally, I too am more interested in the algorithms than in
their relation to real biological mechanisms.  (Let the record
show, however, that there are GA practitioners who do try to take
biological plausibility more seriously, e.g., Grosso and Westerdale.)

  So  the next  possibility is to  refer to properties of the
GA we find desirable.  For Dave, I think the key property of the
GA is its "implicit parallelism": the ability to search a huge
space implicitly by explicitly manipulating a very small set of
structures.  Jon Richardson comes closer to the definition I had
in mind with his emphasis on Holland's "building blocks" notion:

      The proper distinction I think is whether or not the
    recombination operator in question supports the building block
    hypothesis.  "Mutation-like operators" do not do this.  Any
    kind of weird recombination which can be shown to propagate
    and construct building blocks, I would call a Genetic
    Algorithm.  If the operator does nothing with building blocks,
    I would consider it apocryphal.  It may be valuable but
    apocryphal nonetheless and shouldn't be called a GA.
    [richards@UTKCS2.CS.UTK.EDU, 4 Mar 88]

  While I would echo the value of both these desiderata, I
don't find them technically tight enough to be useful. So I suggest
that we follow Holland's suggestion (in his talk at ICGA85) and
reserve the term "GA" for those algorithms for which we can
prove the "schemata theorem" (ANAS, Theorem 6.2.3).  I believe
this theorem is still the best understanding we have of how and why
the GA gives rise to the properties of implicit parallelism and
building blocks.  Of course, there are problems with this
definition as well.   In particular, it is so tightly wed to
the string representation and cross-over operator that it is very
difficult to imagine any algorithm very different from the
(traditional) GA that would satisfy the theorem.  But that's
exactly where I think the work needs to be done.
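
  For readers without ANAS at hand, here is the schema theorem in
the form usually quoted in later presentations (notation varies; this
is the common statement of the result, not necessarily the exact form
of Holland's Theorem 6.2.3):

```latex
% Expected number of instances of schema H at generation t+1, where
%   m(H,t)    = number of instances of H in the population at t
%   f(H)      = average fitness of those instances
%   \bar{f}   = average fitness of the whole population
%   \delta(H) = defining length of H,  o(H) = order of H
%   \ell      = string length, p_c, p_m = crossover, mutation rates
\mathbb{E}\left[m(H,t+1)\right] \;\ge\;
  m(H,t)\,\frac{f(H)}{\bar{f}}
  \left[\,1 \;-\; p_c\,\frac{\delta(H)}{\ell-1} \;-\; o(H)\,p_m\right]
```

Short, low-order schemata of above-average fitness receive at least
exponentially increasing trials, which is the formal content behind
both the "implicit parallelism" and "building blocks" readings.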

  Finally, I want to say why I (as Dave Ackley says) "took the
trouble to exclude" Ackley's SIGH system from my definition of
GA(1).  My answer is simply that I view SIGH as a hybrid.  It
has borrowed techniques from a number of areas --- the GA,
connectionism, and simulated annealing, to name three.  There is
absolutely nothing wrong with doing this, and as Dave's thesis
showed and Larry Eshelman's note confirmed [Larry.Eshelman-
@F.GP.CS.CMU.EDU, 11 Mar 1988] there are at least some problems in
which SIGH does much better than the traditional GA. My only
problem with SIGH is that I can't do the apportionment of credit
problem:  when it works, I don't know exactly which technique is
responsible, and when it doesn't work I don't know who to
blame. I too think about connectionist algorithms and simulated
annealing along with Holland's GA and bucket brigade, and see all
of them as members of a class of algorithms I want to understand
better.  But I find it necessary to isolate the properties of each
before trying to combine them.  In short, I think Dave and I agree
to a great extent (on the problem, and on what the pieces of the
solution might be), and disagree only in our respective approaches to
putting it all together.

------------------------------

Date: Wed, 16 Mar 88 15:25:27 est
From: liepins@UTKCS2.CS.UTK.EDU (Gunar Liepins)
Subject: Response to GA Activity Poll

WHO:     Dr. Mike Hilliard and Dr. Gunar Liepins 
		Oak Ridge National Laboratory / Martin Marietta Energy Systems
	 Mark R. Palmer
	 Jon Richardson
	 Gita Rangarajan
	 Kejitan Dontas
		University of Tennessee at Knoxville

APPLICATION AREA:  Machine Learning in Job Shop Scheduling and Finite State
		   Environments

GENERAL APPROACH:  
		   Job Shop:  Classifier System (Michigan style) attempting to
	learn rules to schedule a job queue for a single machine with no
	precedence constraints (see ICGA 2).  One system uses a message list
	of binary encoded run times and another system uses a message support 
	function of binary predicates (i.e. a message represents the results of
	predicates applied to two jobs).

		   Finite State:  Classifier System operating in a state space
	of one, two, and three dimensions (track, matrix, and cube).  Research
	into credit allocation (strength distribution) methods and their 
	effect upon the learning of rule chains.  Using bucket brigade, back 
	propagation, reward chunking and various combinations.

GA TOOL:  CFS-C customized beyond recognition and running on several UNIX 
	  systems.  

RESULTS: (good, bad, indifferent)  yes,  Bad to indifferent so far in the
	Job Shop - Run Time representation.  Good to very promising in the Job
	Shop - Message Support representation.  VERY INTERESTING (thus
	very good) results in the Finite State environments. 

PROBLEMS (parameter settings, representation, etc.)  YES!!!;  We have had
	problems in every area you care to name and some you wouldn't.  
	Most parameter setting problems are alleviated by creating a
	system performance measure to modify settings from their default
	values to some desired level (i.e. Noise (for conflict resolution)
	and crossover rates are a function of system performance measures and 
	are dynamically controlled to converge from high default settings 
	(exploration) to zero (exploitation)).  Representation is, of course, 
	THE problem with any new environment but we are satisfied with ours 
	for now.
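
	As an illustration of the dynamic parameter control described
above, the idea might be sketched as follows.  This is a hypothetical
reconstruction (the function name, the single scalar performance
measure, and the linear scaling are my assumptions, not the actual
system):

```python
def adapt_rate(default_rate, performance, target):
    """Scale an operator rate (e.g., crossover or conflict-resolution
    noise) down from a high default toward zero as a scalar system
    performance measure approaches a desired target level.

    Hypothetical sketch: the single scalar measure and the linear
    scaling are assumptions, not the authors' implementation.
    """
    if target <= 0:
        return 0.0
    # Fraction of the performance target still unmet, clamped to [0, 1].
    shortfall = min(1.0, max(0.0, 1.0 - performance / target))
    # High rate (exploration) while performance is poor; zero
    # (exploitation) once the target is reached or exceeded.
    return default_rate * shortfall
```

For example, with a default crossover rate of 0.9 the rate stays at
0.9 while performance is zero and falls to 0.0 as performance reaches
the target.
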
	   The biggest continuing problems are those of system interaction
        and analysis.  How to view the operation of a classifier system 
	with 50 rules, running in a dynamic environment, controlled by 20 
	different parameter settings.  What should be recorded in order to 
	analyze both the system's behavior in the environment (CFS performance)
	and the contents of the classifier list (GA performance).  How to 
	manage the results of a classifier system with several different 
	modifications (e.g. credit allocation algorithms) run in three 
	different environments with several different initial parameter 
	settings.

------------------------------

Date: Wed, 16 Mar 88 15:27:14 est
From: liepins@UTKCS2.CS.UTK.EDU (Gunar Liepins)
Subject: Response to GA Activity Poll

WHO:  Dr. Gunar Liepins and Dr. Mike Hilliard
		Oak Ridge National Laboratory / Martin Marietta Energy Systems
	 Mark R. Palmer
	 Jon Richardson
		University of Tennessee at Knoxville

APPLICATION AREA: Combinatorial optimization using the Set Covering Problem 
	(SCP) and the Traveling Salesman Problem (TSP).  Investigations into
	deceptive problems and adaptive crossover techniques.  Some of this is 
	a repeat of work done by others that we wish to verify, expand upon,
	and/or have available for comparison (e.g. PMX, Punctuated 
	Crossover). 

GENERAL APPROACH:  On combinatorial optimization - for each new idea, do 
	relative comparison of performance (best and trial found) for 
	the new method on a testbed of small sized, randomly generated 
	problems; then, if the results are encouraging, run on a suite of 
	large, 'real-world' problems.

GA TOOL: GENESIS with various heuristic or adaptive crossover algorithms, 
	new selection procedures and even a version for decimal encoding of 
	solutions (i.e. strings of integers).  Some elementary analysis
	routines (collating a group of result files, running Wilcoxon
	signed-rank tests).

RESULTS: (good, bad, indifferent)  yes; Very, very good in the SCP domain
	using a heuristic crossover and with a selection method suggested 
	by David Goldberg.  Interesting results with different penalty
	functions on the SCP.  Bad results with ranking selection (probably 
	just because we haven't done much with it).  Indifferent results with 
	adaptive crossover methods (mainly duplicated what has been done
	before).  Bad results with TSP; or more precisely, no better than
	anyone else.

PROBLEMS: Population size - Goldberg's work on optimum initial population sizes
	suggests an initial popsize that increases exponentially with string 
	length.  Since we work with large problems we must work with very
	sub-optimal popsizes.  Other parameters - We just used the default
	Genesis settings for Xover and Mutation rates and while these are 
	probably not the best, they were a place to start.  We have not run a 
	Meta-GA to determine optimal settings.  System interaction and 
	analysis is always a problem.  Understanding exactly what the GA is 
	doing and why is very, very difficult.  Maintaining suites of problems 
	and the results of several different versions of the GA applied to 
	those suites is an awesome task in itself.
	TSP - representation, of course. 

P.S. We have suites of TSP and SCP problems and their optimal solutions taken 
from the literature.  These form a fairly standard testbed of problems.  Send 
mail if you want them.

------------------------------

Date: Thu, 17 Mar 88 11:17:23 EST
From: John Merrill <merrill@iuvax.cs.indiana.edu>
Subject: GA's and connectionism

I was recently made aware of discussions in this list of attempts to
apply Genetic Algorithms to connectionist network optimization.  I
have investigated this problem extensively; a summary of my results
follows.

The tasks to which the networks described here were applied were
rather arcane; if anyone is interested, I have a tech report giving
precise descriptions of the results.  In this note, I will merely
report the qualitative results of my studies.

LEARNING METHOD:

The paradigmatic genetic search employed in these experiments was
non-traditional in three ways.  First, populations were small,
consisting of between three and fifteen individuals.  Second, instead
of an undirected mutation operator, an operator incorporating a form
of simulated annealing was employed.  Finally, a heuristic crossover
operator was used.  (The effects of each of these modifications will
be discussed below.)

Several such heuristic crossovers have been investigated.  The
simplest of these takes a single node, together with all the weights
on its inputs *and its outputs*, and exchanges it with its copy in
another network.  Two other classes of operators have been studied:
those that exchange blocks of nodes and those that exchange
corresponding pieces from all the weights on the inputs and the
outputs of a single node.

Recurrent networks with very simple nodes were trained.  The nodes in
these networks had only three possible outputs: -1, 0, and +1.  In all
other respects they were standard PDP nodes as described in (for
instance) Rumelhart, Hinton, and Williams.  Weights were represented as
strings of bits.  All nodes in the networks were connected to all
other nodes in the networks.
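
The simplest heuristic crossover described above (exchanging one node
together with all the weights on its inputs and its outputs) can be
sketched on a fully connected weight matrix.  This is an illustration
only: the matrix convention w[i][j] = weight from node j into node i
is my assumption, and the actual experiments encoded weights as bit
strings.

```python
import copy

def node_crossover(w_a, w_b, i):
    """Exchange node i between two fully connected networks whose
    weights are square matrices, with w[i][j] the weight from node j
    into node i.  Swapping row i exchanges the node's input weights;
    swapping column i exchanges its output weights.

    Sketch under an assumed matrix convention, not the author's code.
    """
    a, b = copy.deepcopy(w_a), copy.deepcopy(w_b)
    n = len(a)
    # Swap the input weights of node i (row i).
    a[i], b[i] = list(w_b[i]), list(w_a[i])
    # Swap the output weights of node i (column i).
    for j in range(n):
        a[j][i], b[j][i] = w_b[j][i], w_a[j][i]
    return a, b
```

The block-exchange and piecewise variants mentioned above would apply
the same swap to a set of rows/columns, or to corresponding bit
segments of the weights, respectively.
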

As in D. Offut's recent note to this list, the genetic algorithm was
used as a form of reinforcement learning.  

Qualitative results:

1)  The major determining factor for performance is the inclusion of a
local improvement operator.  For non-trivial tasks, the standard
mutation operator yields a search whose improvements are prohibitively
slow.  If a local improvement operator is included, then the size of
the population becomes relatively unimportant---performance as a
function of the number of generations is the same whether the
population includes 50 individuals or 5 individuals.
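
To make "local improvement operator" concrete: over bit-string
genomes, the simplest such operator is a greedy single-bit
hill-climber.  The note's actual operator incorporated simulated
annealing; this greedy version is a simplified stand-in, and every
name in it is mine.

```python
import random

def local_improve(bits, fitness, attempts=20):
    """Greedy local improvement on a list of 0/1 genes: try random
    single-bit flips and keep any flip that does not decrease fitness.

    Simplified stand-in for the annealing-style operator described in
    the text; an annealing version would also accept some worsening
    flips with a temperature-dependent probability.
    """
    best = list(bits)
    best_f = fitness(best)
    for _ in range(attempts):
        i = random.randrange(len(best))
        cand = list(best)
        cand[i] ^= 1          # flip one bit
        f = fitness(cand)
        if f >= best_f:       # keep any non-worsening flip
            best, best_f = cand, f
    return best
```
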

2)  Even when a local improvement operator is included in the system,
the presence of a cross-over operator improves the performance of the
algorithm.  The precise choice of the cross-over makes very little
difference, and so it is hard to evaluate precisely what function a
cross-over serves in this context.

3)  When no local improvement operator is present in the population,
the choice of a crossover operator becomes more important.  The
comparative performance of the algorithm with a heuristic cross-over
is much superior to its performance with a standard XO operator.

Observations and comments:

1)  Stochastic algorithms are less efficient than deterministic
algorithms for weight modification.  Genetic search does not seem to
be an exception; in fact, it needs more function evaluations to
achieve a given level of performance than pure simulated annealing.

2)  On the other hand, several authors have observed that modifying
the connection structure of a network prior to learning changes the
kind of thing learned (Dolan and Dyer, 1987; Merrill and Port, in
prep.).  Deterministic algorithms apply only on smooth parameter
manifolds; stochastic algorithms must be used when the network
topology itself is to be modified.

----------------------------------------------------------------------

John Merrill			Internet: merrill@iuvax.cs.indiana.edu
Department of Computer Science  UUCP:	  pur-ee!iuvax!merrill
Indiana University		CSnet:	  merrill@indiana.csnet
Bloomington, Indiana 47405	BITnet:	  merrill@indiana.BITNET


[ Comment from Moderator:
With such a small population, it seems unlikely that you
would see any of the beneficial effects usually associated
with GAs -- i.e., implicit parallelism in the identification
and propagation of useful building blocks.  Previous studies
have shown that GAs generally require a larger population,
say 50-100 structures.  -- JJG]

------------------------------

End of Genetic Algorithms Digest
********************

