Newsgroups: comp.ai.genetic,comp.answers,news.answers
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!europa.eng.gtefsd.com!gatech!swrinde!pipex!uknet!cf-cm!David.Beasley
From: David.Beasley@cm.cf.ac.uk (David Beasley)
Subject: FAQ: comp.ai.genetic part 2/6 (A Guide to Frequently Asked Questions)
Message-ID: <part2_787934954@cm.cf.ac.uk>
Followup-To: comp.ai.genetic
Summary: This is part 2 of a <trilogy> entitled "The Hitch-Hiker's Guide to 
         Evolutionary Computation". A periodically published list of 
         Frequently Asked Questions (and their answers) about Evolutionary 
         Algorithms, Life and Everything. It should be read by anyone who 
         whishes to post to the comp.ai.genetic newsgroup, preferably *before* 
         posting.
Originator: David.Beasley@cm.cf.ac.uk (David Beasley)
Sender: David.Beasley@cm.cf.ac.uk (David Beasley)
Supersedes: <part2_780051892@cm.cf.ac.uk>
Organization: University of Wales College of Cardiff, Cardiff, WALES, UK.
References: <part1_787934954@cm.cf.ac.uk>
Date: Tue, 20 Dec 94 14:50:18 GMT
Approved: news-answers-request@MIT.Edu
Expires: 24 Mar 1995 14:49:14 GMT
Lines: 1047
Xref: glinda.oz.cs.cmu.edu comp.ai.genetic:4591 comp.answers:9051 news.answers:34441

Archive-name:   ai-faq/genetic/part2
Last-Modified:  12/20/94
Issue:          2.4


TABLE OF CONTENTS OF PART 2
     Q1: What are Evolutionary Algorithms (EAs)?
     Q1.1: What's a Genetic Algorithm (GA)?
     Q1.2: What's Evolutionary Programming (EP)?
     Q1.3: What's an Evolution Strategy (ES)?
     Q1.4: What's a Classifier System (CFS)?
     Q1.5: What's Genetic Programming (GP)?

----------------------------------------------------------------------

Subject: Q1: What are Evolutionary Algorithms (EAs)?

     Evolutionary  algorithms  use  computational  models  of evolutionary
     processes as  key  elements  in  the  design  and  implementation  of
     computer-based  problem  solving  systems.  A variety of evolutionary
     computational  models  have  been  proposed.  They  share  a   common
     conceptual  base of simulating the EVOLUTION of INDIVIDUAL structures
     via  processes  of  SELECTION,  MUTATION,  and   REPRODUCTION.    The
     processes  depend  on  the  perceived  PERFORMANCE  of the individual
     structures as defined by an ENVIRONMENT.

     More precisely, EAs maintain a POPULATION of structures, that  evolve
     according  to  rules  of  SELECTION  and  other  operators,  that are
     referred to as "search operators", (or GENETIC  OPERATORs),  such  as
     RECOMBINATION  and  MUTATION.   Each  INDIVIDUAL  in  the  population
     receives a measure of it's FITNESS in the ENVIRONMENT.   REPRODUCTION
     focuses  attention  on high fitness individuals, thus exploiting (cf.
     EXPLOITATION) the available fitness information.   Recombination  and
     mutation  perturb those individuals, providing general heuristics for
     EXPLORATION.  Although simplistic from a biologist's viewpoint, these
     algorithms  are  sufficiently  complex to provide robust and powerful
     adaptive search mechanisms.

     --- "An Overview of Evolutionary Computation" [ECML93], 442-459.

 PSEUDO CODE
     Algorithm EA is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals in population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // select sub-population for offspring production
	       P' := selectparents P (t);

	       // recombine the "genes" of selected parents
	       recombine P' (t);

	       // perturb the mated population stochastically
	       mutate P' (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // select the survivors from actual fitness
	       P := survive P,P' (t);
	  od
     end EA.

------------------------------

Subject: Q1.1: What's a Genetic Algorithm (GA)?

     The GENETIC ALGORITHM is a model of machine  learning  which  derives
     its behavior from a metaphor of the processes of EVOLUTION in nature.
     This is done by the creation within a  machine  of  a  POPULATION  of
     INDIVIDUALs represented by CHROMOSOMEs, in essence a set of character
     strings that are analogous to the base-4 chromosomes that we  see  in
     our  own  DNA.   The  individuals in the population then go through a
     process of evolution.

     We should note that EVOLUTION (in nature or anywhere else) is  not  a
     purposive  or  directed  process.   That  is, there is no evidence to
     support the assertion that  the  goal  of  evolution  is  to  produce
     Mankind.  Indeed,  the  processes  of  nature  seem  to  boil down to
     different INDIVIDUALs competing for  resources  in  the  ENVIRONMENT.
     Some are better than others. Those that are better are more likely to
     survive and propagate their genetic material.

     In nature, we see that  the  encoding  for  our  genetic  information
     (GENOME)  is  done in a way that admits asexual REPRODUCTION (such as
     by budding) typically  results  in  OFFSPRING  that  are  genetically
     identical  to the PARENT.  Sexual REPRODUCTION allows the creation of
     genetically radically different offspring that are still of the  same
     general flavor (SPECIES).

     At  the  molecular level what occurs (wild oversimplification alert!)
     is that a pair of CHROMOSOMEs bump into one another, exchange  chunks
     of  genetic  information  and  drift apart. This is the RECOMBINATION
     operation, which GA/GPers generally refer to as CROSSOVER because  of
     the  way  that  genetic  material crosses over from one chromosome to
     another.

     The CROSSOVER operation happens in an ENVIRONMENT where the SELECTION
     of  who  gets to mate is a function of the FITNESS of the INDIVIDUAL,
     i.e. how good the individual is  at  competing  in  its  environment.
     Some  GENETIC ALGORITHMs use a simple function of the fitness measure
     to  select  individuals  (probabilistically)   to   undergo   genetic
     operations such as crossover or asexual REPRODUCTION (the propagation
     of  genetic  material  unaltered).   This  is   fitness-proportionate
     selection.   Other  implementations  use  a  model  in  which certain
     randomly selected individuals in a subgroup compete and  the  fittest
     is  selected.  This is called tournament selection and is the form of
     selection we see in nature when stags rut to vie for the privilege of
     mating  with  a herd of hinds. The two processes that most contribute
     to EVOLUTION are crossover and fitness based  selection/reproduction.

     As it turns out, there are mathematical proofs that indicate that the
     process of FITNESS  proportionate  REPRODUCTION  is,  in  fact,  near
     optimal in some senses.

     MUTATION  also  plays  a role in this process, although how important
     its role is continues to ba a matter of debate (some refer to it as a
     backgroud operator, while others view it as playing the dominant role
     in the evolutionary process). It cannot be stressed too strongly that
     the GENETIC ALGORITHM (as a SIMULATION of a genetic process) is not a
     random search for a solution to a problem  (highly  fit  INDIVIDUAL).
     The  genetic  algorithm  uses stochastic processes, but the result is
     distinctly non-random (better than random).

     GENETIC ALGORITHMs are used for a  number  of  different  application
     areas.  An  example  of  this  would be multidimensional OPTIMIZATION
     problems in which the character string of the CHROMOSOME can be  used
     to encode the values for the different parameters being optimized.

     In  practice,  therefore,  we  can  implement  this  genetic model of
     computation by having arrays of bits or characters to  represent  the
     CHROMOSOMEs.    Simple   bit   manipulation   operations   allow  the
     implementation of CROSSOVER, MUTATION and other operations.  Although
     a  substantial  amount  of  research  has been performed on variable-
     length strings and  other  structures,  the  majority  of  work  with
     GENETIC  ALGORITHMs is focussed on fixed-length character strings. We
     should focus on both this aspect of fixed-lengthness and the need  to
     encode the representation of the solution being sought as a character
     string, since these are  crucial  aspects  that  distinguish  GENETIC
     PROGRAMMING,  which  does  not have a fixed length representation and
     there is typically no encoding of the problem.

     When the GENETIC ALGORITHM is implemented it is  usually  done  in  a
     manner  that  involves  the following cycle:  Evaluate the FITNESS of
     all of the INDIVIDUALs in the POPULATION.  Create a new population by
     performing   operations   such  as  CROSSOVER,  fitness-proportionate
     REPRODUCTION and MUTATION on the individuals whose fitness  has  just
     been  measured.  Discard the old population and iterate using the new
     population.

     One iteration of this loop is referred to as a GENERATION.  There  is
     no  theoretical  reason for this as an implementation model.  Indeed,
     we do not see this punctuated behavior in POPULATIONs in nature as  a
     whole, but it is a convenient implementation model.

     The  first  GENERATION  (generation  0) of this process operates on a
     POPULATION of randomly generated INDIVIDUALs.   From  there  on,  the
     genetic  operations,  in concert with the FITNESS measure, operate to
     improve the population.

 PSEUDO CODE
     Algorithm GA is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals of population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // select a sub-population for offspring production
	       P' := selectparents P (t);

	       // recombine the "genes" of selected parents
	       recombine P' (t);

	       // perturb the mated population stochastically
	       mutate P' (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // select the survivors from actual fitness
	       P := survive P,P' (t);
	  od
     end GA.

------------------------------

Subject: Q1.2: What's Evolutionary Programming (EP)?

  Introduction
     EVOLUTIONARY PROGRAMMING, originally conceived by Lawrence J.   Fogel
     in  1960,  is  a  stochastic OPTIMIZATION strategy similar to GENETIC
     ALGORITHMs, but instead places emphasis  on  the  behavioral  linkage
     between  PARENTS  and their OFFSPRING, rather than seeking to emulate
     specific GENETIC  OPERATORS  as  observed  in  nature.   EVOLUTIONARY
     PROGRAMMING  is  similar  to  EVOLUTION  STRATEGIES, although the two
     approaches developed independently (see below).

     Like both ES and GAs, EP is a  useful  method  of  OPTIMIZATION  when
     other  techniques  such  as  gradient  descent  or direct, analytical
     discovery are not possible.  Combinatoric  and  real-valued  FUNCTION
     OPTIMIZATION  in  which the OPTIMIZATION surface or FITNESS landscape
     is "rugged", possessing many  locally  optimal  solutions,  are  well
     suited for EVOLUTIONARY PROGRAMMING.

  History
     The  1966 book, "Artificial Intelligence Through Simulated Evolution"
     by Fogel,  Owens  and  Walsh  is  the  landmark  publication  for  EP
     applications,  although  many  other  papers  appear  earlier  in the
     literature.  In the book,  finite  state  automata  were  evolved  to
     predict  symbol  strings  generated  from  Markov  processes and non-
     stationary time series.  Such evolutionary prediction  was  motivated
     by  a  recognition  that  prediction  is  a  keystone  to intelligent
     behavior  (defined  in  terms  of  adaptive  behavior,  in  that  the
     intelligent  organism  must  anticipate  events  in  order  to  adapt
     behavior in light of a goal).

     In 1992, the First Annual Conference on EVOLUTIONARY PROGRAMMING  was
     held  in  La  Jolla, CA.  Further conferences have been held annually
     (See Q12).  The conferences attract  a  diverse  group  of  academic,
     commercial  and  military  researchers engaged in both developing the
     theory of the EP technique and in applying EP  to  a  wide  range  of
     OPTIMIZATION problems, both in engineering and biology.

     Rather   than  list  and  analyze  the  sources  in  detail,  several
     fundamental sources are listed  below  which  should  serve  as  good
     pointers to the entire body of work in the field.

  The Process
     For  EP,  like  GAs, there is an underlying assumption that a FITNESS
     landscape can be characterized in terms of variables, and that  there
     is  an  optimum  solution (or multiple such optima) in terms of those
     variables.  For example, if one were trying to find the shortest path
     in  a Traveling Salesman Problem, each solution would be a path.  The
     length of the path could be expressed as a number, which would  serve
     as  the  solution's  fitness.  The fitness landscape for this problem
     could be characterized as a hypersurface  proportional  to  the  path
     lengths  in a space of possible paths.  The goal would be to find the
     globally shortest path in that space, or more  practically,  to  find
     very short tours very quickly.
     The  basic  EP  method involves 3 steps (Repeat until a threshold for
     iteration is exceeded or an adequate solution is obtained):

     (1)  Choose an initial POPULATION of trial solutions at  random.  The
	  number  of  solutions  in a population is highly relevant to the
	  speed of OPTIMIZATION, but no definite answers are available  as
	  to  how  many  solutions are appropriate (other than >1) and how
	  many solutions are just wasteful.

     (2)  Each solution is replicated into  a  new  POPULATION.   Each  of
	  these   OFFSPRING   solutions   are   mutated   according  to  a
	  distribution of MUTATION types, ranging from  minor  to  extreme
	  with  a  continuum  of  mutation types between.  The severity of
	  MUTATION is judged on the basis of the functional change imposed
	  on the PARENTS.

     (3)  Each  OFFSPRING  solution is assessed by computing it's FITNESS.
	  Typically, a  stochastic  tournament  is  held  to  determine  N
	  solutions  to  be  retained  for  the  POPULATION  of solutions,
	  although  this  is  occasionally  performed   deterministically.
	  There  is  no  requirement  that  the  POPULATION  SIZE  be held
	  constant, however, nor that only a single OFFSPRING be generated
	  from each PARENT.

     It should be pointed out that EP typically does not use any CROSSOVER
     as a GENETIC OPERATOR.

  EP and GAs
     There are two important ways in which EP differs from GAs.

     First, there is no constraint on the representation.  The typical  GA
     approach  involves  encoding  the  problem  solutions  as a string of
     representative tokens, the GENOME.  In EP, the representation follows
     from  the  problem.   A neural network can be represented in the same
     manner as it  is  implemented,  for  example,  because  the  MUTATION
     operation  does  not  demand a linear encoding.  (In this case, for a
     fixed topology, real- valued weights could be coded directly as their
     real  values and mutation operates by perturbing a weight vector with
     a  zero  mean  multivariate  Gaussian  perturbation.   For   variable
     topologies,  the  architecture is also perturbed, often using Poisson
     distributed additions and deletions.)

     Second, the MUTATION operation simply changes aspects of the solution
     according   to   a   statistical  distribution  which  weights  minor
     variations in the behavior of the OFFSPRING as  highly  probable  and
     substantial   variations  as  increasingly  unlikely.   Further,  the
     severity of MUTATIONS is often  reduced  as  the  global  optimum  is
     approached.  There is a certain tautology here: if the global optimum
     is not already known, how can the spread of the mutation operation be
     damped  as  the  solutions approach it?  Several techniques have been
     proposed and implemented which  address  this  difficulty,  the  most
     widely  studied  being the "Meta-Evolutionary" technique in which the
     variance of the mutation distribution is subject  to  mutation  by  a
     fixed variance mutation operator and evolves along with the solution.

  EP and ES
     The first communication  between  the  EVOLUTIONARY  PROGRAMMING  and
     EVOLUTION  STRATEGY  groups occurred in early 1992, just prior to the
     first annual EP conference.  Despite  their  independent  development
     over  30  years,  they  share many similarities.  When implemented to
     solve real-valued  FUNCTION  OPTIMIZATION  problems,  both  typically
     operate  on the real values themselves (rather than any coding of the
     real values as  is  often  done  in  GAs).   Multivariate  zero  mean
     Gaussian  MUTATIONs  are applied to each PARENT in a POPULATION and a
     SELECTION mechanism is applied to determine which solutions to remove
     (i.e.,  "cull")  from the population.  The similarities extend to the
     use  of  self-adaptive  methods  for  determining   the   appropriate
     mutations  to  use -- methods in which each PARENT carries not only a
     potential solution to the problem at hand, but  also  information  on
     how   it  will  distribute  new  trials  (OFFSPRING).   Most  of  the
     theoretical results on CONVERGENCE  (both  asymptotic  and  velocity)
     developed for ES or EP also apply directly to the other.

     The main differences between ES and EP are:

     1.   SELECTION:   EP   typically  uses  STOCHASTIC  SELECTION  via  a
	  tournament.   Each  trial  SOLUTION  in  the  POPULATION   faces
	  competition  against  a  preselected  number  of  opponents  and
	  receives a "win" if it is at least as good as  its  opponent  in
	  each  encounter.  SELECTION then eliminates those SOLUTIONS with
	  the least wins.  In contrast, ES  typically  uses  deterministic
	  SELECTION  in  which  the  worst  SOLUTIONS  are purged from the
	  POPULATION based directly on their function evaluation.

     2.   RECOMBINATION: EP is an abstraction of EVOLUTION at the level of
	  reproductive   POPULATIONs   (i.e.,   SPECIES)   and   thus   no
	  RECOMBINATION   mechanisms   are    typically    used    because
	  RECOMBINATION does not occur between SPECIES (by definition: see
	  Mayr's biological species  concept).   In  contrast,  ES  is  an
	  abstraction  of  EVOLUTION  at the level of INDIVIDUAL behavior.
	  When self-adaptive information is incorporated  this  is  purely
	  GENETIC  information  (as  opposed  to PHENOTYPIC) and thus some
	  forms  of  RECOMBINATION  are  reasonable  and  many  forms   of
	  RECOMBINATION  have  been  implemented  within  ES.   Again, the
	  effectiveness of such operators depends on the problem at  hand.

  References
     Some  references which provide an excellent introduction (by no means
     extensive) to the field, include:

     ARTIFICIAL  INTELLIGENCE  Through   Simulated   EVOLUTION   [Fogel66]
     (primary)

     Fogel DB (1995) "Evolutionary Computation: Toward a New Philosophy of
     Machine Intelligence," IEEE Press, Piscataway, NJ, forthcoming.

     Proceeding of the first [EP92], second [EP93] and third [EP94] Annual
     Conference on EVOLUTIONARY PROGRAMMING (primary) (See Q12).

 PSEUDO CODE
     Algorithm EP is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals of population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // perturb the whole population stochastically
	       P'(t) := mutate P (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // stochastically select the survivors from actual fitness
	       P(t+1) := survive P(t),P'(t);

	       // increase the time counter
	       t := t + 1;

	  od
     end EP.

     [Eds note: An extended version of this introduction is available from
     ENCORE (see Q15.3) in /FAQ/supplements/what-is-ep ]

------------------------------

Subject: Q1.3: What's an Evolution Strategy (ES)?

     In 1963 two students at the Technical University of Berlin (TUB)  met
     and  were  soon  to  collaborate  on  experiments which used the wind
     tunnel of the Institute of Flow Engineering.  During the  search  for
     the  optimal  shapes  of bodies in a flow, which was then a matter of
     laborious  intuitive  experimentation,  the  idea  was  conceived  of
     proceeding  strategically.  However, attempts with the coordinate and
     simple gradient strategies (cf Q5) were unsuccessful.   Then  one  of
     the   students,   Ingo  Rechenberg,  now  Professor  of  Bionics  and
     Evolutionary Engineering, hit upon the idea of trying random  changes
     in  the  parameters  defining  the  shape,  following  the example of
     natural  MUTATIONs.   The  EVOLUTION  STRATEGY  was  born.   A  third
     student,  Peter  Bienert, joined them and started the construction of
     an automatic experimenter, which would work according to  the  simple
     rules  of  mutation  and  SELECTION.   The  second student, Hans-Paul
     Schwefel, set about testing the efficiency of the  new  methods  with
     the  help of a Zuse Z23 computer; for there were plenty of objections
     to these "random strategies."

     In spite of an occasional lack of financial support, the Evolutionary
     Engineering  Group  which  had been formed held firmly together. Ingo
     Rechenberg received  his  doctorate  in  1970  (Rechenberg  73).   It
     contains  the  theory  of  the  two membered EVOLUTION STRATEGY and a
     first proposal for a multimembered strategy which in the nomenclature
     introduced  here  is  of the (m+1) type.   In the same year financial
     support from  the  Deutsche  Forschungsgemeinschaft  (DFG,  Germany's
     National Science Foundation) enabled the work, that was concluded, at
     least temporarily, in 1974 with the thesis  "Evolutionsstrategie  und
     numerische Optimierung" (Schwefel 77).

     Thus,   EVOLUTION   STRATEGIEs   were  invented  to  solve  technical
     OPTIMIZATION  problems  (TOPs)  like  e.g.  constructing  an  optimal
     flashing  nozzle,  and  until  recently  ES  were only known to civil
     engineering folks, as an alternative to standard solutions.   Usually
     no  closed  form  analytical objective function is available for TOPs
     and  hence,  no  applicable  optimization  method  exists,  but   the
     engineer's intuition.

     The  first  attempts  to imitate principles of organic EVOLUTION on a
     computer still resembled the iterative OPTIMIZATION methods known  up
     to  that  time  (cf  Q5):   In a two-membered or (1+1) ES, one PARENT
     generates  one  OFFSPRING  per  GENERATION   by   applying   NORMALLY
     DISTRIBUTED  MUTATIONs, i.e. smaller steps occur more likely than big
     ones, until a child performs better than its ancestor and  takes  its
     place.  Because  of  this  simple  structure, theoretical results for
     STEPSIZE control and CONVERGENCE VELOCITY could be derived. The ratio
     between  successful  and  all  mutations  should come to 1/5: the so-
     called 1/5 SUCCESS RULE was discovered. This first  algorithm,  using
     mutation  only,  has  then  been  enhanced  to a (m+1) strategy which
     incorporated RECOMBINATION due  to  several,  i.e.  m  parents  being
     available.  The  mutation  scheme  and the exogenous stepsize control
     were taken across unchanged from  (1+1) ESs.

     Schwefel later generalized these strategies to the  multimembered  ES
     now  denoted  by  (m+l)  and (m,l) which imitates the following basic
     principles  of  organic  EVOLUTION:  a  POPULATION,  leading  to  the
     possibility   of  RECOMBINATION  with  random  mating,  MUTATION  and
     SELECTION.  These strategies  are  termed  PLUS  STRATEGY  and  COMMA
     STRATEGY,  respectively: in the plus case, the parental GENERATION is
     taken into account during selection, while in the comma case only the
     OFFSPRING  undergoes selection, and the PARENTs die off. m (usually a
     lowercase mu, denotes the population size, and l, usually a lowercase
     lambda denotes the number of offspring generated per generation).  Or
     to put  it  in  an  utterly  insignificant  and  hopelessly  outdated
     language:

	  (define (Evolution-strategy population)
	    (if (terminate? population)
	      population
	      (evolution-strategy
		(select
		  (cond (plus-strategy?
			  (union (mutate
				   (recombine population))
				 population))
			(comma-strategy?
			  (mutate
			    (recombine population))))))))

     However,  dealing  with ES is sometimes seen as "strong tobacco," for
     it takes a decent amount of probability theory and applied STATISTICS
     to understand the inner workings of an ES, while it navigates through
     the  hyperspace  of  the  usually  n-dimensional  problem  space,  by
     throwing hyperelipses into the deep...

     Luckily,  this  medium  doesn't allow for much mathematical ballyhoo;
     the author therefore has to come up with  a  simple  but  brilliantly
     intriguing  explanation to save the reader from falling asleep during
     the rest of this section, so here we go:

     Imagine a black box. A large black box. As large as, say for example,
     a Coca-Cola vending machine. Now, [..] (to be continued)

     A  single  INDIVIDUAL of the ES' POPULATION consists of the following
     GENOTYPE representing a point in the SEARCH SPACE:

     OBJECT VARIABLES
	  Real-valued x_i have to be tuned by RECOMBINATION  and  MUTATION
	  such  that  an  objective  function  reaches its global optimum.
	  Referring  to  the  metaphor  mentioned  previously,   the   x_i
	  represent the regulators of the alien Coka-Cola vending machine.

     STRATEGY VARIABLEs
	  Real-valued s_i (usually denoted by a lowercase sigma)  or  mean
	  STEPSIZEs  determine  the  mutability of the x_i. They represent
	  the STANDARD DEVIATION of a  (0, s_i) GAUSSIAN DISTRIBUTION (GD)
	  being  added  to  each  x_i  as an undirected MUTATION.  With an
	  "expectancy value" of 0  the  PARENTs  will  produce  OFFSPRINGs
	  similar  to  themselves  on average. In order to make a doubling
	  and a halving of a stepsize equally  probable,  the  s_i  mutate
	  log-normally,  distributed,  i.e.   exp(GD),  from GENERATION to
	  generation.   These  stepsizes  hide  the  internal  model   the
	  POPULATION  has  made of its ENVIRONMENT, i.e. a SELF-ADAPTATION
	  of the stepsizes has replaced the exogenous control of the (1+1)
	  ES.

	  This  concept  works  because  SELECTION sooner or later prefers
	  those INDIVIDUALs having built a good  model  of  the  objective
	  function,  thus  producing  better  OFFSPRING.   Hence, learning
	  takes place on two levels: (1) at the genotypic, i.e. the object
	  and  STRATEGY  VARIABLE  level  and (2) at the phenotypic level,
	  i.e. the FITNESS level.

	  Depending  on  an  individual's  x_i,  the  resulting  objective
	  function  value  f(x),  where  x denotes the vector of objective
	  variables, serves as the PHENOTYPE (FITNESS)  in  the  SELECTION
	  step.  In  a  PLUS STRATEGY, the m best of all (m+l) INDIVIDUALs
	  survive to become the PARENTs of the next GENERATION.  Using the
	  comma variant, selection takes place only among the l OFFSPRING.
	  The  second  scheme  is  more  realistic  and   therefore   more
	  successful,  because  no  individual  may survive forever, which
	  could at least  theoretically  occur  using  the  plus  variant.
	  Untypical for conventional OPTIMIZATION algorithms and lavish at
	  first   sight,   a   COMMA   STRATEGY   allowing    intermediate
	  deterioration  performs  better!  Only  by forgetting highly fit
	  individuals can a permanent adaptation  of  the  STEPSIZEs  take
	  place  and avoid long stagnation phases due to misadapted s_i's.
	  This means that these individuals have built an  internal  model
	  that  is  no  longer  appropriate for further progress, and thus
	  should better be discarded.

	  By  choosing  a  certain  ratio  m/l,  one  can  determine   the
	  convergence  property  of the EVOLUTION STRATEGY: If one wants a
	  fast, but local convergence, one  should  choose  a  small  HARD
	  SELECTION,  ratio,  e.g.  (5,100),  but  looking  for the global
	  optimum, one should favour a softer SELECTION (15,100).

	  SELF-ADAPTATION within  ESs  depends  on  the  following  agents
	  (Schwefel 87):

     Randomness:
	  One cannot model MUTATION as a purely random process. This would
	  mean that a child is completely independent of its PARENTs.

     POPULATION
	  size: The POPULATION has to be sufficiently large. Not only  the
	  current  best  should be allowed to reproduce, but a set of good
	  INDIVIDUALs.   Biologists  have  coined  the   term   "requisite
	  variety"  to  mean  the  genetic  variety necessary to prevent a
	  SPECIES  from  becoming  poorer  and  poorer   genetically   and
	  eventually dying out.

     COOPERATION:
	  In  order  to  exploit  the effects of a POPULATION (m > 1), the
	  INDIVIDUALs should recombine their knowledge with that of others
	  (cooperate)   because   one   cannot  expect  the  knowledge  to
	  accumulate in the best individual only.

     Deterioration:
	  In order to allow better internal models (STEPSIZEs) to  provide
	  better  progress  in the future, one should accept deterioration
	  from one GENERATION to the next. A limited life-span  in  nature
	  is not a sign of failure, but an important means of preventing a
	  SPECIES from freezing genetically.

	  ESs prove to be successful  when  compared  to  other  iterative
	  methods  on a large number of test problems (Schwefel 77).  They
	  are adaptable to nearly all sorts of problems  in  OPTIMIZATION,
	  because  they  need  very  little information about the problem,
	  esp. no derivatives of the objective function.  For  a  list  of
	  some  300 applications of EAs, see the SyS-2/92 report (cf Q14).
	  ESs  are  capable  of  solving  high  dimensional,   multimodal,
	  nonlinear   problems   subject   to   linear   and/or  nonlinear
	  constraints.  The objective  function  can  also,  e.g.  be  the
	  result of a SIMULATION, it does not have to be given in a closed
	  form.  This also holds for the constraints which  may  represent
	  the  outcome  of,  e.g. a finite elements method (FEM). ESs have
	  been adapted to VECTOR OPTIMIZATION problems (Kursawe  92),  and
	  they can also serve as a heuristic for NP-complete combinatorial
	  problems like the TRAVELLING SALESMAN PROBLEM or problems with a
	  noisy or changing response surface.

	  References

	  Kursawe,   F.   (1992)   "   Evolution   strategies  for  vector
	  optimization", Taipei, National Chiao Tung University,  187-193.

	  Kursawe,  F.  (1994)  "  Evolution  strategies: Simple models of
	  natural processes?", Revue Internationale de Systemique,  France
	  (to appear).

	  Rechenberg,    I.   (1973)   "Evolutionsstrategie:   Optimierung
	  technischer Systeme nach Prinzipien der biologischen Evolution",
	  Stuttgart: Fromman-Holzboog.

	  Schwefel,    H.-P.    (1977)    "Numerische    Optimierung   von
	  Computermodellen  mittels   der   Evolutionsstrategie",   Basel:
	  Birkhaeuser.

	  Schwefel,  H.-P.  (1987)  "Collective Phaenomena in Evolutionary
	  Systems", 31st Annu.  Meet.  Inter'l  Soc.  for  General  System
	  Research, Budapest, 1025-1033.

------------------------------

Subject: Q1.4: What's a Classifier System (CFS)?

 The name of the Game
     First,  a word on naming conventions is due, for no other paradigm of
     EC has undergone more changes to  it's  name  space  than  this  one.
     Initially,  Holland  called his cognitive models "Classifier Systems"
     abbrv. with CS, and sometimes CFS, as can be found in [GOLD89].

     Whence Riolo came into play in 1986 and Holland added a reinforcement
     component to the overall design of a CFS, that emphasized its ability
     to learn. So, the word "learning" was prepended to the name, to make:
     "Learning  Classifier Systems" (abbrv. to LCS).  On October 6-9, 1992
     the "1st Inter'l Workshop on Learning Classifier Systems" took  place
     at  the  NASA  Johnson  Space Center, Houston, TX.  A summary of this
     "summit" of all leading researchers in LCS can  be  found  on  ENCORE
     (See Q15.3) in file CFS/papers/lcs92.ps.gz

     Today, the story continues, LCSs are sometimes subsumed under a "new"
     machine  learning   paradigm   called   "Evolutionary   Reinforcement
     Learning" or ERL for short, incorporating LCSs, "Q-Learning", devised
     by Watkins (1989), and a paradigm of the same name, devised by Ackley
     and Littman [ALIFEIII].

 On Schema Processors and ANIMATS
     So,  to  get  back  to the question above, "What are CFSs?", we first
     might answer, "Well, there  are  many  interpretations  of  Holland's
     ideas...what do you like to know in particular?"  And then we'd start
     with a recitation from [HOLLAND75,92], and  explain  all  the  SCHEMA
     processors,  the  broadcast  language, etc.  But, we will take a more
     comprehensive,  and  intuitive  way  to  understand  what  CLASSIFIER
     SYSTEMs are all about.

     The easiest road to explore the very nature of CLASSIFIER SYSTEMs, is
     to take the animat (ANIMAl + ROBOT = ANIMAT) "lane" devised by Booker
     (1982)  and  later  studied  extensively  by  Wilson (1985), who also
     coined the term for this approach. Work continues on animats  but  is
     often   regarded   as   ARTIFICIAL   LIFE  rather  than  EVOLUTIONARY
     COMPUTATION.  This thread of research has  even  its  own  conference
     series:  "Simulation  of Adaptive Behavior (SAB)" (cf Q12), including
     other  notions  from  machine   learning,   connectionist   learning,
     evolutionary robotics, etc.  [NB: the latter is obvious, if an animat
     lives in a digital microcosm, it can be put into the  real  world  by
     implantation   into   an   autonomous   robot   vehicle,   that   has
     sensors/detectors  (camera  eyes,  whiskers,  etc.)   and   effectors
     (wheels,  robot  arms,  etc.);  so  all  that's  needed is to use our
     algorithm as the "brain" of this  vehicle,  connecting  the  hardware
     parts  with the software learning component.  For a fascinating intro
     to the field see, e.g. Braitenberg (1984)]

     CLASSIFIER SYSTEMs,  however,  are  yet  another  offspring  of  John
     Holland's  aforementioned  book,  and can be seen as one of the early
     applications of GAs, for CFSs  use  this  evolutionary  algorithm  to
     adapt  their  behavior toward a changing ENVIRONMENT, as is explained
     below in greater detail.

     Holland envisioned a cognitive  system  capable  of  classifying  the
     goings  on  in  its ENVIRONMENT, and then reacting to these goings on
     appropriately. So what is needed to build such a  system?  Obviously,
     we  need (1) an environment; (2) receptors that tell our system about
     the goings on; (3) effectors, that  let  our  system  manipulate  its
     environment; and (4) the system itself, conveniently a "black box" in
     this first approach, that has (2) and (3) attached to it, and "lives"
     in (1).

     In  the  animat  approach,  (1)  usually  is  an artificially created
     digital world, e.g. in Booker's Gofer system, a  2  dimensional  grid
     that  contains "food" and "poison".  And the Gofer itself, that walks
     across this grid and tries (a) to learn to distinguish between  these
     two items, and (b) survive well fed.

     Much  the  same  for  Wilson's  animat, called "*". Yes, it's just an
     asterisk, and a "Kafka-esque naming policy" is one of the sign  posts
     of  the  whole  field;  e.g.  the first implementation by Holland and
     Reitmann 1978 was called CS-1, (cognitive system  1);  Smith's  Poker
     player  LS-1  (1980)  followed  this  "convention".  Riolo's 1988 LCS
     implementations on top of his CFS-C library  (cf  Q20),  were  dubbed
     FSW-1 (Finite State World 1), and LETSEQ-1 (LETter SEQuence predictor
     1).

     So from the latter paragraph we can  conclude  that  ENVIRONMENT  can
     also  mean something completely different (e.g. an infinite stream of
     letters, time serieses,  etc.)  than  in  the  animat  approach,  but
     anyway; we'll stick to it, and go on.

     Imagine  a  very  simple  animat,  e.g. a simplified model of a frog.
     Now, we know that frogs live in  (a)  Muppet  Shows,  or  (b)  little
     ponds;  so  we  chose the latter as our demo ENVIRONMENT (1); and the
     former for a non-Kafka-esque name of  our  model,  so  let's  dub  it
     "Kermit".

     Kermit  has eyes, i.e. sensorial input detectors (2); hands and legs,
     i.e.   environment-manipulating  effectors  (3);  is   a   spicy-fly-
     detecting-and-eating  device,  i.e.  a  frog (4); so we got all the 4
     pieces needed.

 Inside the Black Box
     The most primitive "black box" we can think of is a computer.  It has
     inputs  (2), and outputs (3), and a message passing system inbetween,
     that converts (i.e., computes), certain input  messages  into  output
     messages,  according  to a set of rules, usually called the "program"
     of that computer.  From the theory of computer science, we now borrow
     the  simplest  of  all  program  structures, that is something called
     "production system" or PS for short.  A  PS  has  been  shown  to  be
     computationally  complete  by Post (1943), that's why it is sometimes
     called a "Post System", and later  by  Minsky  (1967).   Although  it
     merely consists of a set of if-then rules, it still resembles a full-
     fledged computer.

     We now term a single "if-then" rule  a  "classifier",  and  choose  a
     representation that makes it easy to manipulate these, for example by
     encoding  them  into  binary  strings.   We  then  term  the  set  of
     classifiers,  a  "classifier population", and immediately know how to
     breed new rules for our  system:  just  use  a  GA  to  generate  new
     rules/classifiers from the current POPULATION!

     All  that's  left  are  the  messages floating through the black box.
     They should also be simple strings of zeroes and ones, and are to  be
     kept in a data structure, we call "the message list".

     With  all  this  given, we can imagine the goings on inside the black
     box as follows: The input interface (2) generates messages, i.e., 0/1
     strings,  that  are  written on the message list. Then these messages
     are matched against the condition-part of all  classifiers,  to  find
     out  which  actions  are  to  be triggered.  The message list is then
     emptied, and the  encoded  actions,  themselves  just  messages,  are
     posted  to  the  message list.  Then, the output interface (3) checks
     the message list for messages concerning the effectors. And the cycle
     restarts.

     Note, that it is possible in this set-up to have "internal messages",
     because the message list is not emptied after (3) has checked;  thus,
     the  input  interface messages are added to the initially empty list.
     (cf Algorithm CFS, LCS below)

     The general idea of the CFS is to  start  from  scratch,  i.e.,  from
     tabula  rasa  (without  any  knowledge)  using  a  randomly generated
     classifier POPULATION, and  let  the  system  learn  its  program  by
     induction, (cf Holland et al. 1986), this reduces the input stream to
     recurrent input patterns, that must be repeated over and over  again,
     to  enable  the  animat to classify its current situation/context and
     react on the goings on appropriately.

 What does it need to be a frog?
     Let's take a look at the behavior emitted by Kermit. It lives in  its
     digital  microwilderness  where  it moves around randomly.  [NB: This
     seemingly "random" behavior is not that random at all;  for  more  on
     instinctive,  i.e.,  innate  behavior  of non-artificial animals see,
     e.g.  Tinbergen (1951)]

     Whenever a small flying object appears, that has no  stripes,  Kermit
     should  eat it, because it's very likely a spicy fly, or other flying
     insect.  If it has stripes, the insect is better left alone,  because
     Kermit had better not bother with wasps, hornets, or bees.  If Kermit
     encounters a large, looming object, it immediately uses its effectors
     to jump away, as far as possible.

     So,  part  of  these behavior patterns within the "pond world", in AI
     sometimes called a "frame," from traditional knowledge representation
     techniques, Rich (1983), can be expressed in a set of "if <condition>
     then <action>" rules, as follows:

	  IF small, flying object to the left THEN send @
	  IF small, flying object to the right THEN send %
	  IF small, flying object centered THEN send $
	  IF large, looming object THEN send !
	  IF no large, looming object THEN send *
	  IF * and @ THEN move head 15 degrees left
	  IF * and % THEN move head 15 degrees right
	  IF * and $ THEN move in direction head pointing
	  IF ! THEN move rapidly away from direction head pointing

     Now, this set of rules has to be encoded for use within a  CLASSIFIER
     SYSTEM.   A  possible encoding of the above rule set in CFS-C (Riolo)
     classifier  terminology.  The  condition   part   consists   of   two
     conditions,  that  are  combined with a logical AND, thus must be met
     both to trigger the associated action. This  structure  needs  a  NOT
     operation,  (so  we  get NAND, and know from hardware design, that we
     can build any computer solely with NANDs), in CFS-C this  is  denoted
     by the `~' prefix character (rule #5).

	  IF             THEN
	   0000,  00 00  00 00
	   0000,  00 01  00 01
	   0000,  00 10  00 10
	   1111,  01 ##  11 11
	  ~1111,  01 ##  10 00
	   1000,  00 00  01 00
	   1000,  00 01  01 01
	   1000,  00 10  01 10
	   1111,  ## ##  01 11

     Obviously,  string `0000' denotes small, and `00' in the fist part of
     the second column, denotes flying.  The last two bits  of  column  #2
     encode  the  direction  of  the  object approaching, where `00' means
     left, `01' means right, etc.

     In rule #4 a the "don't care symbol" `#' is used,  that  matches  `1'
     and  `0',  i.e.,  the  position  of  the  large,  looming  object, is
     completely  arbitrary.  A  simple  fact,  that  can   save   Kermit's
     (artificial) life.

 PSEUDO CODE (Non-Learning CFS)
     Algorithm CFS is

	  // start with an initial time
	  t := 0;

	  // an initially empty message list
	  initMessageList ML (t);

	  // and a randomly generated population of classifiers
	  initClassifierPopulation P (t);

	  // test for cycle termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // 1. detectors check whether input messages are present
	       ML := readDetectors (t);

	       // 2. compare ML to the classifiers and save matches
	       ML' := matchClassifiers ML,P (t);

	       // 3. process new messages through output interface
	       ML := sendEffectors ML' (t);
	  od
     end CFS.

     To  convert the previous, non-learning CFS into a learning CLASSIFIER
     SYSTEM, LCS, as has been proposed in Holland  (1986),  it  takes  two
     steps:

     (1)   the  major  cycle has to be changed such that the activation of
	   each classifier depends on some additional parameter, that  can
	   be  modified as a result of experience, i.e. reinforcement from
	   the ENVIRONMENT;

     (2)   and/or change  the  contents  of  the  classifier  list,  i.e.,
	   generate   new   classifiers/rules,  by  removing,  adding,  or
	   combining condition/action-parts of existing classifiers.

	   The algorithm thus changes accordingly:

 PSEUDO CODE (Learning CFS)
     Algorithm LCS is

	  // start with an initial time
	  t := 0;

	  // an initially empty message list
	  initMessageList ML (t);

	  // and a randomly generated population of classifiers
	  initClassifierPopulation P (t);

	  // test for cycle termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // 1. detectors check whether input messages are present
	       ML := readDetectors (t);

	       // 2. compare ML to the classifiers and save matches
	       ML' := matchClassifiers ML,P (t);

	       // 3. highest bidding classifier(s) collected in ML' wins the
	       // "race" and post the(ir) message(s)
	       ML' := selectMatchingClassifiers ML',P (t);

	       // 4. tax bidding classifiers, reduce their strength
	       ML' := taxPostingClassifiers ML',P (t);

	       // 5. effectors check new message list for output msgs
	       ML := sendEffectors ML' (t);

	       // 6. receive payoff from environment (REINFORCEMENT)
	       C := receivePayoff (t);

	       // 7. distribute payoff/credit to classifiers (e.g. BBA)
	       P' := distributeCredit C,P (t);

	       // 8. Eventually (depending on t), an EA (usually a GA) is
	       // applied to the classifier population
	       if criterion then
		    P := generateNewRules P' (t);
	       else
		    P := P'
	  od
     end LCS.

 What's the problem with CFSs?
     Just to list the currently known problems that come with CFSs,  would
     take  some  additional  pages; therefore only some interesting papers
     dealing with unresolved riddles are listed; probably the  best  paper
     containing  most  of  these  is the aforementioned summary of the LCS
     Workshop:
     Smith, R.E. (1992) "A report on the first Inter'l Workshop  on  LCSs"
     avail. from ENCORE (See Q15.3) in file CFS/papers/lcs92.ps.gz

     Other noteworthy critiques on LCSs include:

     Wilson,  S.W.  (1987)  "Classifier  Systems  and  the Animat Problem"
     Machine Learning, 2.
     Wilson, S.W. (1988) "Bid Competition  and  Specificity  Reconsidered"
     Complex Systems, 2(5):705-723.

     Wilson, S.W. & Goldberg, D.E. (1989) "A critical review of classifier
     systems" [ICGA89], 244-255.

     Goldberg, D.E., Horn, J. & Deb, K. (1992) "What makes a problem  hard
     for  a  classifier system?"  (containing the Goldberg citation below)
     is   also   available   from   ENCORE    (See    Q15.3)    in    file
     CFS/papers/lcs92-2.ps.gz

     Dorigo,  M.  (1993)  "Genetic  and  Non-genetic Operators in ALECSYS"
     Evolutionary Computation, 1(2):151-164.  The  technical  report,  the
     journal article is based on is avail. from ENCORE (See Q15.3) in file
     CFS/papers/icsi92.ps.gz

     Smith, R.E. Forrest,  S.  &  Perelson,  A.S.  (1993)  "Searching  for
     Diverse,    Cooperative    POPULATIONs   with   Genetic   Algorithms"
     Evolutionary Computation, 1(2):127-149.

 Conclusions?
     Generally speaking:
      "There's much to do in CFS research!"

     No other notion of EC provides more space to explore and if  you  are
     interested  in  a  PhD  in the field, you might want to take a closer
     look at CFS.  However, be warned!,  to  quote  Goldberg:  "classifier
     systems   are   a  quagmire---a  glorious,  wondrous,  and  inventing
     quagmire, but a quagmire nonetheless."

     References

     Booker, L.B. (1982) "Intelligent behavior as an adaption to the  task
     environment"  PhD Dissertation, Univ. of Michigan, Logic of Computers
     Group, Ann Arbor, MI.

     Braitenberg,  V.   (1984)   "Vehicles:   Experiments   in   Synthetic
     Psychology" Boston, MA: MIT Press.

     Holland,  J.H.  (1986)  "Escaping  Brittleness:  The possibilities of
     general-purpose learning algorithms applied  to  parallel  rule-based
     systems".  In:  R.S. Michalski, J.G. Carbonell & T.M. Mitchell (eds),
     Machine  Learning:  An  Artificial  Intelligence  approach,  Vol  II,
     593-623, Los Altos, CA: Morgan Kaufman.

     Holland,  J.H.,  et  al.  (1986)  "Induction: Processes of Inference,
     Learning, and Discovery", Cambridge, MA: MIT Press.

     Holland, J.H. (1992) "Adaptation in natural and  artificial  systems"
     Boston, MA: MIT Press.

     Holland,  J.H.  &  Reitman,  J.S.  (1978) "Cognitive Systems based on
     Adaptive Algorithms" In D.A. Waterman & F.Hayes-Roth, (eds)  Pattern-
     directed inference systems. NY: Academic Press.

     Minsky,   M.L.   (1961)   "Steps   toward   Artificial  Intelligence"
     Proceedings IRE, 49, 8-30. Reprinted in E.A. Feigenbaum & J.  Feldman
     (eds) Computers and Thought, 406-450, NY: McGraw-Hill, 1963.

     Minsky,  M.L.  (1967)  "Computation:  Finite  and  Infinite Machines"
     Englewood Cliffs, NJ: Prentice-Hall.

     Post, Emil L. (1943) "Formal reductions of the general  combinatorial
     decision problem" American Journal of Mathematics, 65, 197-215.

     Rich, E. (1983) "Artificial Intelligence" NY: McGraw-Hill.
     Tinbergen,  N. (1951) "The Study of Instinct" NY: Oxford Univ. Press.

     Watkins, C. (1989) "Learning from Delayed Rewards" PhD  Dissertation,
     Department of Psychology, Cambridge Univ., UK.

     Wilson,  S.W.  (1985)  "Knowledge  growth in an artificial animal" in
     [ICGA85], 16-23.

------------------------------

Subject: Q1.5: What's Genetic Programming (GP)?

     GENETIC PROGRAMMING is the extension of the genetic model of learning
     into  the space of programs. That is, the objects that constitute the
     POPULATION  are  not  fixed-length  character  strings  that   encode
     possible  solutions  to  the problem at hand, they are programs that,
     when executed, "are" the candidate solutions to  the  problem.  These
     programs  are expressed in genetic programming as parse trees, rather
     than as lines of code.  Thus, for example, the simple program "a +  b
     * c" would be represented as:

		 +
		/ \
		a  *
		 / \
		 b  c

     or,  to  be  precise,  as suitable data structures linked together to
     achieve this effect. Because this is a very simple thing to do in the
     programming language Lisp, many GPers tend to use Lisp. However, this
     is simply an implementation detail. There are straightforward methods
     to implement GP using a non-Lisp programming environment.

     The  programs  in  the  POPULATION  are composed of elements from the
     FUNCTION SET and the TERMINAL SET, which are typically fixed sets  of
     symbols selected to be appropriate to the solution of problems in the
     domain of interest.

     In GP the CROSSOVER  operation  is  implemented  by  taking  randomly
     selected  subtrees in the INDIVIDUALs (selected according to FITNESS)
     and exchanging them.

     It should be pointed out that GP usually does not use any MUTATION as
     a GENETIC OPERATOR.

     More  information  is  available  in the GP mailing list FAQ.  (See Q
     15.2)

------------------------------

End of ai-faq/genetic/part2
***************************
