Newsgroups: comp.ai.genetic,comp.answers,news.answers
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!udel!news.sprintlink.net!pipex!warwick!yama.mcc.ac.uk!cf-cm!David.Beasley
From: David.Beasley@cm.cf.ac.uk (David Beasley)
Subject: FAQ: comp.ai.genetic part 2/6 (A Guide to Frequently Asked Questions)
Message-ID: <part2_795637026@cm.cf.ac.uk>
Followup-To: comp.ai.genetic
Summary: This is part 2 of a <trilogy> entitled "The Hitch-Hiker's Guide to 
         Evolutionary Computation". A periodically published list of 
         Frequently Asked Questions (and their answers) about Evolutionary 
         Algorithms, Life and Everything. It should be read by anyone who 
         whishes to post to the comp.ai.genetic newsgroup, preferably *before* 
         posting.
Originator: David.Beasley@cm.cf.ac.uk (David Beasley)
Sender: David.Beasley@cm.cf.ac.uk (David Beasley)
Supersedes: <part2_780051892@cm.cf.ac.uk>
Organization: University of Wales College of Cardiff, Cardiff, WALES, UK.
References: <part1_795637026@cm.cf.ac.uk>
Date: Sun, 19 Mar 95 18:18:42 GMT
Approved: news-answers-request@MIT.Edu
Expires: 21 Jun 1995 18:17:06 GMT
Lines: 1051
Xref: glinda.oz.cs.cmu.edu comp.ai.genetic:5334 comp.answers:10711 news.answers:40167

Archive-name:   ai-faq/genetic/part2
Last-Modified:  3/20/95
Issue:          3.1


TABLE OF CONTENTS OF PART 2
     Q1: What are Evolutionary Algorithms (EAs)?
     Q1.1: What's a Genetic Algorithm (GA)?
     Q1.2: What's Evolutionary Programming (EP)?
     Q1.3: What's an Evolution Strategy (ES)?
     Q1.4: What's a Classifier System (CFS)?
     Q1.5: What's Genetic Programming (GP)?

----------------------------------------------------------------------

Subject: Q1: What are Evolutionary Algorithms (EAs)?

     Evolutionary algorithm is an umbrella term used to describe computer-
     based problem solving  systems  which  use  computational  models  of
     evolutionary   processes   as   key  elements  in  their  design  and
     implementation.  A  variety  of  evolutionary  algorithms  have  been
     proposed.   The  major  ones  are:  GENETIC  ALGORITHMs  (see  Q1.1),
     EVOLUTIONARY PROGRAMMING (see Q1.2), EVOLUTION STRATEGIEs (see Q1.3),
     CLASSIFIER  SYSTEMs  (see  Q1.4), and GENETIC PROGRAMMING (see Q1.5).
     They all share a common conceptual base of simulating  the  EVOLUTION
     of  INDIVIDUAL  structures  via processes of SELECTION, MUTATION, and
     REPRODUCTION.  The processes depend on the perceived  PERFORMANCE  of
     the individual structures as defined by an ENVIRONMENT.

     More  precisely, EAs maintain a POPULATION of structures, that evolve
     according to  rules  of  SELECTION  and  other  operators,  that  are
     referred  to  as  "search operators", (or GENETIC OPERATORs), such as
     RECOMBINATION  and  MUTATION.   Each  INDIVIDUAL  in  the  population
     receives  a measure of it's FITNESS in the ENVIRONMENT.  REPRODUCTION
     focuses attention on high fitness individuals, thus  exploiting  (cf.
     EXPLOITATION)  the  available fitness information.  Recombination and
     mutation perturb those individuals, providing general heuristics  for
     EXPLORATION.  Although simplistic from a biologist's viewpoint, these
     algorithms are sufficiently complex to provide  robust  and  powerful
     adaptive search mechanisms.

     --- "An Overview of Evolutionary Computation" [ECML93], 442-459.

 PSEUDO CODE
     Algorithm EA is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals in population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // select sub-population for offspring production
	       P' := selectparents P (t);

	       // recombine the "genes" of selected parents
	       recombine P' (t);

	       // perturb the mated population stochastically
	       mutate P' (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // select the survivors from actual fitness
	       P := survive P,P' (t);
	  od
     end EA.

------------------------------

Subject: Q1.1: What's a Genetic Algorithm (GA)?

     The  GENETIC  ALGORITHM  is a model of machine learning which derives
     its behavior from a metaphor of the processes of EVOLUTION in nature.
     This  is  done  by  the  creation within a machine of a POPULATION of
     INDIVIDUALs represented by CHROMOSOMEs, in essence a set of character
     strings  that  are analogous to the base-4 chromosomes that we see in
     our own DNA.  The individuals in the population  then  go  through  a
     process of evolution.

     We  should  note that EVOLUTION (in nature or anywhere else) is not a
     purposive or directed process.  That is,  there  is  no  evidence  to
     support  the  assertion  that  the  goal  of  evolution is to produce
     Mankind. Indeed, the  processes  of  nature  seem  to  boil  down  to
     different  INDIVIDUALs  competing  for  resources in the ENVIRONMENT.
     Some are better than others. Those that are better are more likely to
     survive and propagate their genetic material.

     In  nature,  we  see  that  the  encoding for our genetic information
     (GENOME) is done in a way that admits asexual REPRODUCTION  (such  as
     by  budding)  typically  results  in  OFFSPRING  that are genetically
     identical to the PARENT.  Sexual REPRODUCTION allows the creation  of
     genetically  radically different offspring that are still of the same
     general flavor (SPECIES).

     At the molecular level what occurs (wild  oversimplification  alert!)
     is  that a pair of CHROMOSOMEs bump into one another, exchange chunks
     of genetic information and drift apart.  This  is  the  RECOMBINATION
     operation,  which GA/GPers generally refer to as CROSSOVER because of
     the way that genetic material crosses over  from  one  chromosome  to
     another.

     The CROSSOVER operation happens in an ENVIRONMENT where the SELECTION
     of who gets to mate is a function of the FITNESS of  the  INDIVIDUAL,
     i.e.  how  good  the  individual  is at competing in its environment.
     Some GENETIC ALGORITHMs use a simple function of the fitness  measure
     to   select   individuals   (probabilistically)  to  undergo  genetic
     operations such as crossover or asexual REPRODUCTION (the propagation
     of   genetic  material  unaltered).   This  is  fitness-proportionate
     selection.  Other  implementations  use  a  model  in  which  certain
     randomly  selected  individuals in a subgroup compete and the fittest
     is selected. This is called tournament selection and is the  form  of
     selection we see in nature when stags rut to vie for the privilege of
     mating with a herd of hinds. The two processes that  most  contribute
     to  EVOLUTION are crossover and fitness based selection/reproduction.

     As it turns out, there are mathematical proofs that indicate that the
     process  of  FITNESS  proportionate  REPRODUCTION  is,  in fact, near
     optimal in some senses.

     MUTATION also plays a role in this process,  although  how  important
     its role is continues to ba a matter of debate (some refer to it as a
     backgroud operator, while others view it as playing the dominant role
     in the evolutionary process). It cannot be stressed too strongly that
     the GENETIC ALGORITHM (as a SIMULATION of a genetic process) is not a
     random  search  for  a solution to a problem (highly fit INDIVIDUAL).
     The genetic algorithm uses stochastic processes, but  the  result  is
     distinctly non-random (better than random).

     GENETIC  ALGORITHMs  are  used  for a number of different application
     areas. An example of  this  would  be  multidimensional  OPTIMIZATION
     problems  in which the character string of the CHROMOSOME can be used
     to encode the values for the different parameters being optimized.

     In practice, therefore,  we  can  implement  this  genetic  model  of
     computation  by  having arrays of bits or characters to represent the
     CHROMOSOMEs.   Simple   bit   manipulation   operations   allow   the
     implementation  of CROSSOVER, MUTATION and other operations. Although
     a substantial amount of research  has  been  performed  on  variable-
     length  strings  and  other  structures,  the  majority  of work with
     GENETIC ALGORITHMs is focussed on fixed-length character strings.  We
     should  focus on both this aspect of fixed-lengthness and the need to
     encode the representation of the solution being sought as a character
     string,  since  these  are  crucial  aspects that distinguish GENETIC
     PROGRAMMING, which does not have a fixed  length  representation  and
     there is typically no encoding of the problem.

     When  the  GENETIC  ALGORITHM  is implemented it is usually done in a
     manner that involves the following cycle:  Evaluate  the  FITNESS  of
     all of the INDIVIDUALs in the POPULATION.  Create a new population by
     performing  operations  such  as   CROSSOVER,   fitness-proportionate
     REPRODUCTION  and  MUTATION on the individuals whose fitness has just
     been measured. Discard the old population and iterate using  the  new
     population.

     One  iteration of this loop is referred to as a GENERATION.  There is
     no theoretical reason for this as an implementation  model.   Indeed,
     we  do not see this punctuated behavior in POPULATIONs in nature as a
     whole, but it is a convenient implementation model.

     The first GENERATION (generation 0) of this  process  operates  on  a
     POPULATION  of  randomly  generated  INDIVIDUALs.  From there on, the
     genetic operations, in concert with the FITNESS measure,  operate  to
     improve the population.

 PSEUDO CODE
     Algorithm GA is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals of population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // select a sub-population for offspring production
	       P' := selectparents P (t);

	       // recombine the "genes" of selected parents
	       recombine P' (t);

	       // perturb the mated population stochastically
	       mutate P' (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // select the survivors from actual fitness
	       P := survive P,P' (t);
	  od
     end GA.

------------------------------

Subject: Q1.2: What's Evolutionary Programming (EP)?

  Introduction
     EVOLUTIONARY  PROGRAMMING, originally conceived by Lawrence J.  Fogel
     in 1960, is a stochastic OPTIMIZATION  strategy  similar  to  GENETIC
     ALGORITHMs,  but  instead  places  emphasis on the behavioral linkage
     between PARENTS and their OFFSPRING, rather than seeking  to  emulate
     specific  GENETIC  OPERATORS  as  observed  in  nature.  EVOLUTIONARY
     PROGRAMMING is similar to  EVOLUTION  STRATEGIES,  although  the  two
     approaches developed independently (see below).

     Like  both  ES  and  GAs,  EP is a useful method of OPTIMIZATION when
     other techniques such  as  gradient  descent  or  direct,  analytical
     discovery  are  not  possible.  Combinatoric and real-valued FUNCTION
     OPTIMIZATION in which the OPTIMIZATION surface or  FITNESS  landscape
     is  "rugged",  possessing  many  locally  optimal solutions, are well
     suited for EVOLUTIONARY PROGRAMMING.

  History
     The 1966 book, "Artificial Intelligence Through Simulated  Evolution"
     by  Fogel,  Owens  and  Walsh  is  the  landmark  publication  for EP
     applications, although  many  other  papers  appear  earlier  in  the
     literature.   In  the  book,  finite  state  automata were evolved to
     predict symbol strings  generated  from  Markov  processes  and  non-
     stationary  time  series.  Such evolutionary prediction was motivated
     by a  recognition  that  prediction  is  a  keystone  to  intelligent
     behavior  (defined  in  terms  of  adaptive  behavior,  in  that  the
     intelligent  organism  must  anticipate  events  in  order  to  adapt
     behavior in light of a goal).

     In  1992, the First Annual Conference on EVOLUTIONARY PROGRAMMING was
     held in La Jolla, CA.  Further conferences have  been  held  annually
     (See  Q12).   The  conferences  attract  a diverse group of academic,
     commercial and military researchers engaged in  both  developing  the
     theory  of  the  EP  technique  and in applying EP to a wide range of
     OPTIMIZATION problems, both in engineering and biology.

     Rather  than  list  and  analyze  the  sources  in  detail,   several
     fundamental  sources  are  listed  below  which  should serve as good
     pointers to the entire body of work in the field.

  The Process
     For EP, like GAs, there is an underlying assumption  that  a  FITNESS
     landscape  can be characterized in terms of variables, and that there
     is an optimum solution (or multiple such optima) in  terms  of  those
     variables.  For example, if one were trying to find the shortest path
     in a Traveling Salesman Problem, each solution would be a path.   The
     length  of the path could be expressed as a number, which would serve
     as the solution's fitness.  The fitness landscape  for  this  problem
     could  be  characterized  as  a hypersurface proportional to the path
     lengths in a space of possible paths.  The goal would be to find  the
     globally  shortest  path  in that space, or more practically, to find
     very short tours very quickly.

     The basic EP method involves 3 steps (Repeat until  a  threshold  for
     iteration is exceeded or an adequate solution is obtained):

     (1)  Choose  an  initial POPULATION of trial solutions at random. The
	  number of solutions in a population is highly  relevant  to  the
	  speed  of OPTIMIZATION, but no definite answers are available as
	  to how many solutions are appropriate (other than  >1)  and  how
	  many solutions are just wasteful.

     (2)  Each  solution  is  replicated  into  a new POPULATION.  Each of
	  these  OFFSPRING  solutions   are   mutated   according   to   a
	  distribution  of  MUTATION  types, ranging from minor to extreme
	  with a continuum of mutation types  between.   The  severity  of
	  MUTATION is judged on the basis of the functional change imposed
	  on the PARENTS.

     (3)  Each OFFSPRING solution is assessed by computing  it's  FITNESS.
	  Typically,  a  stochastic  tournament  is  held  to  determine N
	  solutions to  be  retained  for  the  POPULATION  of  solutions,
	  although   this  is  occasionally  performed  deterministically.
	  There is  no  requirement  that  the  POPULATION  SIZE  be  held
	  constant, however, nor that only a single OFFSPRING be generated
	  from each PARENT.

     It should be pointed out that EP typically does not use any CROSSOVER
     as a GENETIC OPERATOR.

  EP and GAs
     There are two important ways in which EP differs from GAs.

     First,  there is no constraint on the representation.  The typical GA
     approach involves encoding the  problem  solutions  as  a  string  of
     representative tokens, the GENOME.  In EP, the representation follows
     from the problem.  A neural network can be represented  in  the  same
     manner  as  it  is  implemented,  for  example,  because the MUTATION
     operation does not demand a linear encoding.  (In this  case,  for  a
     fixed topology, real- valued weights could be coded directly as their
     real values and mutation operates by perturbing a weight vector  with
     a   zero  mean  multivariate  Gaussian  perturbation.   For  variable
     topologies, the architecture is also perturbed, often  using  Poisson
     distributed additions and deletions.)

     Second, the MUTATION operation simply changes aspects of the solution
     according  to  a  statistical  distribution   which   weights   minor
     variations  in  the  behavior of the OFFSPRING as highly probable and
     substantial  variations  as  increasingly  unlikely.   Further,   the
     severity  of  MUTATIONS  is  often  reduced  as the global optimum is
     approached.  There is a certain tautology here: if the global optimum
     is not already known, how can the spread of the mutation operation be
     damped as the solutions approach it?  Several  techniques  have  been
     proposed  and  implemented  which  address  this difficulty, the most
     widely studied being the "Meta-Evolutionary" technique in  which  the
     variance  of  the  mutation  distribution is subject to mutation by a
     fixed variance mutation operator and evolves along with the solution.

  EP and ES
     The  first  communication  between  the  EVOLUTIONARY PROGRAMMING and
     EVOLUTION STRATEGY groups occurred in early 1992, just prior  to  the
     first  annual  EP  conference.  Despite their independent development
     over 30 years, they share many  similarities.   When  implemented  to
     solve  real-valued  FUNCTION  OPTIMIZATION  problems,  both typically
     operate on the real values themselves (rather than any coding of  the
     real  values  as  is  often  done  in  GAs).   Multivariate zero mean
     Gaussian MUTATIONs are applied to each PARENT in a POPULATION  and  a
     SELECTION mechanism is applied to determine which solutions to remove
     (i.e., "cull") from the population.  The similarities extend  to  the
     use   of   self-adaptive  methods  for  determining  the  appropriate
     mutations to use -- methods in which each PARENT carries not  only  a
     potential  solution  to  the problem at hand, but also information on
     how  it  will  distribute  new  trials  (OFFSPRING).   Most  of   the
     theoretical  results  on  CONVERGENCE  (both asymptotic and velocity)
     developed for ES or EP also apply directly to the other.

     The main differences between ES and EP are:

     1.   SELECTION:  EP  typically  uses  STOCHASTIC  SELECTION   via   a
	  tournament.    Each  trial  SOLUTION  in  the  POPULATION  faces
	  competition  against  a  preselected  number  of  opponents  and
	  receives  a  "win"  if it is at least as good as its opponent in
	  each encounter.  SELECTION then eliminates those SOLUTIONS  with
	  the  least  wins.   In contrast, ES typically uses deterministic
	  SELECTION in which the  worst  SOLUTIONS  are  purged  from  the
	  POPULATION based directly on their function evaluation.

     2.   RECOMBINATION: EP is an abstraction of EVOLUTION at the level of
	  reproductive   POPULATIONs   (i.e.,   SPECIES)   and   thus   no
	  RECOMBINATION    mechanisms    are    typically   used   because
	  RECOMBINATION does not occur between SPECIES (by definition: see
	  Mayr's  biological  species  concept).   In  contrast,  ES is an
	  abstraction of EVOLUTION at the level  of  INDIVIDUAL  behavior.
	  When  self-adaptive  information  is incorporated this is purely
	  GENETIC information (as opposed to  PHENOTYPIC)  and  thus  some
	  forms   of  RECOMBINATION  are  reasonable  and  many  forms  of
	  RECOMBINATION have  been  implemented  within  ES.   Again,  the
	  effectiveness  of such operators depends on the problem at hand.

  References
     Some references which provide an excellent introduction (by no  means
     extensive) to the field, include:

     ARTIFICIAL   INTELLIGENCE   Through   Simulated  EVOLUTION  [Fogel66]
     (primary)

     Fogel DB (1995) "Evolutionary Computation: Toward a New Philosophy of
     Machine Intelligence," IEEE Press, Piscataway, NJ, forthcoming.

     Proceeding of the first [EP92], second [EP93] and third [EP94] Annual
     Conference on EVOLUTIONARY PROGRAMMING (primary) (See Q12).

 PSEUDO CODE
     Algorithm EP is

	  // start with an initial time
	  t := 0;

	  // initialize a usually random population of individuals
	  initpopulation P (t);

	  // evaluate fitness of all initial individuals of population
	  evaluate P (t);

	  // test for termination criterion (time, fitness, etc.)
	  while not done do

	       // perturb the whole population stochastically
	       P'(t) := mutate P (t);

	       // evaluate it's new fitness
	       evaluate P' (t);

	       // stochastically select the survivors from actual fitness
	       P(t+1) := survive P(t),P'(t);

	       // increase the time counter
	       t := t + 1;
	  od
     end EP.

     [Eds note: An extended version of this introduction is available from
     ENCORE (see Q15.3) in /FAQ/supplements/what-is-ep ]

------------------------------

Subject: Q1.3: What's an Evolution Strategy (ES)?

     In  1963 two students at the Technical University of Berlin (TUB) met
     and were soon to collaborate  on  experiments  which  used  the  wind
     tunnel  of  the Institute of Flow Engineering.  During the search for
     the optimal shapes of bodies in a flow, which was then  a  matter  of
     laborious  intuitive  experimentation,  the  idea  was  conceived  of
     proceeding strategically.  However, attempts with the coordinate  and
     simple  gradient  strategies  (cf Q5) were unsuccessful.  Then one of
     the  students,  Ingo  Rechenberg,  now  Professor  of   Bionics   and
     Evolutionary  Engineering, hit upon the idea of trying random changes
     in the parameters  defining  the  shape,  following  the  example  of
     natural  MUTATIONs.   The  EVOLUTION  STRATEGY  was  born.   A  third
     student, Peter Bienert, joined them and started the  construction  of
     an  automatic  experimenter, which would work according to the simple
     rules of mutation  and  SELECTION.   The  second  student,  Hans-Paul
     Schwefel,  set  about  testing the efficiency of the new methods with
     the help of a Zuse Z23 computer; for there were plenty of  objections
     to these "random strategies."

     In spite of an occasional lack of financial support, the Evolutionary
     Engineering Group which had been formed held  firmly  together.  Ingo
     Rechenberg  received  his  doctorate  in  1970  (Rechenberg  73).  It
     contains the theory of the two  membered  EVOLUTION  STRATEGY  and  a
     first proposal for a multimembered strategy which in the nomenclature
     introduced here is of the (m+1) type.   In the  same  year  financial
     support  from  the  Deutsche  Forschungsgemeinschaft  (DFG, Germany's
     National Science Foundation) enabled the work, that was concluded, at
     least  temporarily,  in 1974 with the thesis "Evolutionsstrategie und
     numerische Optimierung" (Schwefel 77).

     Thus,  EVOLUTION  STRATEGIEs  were  invented   to   solve   technical
     OPTIMIZATION  problems  (TOPs)  like  e.g.  constructing  an  optimal
     flashing nozzle, and until recently  ES  were  only  known  to  civil
     engineering  folks, as an alternative to standard solutions.  Usually
     no closed form analytical objective function is  available  for  TOPs
     and   hence,  no  applicable  optimization  method  exists,  but  the
     engineer's intuition.

     The first attempts to imitate principles of organic  EVOLUTION  on  a
     computer  still resembled the iterative OPTIMIZATION methods known up
     to that time (cf Q5):  In a two-membered  or  (1+1)  ES,  one  PARENT
     generates   one   OFFSPRING   per  GENERATION  by  applying  NORMALLY
     DISTRIBUTED MUTATIONs, i.e. smaller steps occur more likely than  big
     ones,  until  a child performs better than its ancestor and takes its
     place. Because of this  simple  structure,  theoretical  results  for
     STEPSIZE control and CONVERGENCE VELOCITY could be derived. The ratio
     between successful and all mutations should  come  to  1/5:  the  so-
     called  1/5  SUCCESS RULE was discovered. This first algorithm, using
     mutation only, has then been  enhanced  to  a  (m+1)  strategy  which
     incorporated  RECOMBINATION  due  to  several,  i.e.  m parents being
     available. The mutation scheme and  the  exogenous  stepsize  control
     were taken across unchanged from  (1+1) ESs.

     Schwefel  later  generalized these strategies to the multimembered ES
     now denoted by (m+l) and (m,l) which  imitates  the  following  basic
     principles  of  organic  EVOLUTION:  a  POPULATION,  leading  to  the
     possibility  of  RECOMBINATION  with  random  mating,  MUTATION   and
     SELECTION.   These  strategies  are  termed  PLUS  STRATEGY and COMMA
     STRATEGY, respectively: in the plus case, the parental GENERATION  is
     taken into account during selection, while in the comma case only the
     OFFSPRING undergoes selection, and the PARENTs die off. m (usually  a
     lowercase mu, denotes the population size, and l, usually a lowercase
     lambda denotes the number of offspring generated per generation).  Or
     to  put  it  in  an  utterly  insignificant  and  hopelessly outdated
     language:

	  (define (Evolution-strategy population)
	    (if (terminate? population)
	      population
	      (evolution-strategy
		(select
		  (cond (plus-strategy?
			  (union (mutate
				   (recombine population))
				 population))
			(comma-strategy?
			  (mutate
			    (recombine population))))))))

     However, dealing with ES is sometimes seen as "strong  tobacco,"  for
     it takes a decent amount of probability theory and applied STATISTICS
     to understand the inner workings of an ES, while it navigates through
     the  hyperspace  of  the  usually  n-dimensional  problem  space,  by
     throwing hyperelipses into the deep...

     Luckily, this medium doesn't allow for  much  mathematical  ballyhoo;
     the  author  therefore  has  to come up with a simple but brilliantly
     intriguing explanation to save the reader from falling asleep  during
     the rest of this section, so here we go:

     Imagine a black box. A large black box. As large as, say for example,
     a Coca-Cola vending machine. Now, [..] (to be continued)

     A single INDIVIDUAL of the ES' POPULATION consists of  the  following
     GENOTYPE representing a point in the SEARCH SPACE:

     OBJECT VARIABLES
	  Real-valued  x_i  have to be tuned by RECOMBINATION and MUTATION
	  such that an objective  function  reaches  its  global  optimum.
	  Referring   to   the  metaphor  mentioned  previously,  the  x_i
	  represent the regulators of the alien Coka-Cola vending machine.

     STRATEGY VARIABLEs
	  Real-valued  s_i  (usually denoted by a lowercase sigma) or mean
	  STEPSIZEs determine the mutability of the  x_i.  They  represent
	  the STANDARD DEVIATION of a  (0, s_i) GAUSSIAN DISTRIBUTION (GD)
	  being added to each x_i as  an  undirected  MUTATION.   With  an
	  "expectancy  value"  of  0  the  PARENTs will produce OFFSPRINGs
	  similar to themselves on average. In order to  make  a  doubling
	  and  a  halving  of  a stepsize equally probable, the s_i mutate
	  log-normally, distributed, i.e.   exp(GD),  from  GENERATION  to
	  generation.    These  stepsizes  hide  the  internal  model  the
	  POPULATION has made of its ENVIRONMENT, i.e.  a  SELF-ADAPTATION
	  of the stepsizes has replaced the exogenous control of the (1+1)
	  ES.

	  This concept works because SELECTION  sooner  or  later  prefers
	  those  INDIVIDUALs  having  built  a good model of the objective
	  function, thus  producing  better  OFFSPRING.   Hence,  learning
	  takes place on two levels: (1) at the genotypic, i.e. the object
	  and STRATEGY VARIABLE level and (2)  at  the  phenotypic  level,
	  i.e. the FITNESS level.

	  Depending  on  an  individual's  x_i,  the  resulting  objective
	  function value f(x), where x denotes  the  vector  of  objective
	  variables,  serves  as  the PHENOTYPE (FITNESS) in the SELECTION
	  step. In a PLUS STRATEGY, the m best of  all  (m+l)  INDIVIDUALs
	  survive to become the PARENTs of the next GENERATION.  Using the
	  comma variant, selection takes place only among the l OFFSPRING.
	  The   second   scheme  is  more  realistic  and  therefore  more
	  successful, because no individual  may  survive  forever,  which
	  could  at  least  theoretically  occur  using  the plus variant.
	  Untypical for conventional OPTIMIZATION algorithms and lavish at
	  first    sight,   a   COMMA   STRATEGY   allowing   intermediate
	  deterioration performs better! Only  by  forgetting  highly  fit
	  individuals  can  a  permanent  adaptation of the STEPSIZEs take
	  place and avoid long stagnation phases due to misadapted  s_i's.
	  This  means  that these individuals have built an internal model
	  that is no longer appropriate for  further  progress,  and  thus
	  should better be discarded.

	  By   choosing  a  certain  ratio  m/l,  one  can  determine  the
	  convergence property of the EVOLUTION STRATEGY: If one  wants  a
	  fast,  but  local  convergence,  one  should choose a small HARD
	  SELECTION, ratio, e.g.  (5,100),  but  looking  for  the  global
	  optimum, one should favour a softer SELECTION (15,100).

	  SELF-ADAPTATION  within  ESs  depends  on  the  following agents
	  (Schwefel 87):

     Randomness:
	  One cannot model MUTATION as a purely random process. This would
	  mean that a child is completely independent of its PARENTs.

     POPULATION
	  size:  The POPULATION has to be sufficiently large. Not only the
	  current best should be allowed to reproduce, but a set  of  good
	  INDIVIDUALs.    Biologists   have  coined  the  term  "requisite
	  variety" to mean the genetic  variety  necessary  to  prevent  a
	  SPECIES   from   becoming  poorer  and  poorer  genetically  and
	  eventually dying out.

     COOPERATION:
	  In order to exploit the effects of a POPULATION  (m  >  1),  the
	  INDIVIDUALs should recombine their knowledge with that of others
	  (cooperate)  because  one  cannot  expect   the   knowledge   to
	  accumulate in the best individual only.

     Deterioration:
	  In  order to allow better internal models (STEPSIZEs) to provide
	  better progress in the future, one should  accept  deterioration
	  from  one  GENERATION to the next. A limited life-span in nature
	  is not a sign of failure, but an important means of preventing a
	  SPECIES from freezing genetically.

	  ESs  prove  to  be  successful  when compared to other iterative
	  methods on a large number of test problems (Schwefel 77).   They
	  are  adaptable  to nearly all sorts of problems in OPTIMIZATION,
	  because they need very little  information  about  the  problem,
	  esp.  no  derivatives  of  the objective function. For a list of
	  some 300 applications of EAs, see the SyS-2/92 report (cf  Q14).
	  ESs   are  capable  of  solving  high  dimensional,  multimodal,
	  nonlinear  problems   subject   to   linear   and/or   nonlinear
	  constraints.   The  objective  function  can  also,  e.g. be the
	  result of a SIMULATION, it does not have to be given in a closed
	  form.   This  also holds for the constraints which may represent
	  the outcome of, e.g. a finite elements method  (FEM).  ESs  have
	  been  adapted  to VECTOR OPTIMIZATION problems (Kursawe 92), and
	  they can also serve as a heuristic for NP-complete combinatorial
	  problems like the TRAVELLING SALESMAN PROBLEM or problems with a
	  noisy or changing response surface.

	  References

	  Kursawe,  F.  (1992)   "   Evolution   strategies   for   vector
	  optimization",  Taipei, National Chiao Tung University, 187-193.

	  Kursawe, F. (1994) "  Evolution  strategies:  Simple  models  of
	  natural  processes?", Revue Internationale de Systemique, France
	  (to appear).

	  Rechenberg,   I.   (1973)   "Evolutionsstrategie:    Optimierung
	  technischer Systeme nach Prinzipien der biologischen Evolution",
	  Stuttgart: Fromman-Holzboog.

	  Schwefel,   H.-P.    (1977)    "Numerische    Optimierung    von
	  Computermodellen   mittels   der   Evolutionsstrategie",  Basel:
	  Birkhaeuser.

	  Schwefel, H.-P. (1987) "Collective  Phaenomena  in  Evolutionary
	  Systems",  31st  Annu.  Meet.  Inter'l  Soc.  for General System
	  Research, Budapest, 1025-1033.

------------------------------

Subject: Q1.4: What's a Classifier System (CFS)?

 The name of the Game
     First, a word on naming conventions is due, for no other paradigm  of
     EC  has  undergone  more  changes  to  it's name space than this one.
     Initially, Holland called his cognitive models  "Classifier  Systems"
     abbrv. with CS, and sometimes CFS, as can be found in [GOLD89].

     Whence Riolo came into play in 1986 and Holland added a reinforcement
     component to the overall design of a CFS, that emphasized its ability
     to learn. So, the word "learning" was prepended to the name, to make:
     "Learning Classifier Systems" (abbrv. to LCS).  On October 6-9,  1992
     the  "1st Inter'l Workshop on Learning Classifier Systems" took place
     at the NASA Johnson Space Center, Houston, TX.   A  summary  of  this
     "summit"  of  all  leading  researchers in LCS can be found on ENCORE
     (See Q15.3) in file CFS/papers/lcs92.ps.gz

     Today, the story continues, LCSs are sometimes subsumed under a "new"
     machine   learning   paradigm   called   "Evolutionary  Reinforcement
     Learning" or ERL for short, incorporating LCSs, "Q-Learning", devised
     by Watkins (1989), and a paradigm of the same name, devised by Ackley
     and Littman [ALIFEIII].

 On Schema Processors and ANIMATS
     So, to get back to the question above, "What  are  CFSs?",  we  first
     might  answer,  "Well,  there  are  many interpretations of Holland's
     ideas...what do you like to know in particular?"  And then we'd start
     with  a  recitation  from  [HOLLAND75,92], and explain all the SCHEMA
     processors, the broadcast language, etc.  But, we will  take  a  more
     comprehensive,  and  intuitive  way  to  understand  what  CLASSIFIER
     SYSTEMs are all about.

     The easiest road to explore the very nature of CLASSIFIER SYSTEMs, is
     to take the animat (ANIMAl + ROBOT = ANIMAT) "lane" devised by Booker
     (1982) and later studied  extensively  by  Wilson  (1985),  who  also
     coined  the  term for this approach. Work continues on animats but is
     often  regarded  as  ARTIFICIAL   LIFE   rather   than   EVOLUTIONARY
     COMPUTATION.   This  thread  of  research has even its own conference
     series: "Simulation of Adaptive Behavior (SAB)" (cf  Q12),  including
     other   notions   from   machine  learning,  connectionist  learning,
     evolutionary robotics, etc.  [NB: the latter is obvious, if an animat
     lives  in  a  digital microcosm, it can be put into the real world by
     implantation   into   an   autonomous   robot   vehicle,   that   has
     sensors/detectors   (camera   eyes,  whiskers,  etc.)  and  effectors
     (wheels, robot arms, etc.); so  all  that's  needed  is  to  use  our
     algorithm  as  the  "brain"  of this vehicle, connecting the hardware
     parts with the software learning component.  For a fascinating  intro
     to the field see, e.g. Braitenberg (1984)]

     CLASSIFIER  SYSTEMs,  however,  are  yet  another  offspring  of John
     Holland's aforementioned book, and can be seen as one  of  the  early
     applications  of  GAs,  for  CFSs  use this evolutionary algorithm to
     adapt their behavior toward a changing ENVIRONMENT, as  is  explained
     below in greater detail.

     Holland  envisioned  a  cognitive  system  capable of classifying the
     goings on in its ENVIRONMENT, and then reacting to  these  goings  on
     appropriately.  So  what is needed to build such a system? Obviously,
     we need (1) an environment; (2) receptors that tell our system  about
     the  goings  on;  (3)  effectors,  that let our system manipulate its
     environment; and (4) the system itself, conveniently a "black box" in
     this first approach, that has (2) and (3) attached to it, and "lives"
     in (1).

     In the animat  approach,  (1)  usually  is  an  artificially  created
     digital  world,  e.g.  in Booker's Gofer system, a 2 dimensional grid
     that contains "food" and "poison".  And the Gofer itself, that  walks
     across  this grid and tries (a) to learn to distinguish between these
     two items, and (b) survive well fed.

     Much the same for Wilson's animat, called  "*".  Yes,  it's  just  an
     asterisk,  and a "Kafka-esque naming policy" is one of the sign posts
     of the whole field; e.g. the  first  implementation  by  Holland  and
     Reitmann  1978  was  called CS-1, (cognitive system 1); Smith's Poker
     player LS-1 (1980)  followed  this  "convention".  Riolo's  1988  LCS
     implementations  on  top  of  his CFS-C library (cf Q20), were dubbed
     FSW-1 (Finite State World 1), and LETSEQ-1 (LETter SEQuence predictor
     1).

     So  from  the  latter  paragraph we can conclude that ENVIRONMENT can
     also mean something completely different (e.g. an infinite stream  of
     letters,  time  serieses,  etc.)  than  in  the  animat approach, but
     anyway; we'll stick to it, and go on.

     Imagine a very simple animat, e.g. a  simplified  model  of  a  frog.
     Now,  we  know  that  frogs  live  in (a) Muppet Shows, or (b) little
     ponds; so we chose the latter as our demo ENVIRONMENT  (1);  and  the
     former  for  a  non-Kafka-esque  name  of  our model, so let's dub it
     "Kermit".

     Kermit has eyes, i.e. sensorial input detectors (2); hands and  legs,
     i.e.    environment-manipulating   effectors  (3);  is  a  spicy-fly-
     detecting-and-eating device, i.e. a frog (4); so we  got  all  the  4
     pieces needed.

 Inside the Black Box
     The most primitive "black box" we can think of is a computer.  It has
     inputs (2), and outputs (3), and a message passing system  inbetween,
     that  converts  (i.e.,  computes), certain input messages into output
     messages, according to a set of rules, usually called  the  "program"
     of that computer.  From the theory of computer science, we now borrow
     the simplest of all program  structures,  that  is  something  called
     "production  system"  or  PS  for  short.   A PS has been shown to be
     computationally complete by Post (1943), that's why it  is  sometimes
     called  a  "Post  System",  and  later by Minsky (1967).  Although it
     merely consists of a set of if-then rules, it still resembles a full-
     fledged computer.

     We  now  term  a  single  "if-then" rule a "classifier", and choose a
     representation that makes it easy to manipulate these, for example by
     encoding  them  into  binary  strings.   We  then  term  the  set  of
     classifiers, a "classifier population", and immediately know  how  to
     breed  new  rules  for  our  system:  just  use  a GA to generate new
     rules/classifiers from the current POPULATION!

     All that's left are the messages  floating  through  the  black  box.
     They  should also be simple strings of zeroes and ones, and are to be
     kept in a data structure, we call "the message list".

     With all this given, we can imagine the goings on  inside  the  black
     box as follows: The input interface (2) generates messages, i.e., 0/1
     strings, that are written on the message list.  Then  these  messages
     are  matched  against  the condition-part of all classifiers, to find
     out which actions are to be triggered.   The  message  list  is  then
     emptied,  and  the  encoded  actions,  themselves  just messages, are
     posted to the message list.  Then, the output  interface  (3)  checks
     the message list for messages concerning the effectors. And the cycle
     restarts.

     Note, that it is possible in this set-up to have "internal messages",
     because  the message list is not emptied after (3) has checked; thus,
     the input interface messages are added to the initially  empty  list.
     (cf Algorithm CFS, LCS below)

     The  general  idea  of  the  CFS is to start from scratch, i.e., from
     tabula rasa  (without  any  knowledge)  using  a  randomly  generated
     classifier  POPULATION,  and  let  the  system  learn  its program by
     induction, (cf Holland et al. 1986), this reduces the input stream to
     recurrent  input patterns, that must be repeated over and over again,
     to enable the animat to classify its  current  situation/context  and
     react on the goings on appropriately.

 What does it need to be a frog?
     Let's  take a look at the behavior emitted by Kermit. It lives in its
     digital microwilderness where it moves around  randomly.   [NB:  This
     seemingly  "random"  behavior  is not that random at all; for more on
     instinctive, i.e., innate behavior  of  non-artificial  animals  see,
     e.g.  Tinbergen (1951)]

     Whenever  a  small flying object appears, that has no stripes, Kermit
     should eat it, because it's very likely a spicy fly, or other  flying
     insect.   If it has stripes, the insect is better left alone, because
     Kermit had better not bother with wasps, hornets, or bees.  If Kermit
     encounters a large, looming object, it immediately uses its effectors
     to jump away, as far as possible.

     So, part of these behavior patterns within the "pond  world",  in  AI
     sometimes called a "frame," from traditional knowledge representation
     techniques, Rich (1983), can be expressed in a set of "if <condition>
     then <action>" rules, as follows:

	  IF small, flying object to the left THEN send @
	  IF small, flying object to the right THEN send %
	  IF small, flying object centered THEN send $
	  IF large, looming object THEN send !
	  IF no large, looming object THEN send *
	  IF * and @ THEN move head 15 degrees left
	  IF * and % THEN move head 15 degrees right
	  IF * and $ THEN move in direction head pointing
	  IF ! THEN move rapidly away from direction head pointing

     Now,  this set of rules has to be encoded for use within a CLASSIFIER
     SYSTEM.  A possible encoding of the above rule set in  CFS-C  (Riolo)
     classifier   terminology.   The   condition   part  consists  of  two
     conditions, that are combined with a logical AND, thus  must  be  met
     both  to  trigger  the  associated action. This structure needs a NOT
     operation, (so we get NAND, and know from hardware  design,  that  we
     can  build  any computer solely with NANDs), in CFS-C this is denoted
     by the `~' prefix character (rule #5).

	  IF             THEN
	   0000,  00 00  00 00
	   0000,  00 01  00 01
	   0000,  00 10  00 10
	   1111,  01 ##  11 11
	  ~1111,  01 ##  10 00
	   1000,  00 00  01 00
	   1000,  00 01  01 01
	   1000,  00 10  01 10
	   1111,  ## ##  01 11

     Obviously, string `0000' denotes small, and `00' in the fist part  of
     the  second  column,  denotes flying.  The last two bits of column #2
     encode the direction of the  object  approaching,  where  `00'  means
     left, `01' means right, etc.

     In  rule  #4  a the "don't care symbol" `#' is used, that matches `1'
     and `0',  i.e.,  the  position  of  the  large,  looming  object,  is
     completely   arbitrary.   A  simple  fact,  that  can  save  Kermit's
     (artificial) life.

 PSEUDO CODE (Non-Learning CFS)
     Algorithm CFS is

	  // start with an initial time
	  t := 0;

	  // an initially empty message list
	  initMessageList ML (t);

	  // and a randomly generated population of classifiers
	  initClassifierPopulation P (t);

	  // test for cycle termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // 1. detectors check whether input messages are present
	       ML := readDetectors (t);

	       // 2. compare ML to the classifiers and save matches
	       ML' := matchClassifiers ML,P (t);

	       // 3. process new messages through output interface
	       ML := sendEffectors ML' (t);
	  od
     end CFS.

     To convert the previous, non-learning CFS into a learning  CLASSIFIER
     SYSTEM,  LCS,  as  has  been proposed in Holland (1986), it takes two
     steps:

     (1)   the major cycle has to be changed such that the  activation  of
	   each  classifier depends on some additional parameter, that can
	   be modified as a result of experience, i.e. reinforcement  from
	   the ENVIRONMENT;
     (2)   and/or  change  the  contents  of  the  classifier  list, i.e.,
	   generate  new  classifiers/rules,  by  removing,   adding,   or
	   combining condition/action-parts of existing classifiers.

	   The algorithm thus changes accordingly:

 PSEUDO CODE (Learning CFS)
     Algorithm LCS is

	  // start with an initial time
	  t := 0;

	  // an initially empty message list
	  initMessageList ML (t);

	  // and a randomly generated population of classifiers
	  initClassifierPopulation P (t);

	  // test for cycle termination criterion (time, fitness, etc.)
	  while not done do

	       // increase the time counter
	       t := t + 1;

	       // 1. detectors check whether input messages are present
	       ML := readDetectors (t);

	       // 2. compare ML to the classifiers and save matches
	       ML' := matchClassifiers ML,P (t);

	       // 3. highest bidding classifier(s) collected in ML' wins the
	       // "race" and post the(ir) message(s)
	       ML' := selectMatchingClassifiers ML',P (t);

	       // 4. tax bidding classifiers, reduce their strength
	       ML' := taxPostingClassifiers ML',P (t);

	       // 5. effectors check new message list for output msgs
	       ML := sendEffectors ML' (t);

	       // 6. receive payoff from environment (REINFORCEMENT)
	       C := receivePayoff (t);

	       // 7. distribute payoff/credit to classifiers (e.g. BBA)
	       P' := distributeCredit C,P (t);

	       // 8. Eventually (depending on t), an EA (usually a GA) is
	       // applied to the classifier population
	       if criterion then
		    P := generateNewRules P' (t);
	       else
		    P := P'
	  od
     end LCS.

 What's the problem with CFSs?
     Just  to list the currently known problems that come with CFSs, would
     take some additional pages; therefore only  some  interesting  papers
     dealing  with  unresolved riddles are listed; probably the best paper
     containing most of these is the aforementioned  summary  of  the  LCS
     Workshop:

     Smith,  R.E.  (1992) "A report on the first Inter'l Workshop on LCSs"
     avail. from ENCORE (See Q15.3) in file CFS/papers/lcs92.ps.gz

     Other noteworthy critiques on LCSs include:
     Wilson, S.W. (1987)  "Classifier  Systems  and  the  Animat  Problem"
     Machine Learning, 2.

     Wilson,  S.W.  (1988)  "Bid Competition and Specificity Reconsidered"
     Complex Systems, 2(5):705-723.

     Wilson, S.W. & Goldberg, D.E. (1989) "A critical review of classifier
     systems" [ICGA89], 244-255.

     Goldberg,  D.E., Horn, J. & Deb, K. (1992) "What makes a problem hard
     for a classifier system?"  (containing the Goldberg  citation  below)
     is    also    available    from    ENCORE   (See   Q15.3)   in   file
     CFS/papers/lcs92-2.ps.gz

     Dorigo, M. (1993) "Genetic  and  Non-genetic  Operators  in  ALECSYS"
     Evolutionary  Computation,  1(2):151-164.   The technical report, the
     journal article is based on is avail. from ENCORE (See Q15.3) in file
     CFS/papers/icsi92.ps.gz

     Smith,  R.E.  Forrest,  S.  &  Perelson,  A.S.  (1993) "Searching for
     Diverse,   Cooperative   POPULATIONs   with    Genetic    Algorithms"
     Evolutionary Computation, 1(2):127-149.

 Conclusions?
     Generally speaking:
      "There's much to do in CFS research!"

     No  other  notion of EC provides more space to explore and if you are
     interested in a PhD in the field, you might want  to  take  a  closer
     look  at  CFS.   However,  be warned!, to quote Goldberg: "classifier
     systems  are  a  quagmire---a  glorious,  wondrous,   and   inventing
     quagmire, but a quagmire nonetheless."

     References

     Booker, L.B. (1982) "Intelligent behavior as an adaption to the  task
     environment" PhD Dissertation, Univ. of Michigan, Logic of  Computers
     Group, Ann Arbor, MI.

     Braitenberg,   V.   (1984)   "Vehicles:   Experiments   in  Synthetic
     Psychology" Boston, MA: MIT Press.

     Holland, J.H. (1986)  "Escaping  Brittleness:  The  possibilities  of
     general-purpose  learning  algorithms  applied to parallel rule-based
     systems". In: R.S. Michalski, J.G. Carbonell & T.M.  Mitchell  (eds),
     Machine  Learning:  An  Artificial  Intelligence  approach,  Vol  II,
     593-623, Los Altos, CA: Morgan Kaufman.

     Holland, J.H., et al.  (1986)  "Induction:  Processes  of  Inference,
     Learning, and Discovery", Cambridge, MA: MIT Press.

     Holland,  J.H.  (1992) "Adaptation in natural and artificial systems"
     Boston, MA: MIT Press.

     Holland, J.H. & Reitman, J.S.  (1978)  "Cognitive  Systems  based  on
     Adaptive  Algorithms" In D.A. Waterman & F.Hayes-Roth, (eds) Pattern-
     directed inference systems. NY: Academic Press.

     Minsky,  M.L.   (1961)   "Steps   toward   Artificial   Intelligence"
     Proceedings  IRE, 49, 8-30. Reprinted in E.A. Feigenbaum & J. Feldman
     (eds) Computers and Thought, 406-450, NY: McGraw-Hill, 1963.

     Minsky, M.L.  (1967)  "Computation:  Finite  and  Infinite  Machines"
     Englewood Cliffs, NJ: Prentice-Hall.

     Post,  Emil L. (1943) "Formal reductions of the general combinatorial
     decision problem" American Journal of Mathematics, 65, 197-215.

     Rich, E. (1983) "Artificial Intelligence" NY: McGraw-Hill.

     Tinbergen, N. (1951) "The Study of Instinct" NY: Oxford Univ.  Press.

     Watkins,  C. (1989) "Learning from Delayed Rewards" PhD Dissertation,
     Department of Psychology, Cambridge Univ., UK.

     Wilson, S.W. (1985) "Knowledge growth in  an  artificial  animal"  in
     [ICGA85], 16-23.

------------------------------

Subject: Q1.5: What's Genetic Programming (GP)?

     GENETIC PROGRAMMING is the extension of the genetic model of learning
     into the space of programs. That is, the objects that constitute  the
     POPULATION   are  not  fixed-length  character  strings  that  encode
     possible solutions to the problem at hand, they  are  programs  that,
     when  executed,  "are"  the candidate solutions to the problem. These
     programs are expressed in genetic programming as parse trees,  rather
     than  as lines of code.  Thus, for example, the simple program "a + b
     * c" would be represented as:

		 +
		/ \
		a  *
		 / \
		 b  c

     or, to be precise, as suitable data  structures  linked  together  to
     achieve this effect. Because this is a very simple thing to do in the
     programming language Lisp, many GPers tend to use Lisp. However, this
     is simply an implementation detail. There are straightforward methods
     to implement GP using a non-Lisp programming environment.

     The programs in the POPULATION are  composed  of  elements  from  the
     FUNCTION  SET and the TERMINAL SET, which are typically fixed sets of
     symbols selected to be appropriate to the solution of problems in the
     domain of interest.

     In  GP  the  CROSSOVER  operation  is  implemented by taking randomly
     selected subtrees in the INDIVIDUALs (selected according to  FITNESS)
     and exchanging them.

     It should be pointed out that GP usually does not use any MUTATION as
     a GENETIC OPERATOR.

     More information is available in the GP mailing  list  FAQ.   (See  Q
     15.2)

------------------------------

End of ai-faq/genetic/part2
***************************
