Q1.3: What's an Evolution Strategy (ES)?

     In  1963 two students at the Technical University of Berlin (TUB) met
     and were soon to collaborate  on  experiments  which  used  the  wind
     tunnel  of  the Institute of Flow Engineering.  During the search for
     the optimal shapes of bodies in a flow, which was then  a  matter  of
     laborious  intuitive  experimentation,  the  idea  was  conceived  of
     proceeding strategically.  However, attempts with the coordinate  and
     simple  gradient  strategies  (cf Q5) were unsuccessful.  Then one of
     the  students,  Ingo  Rechenberg,  now  Professor  of   Bionics   and
     Evolutionary  Engineering, hit upon the idea of trying random changes
     in the parameters  defining  the  shape,  following  the  example  of
     natural  MUTATIONs.   The  EVOLUTION  STRATEGY  was  born.   A  third
     student, Peter Bienert, joined them and started the  construction  of
     an  automatic  experimenter, which would work according to the simple
     rules of mutation  and  SELECTION.   The  second  student,  Hans-Paul
     Schwefel,  set  about  testing the efficiency of the new methods with
     the help of a Zuse Z23 computer; for there were plenty of  objections
     to these "random strategies."

     In spite of an occasional lack of financial support, the Evolutionary
     Engineering Group which had been formed held  firmly  together.  Ingo
     Rechenberg  received  his  doctorate  in  1970  (Rechenberg  73).  It
     contains the theory of the two  membered  EVOLUTION  STRATEGY  and  a
     first proposal for a multimembered strategy which in the nomenclature
     introduced here is of the (m+1) type.   In the  same  year  financial
     support  from  the  Deutsche  Forschungsgemeinschaft  (DFG, Germany's
     National Science Foundation) enabled the work, that was concluded, at
     least  temporarily,  in 1974 with the thesis "Evolutionsstrategie und
     numerische Optimierung" (Schwefel 77).

     Thus,  EVOLUTION  STRATEGIEs  were  invented   to   solve   technical
     OPTIMIZATION  problems  (TOPs)  like  e.g.  constructing  an  optimal
     flashing nozzle, and until recently  ES  were  only  known  to  civil
     engineering  folks, as an alternative to standard solutions.  Usually
     no closed form analytical objective function is  available  for  TOPs
     and   hence,  no  applicable  optimization  method  exists,  but  the
     engineer's intuition.

     The first attempts to imitate principles of organic  EVOLUTION  on  a
     computer  still resembled the iterative OPTIMIZATION methods known up
     to that time (cf Q5):  In a two-membered  or  (1+1)  ES,  one  PARENT
     generates   one   OFFSPRING   per  GENERATION  by  applying  NORMALLY
     DISTRIBUTED MUTATIONs, i.e. smaller steps occur more likely than  big
     ones,  until  a child performs better than its ancestor and takes its
     place. Because of this  simple  structure,  theoretical  results  for
     STEPSIZE control and CONVERGENCE VELOCITY could be derived. The ratio
     between successful and all mutations should  come  to  1/5:  the  so-
     called  1/5  SUCCESS RULE was discovered. This first algorithm, using
     mutation only, has then been  enhanced  to  a  (m+1)  strategy  which
     incorporated  RECOMBINATION  due  to  several,  i.e.  m parents being
     available. The mutation scheme and  the  exogenous  stepsize  control
     were taken across unchanged from  (1+1) ESs.

     Schwefel  later  generalized these strategies to the multimembered ES
     now denoted by (m+l) and (m,l) which  imitates  the  following  basic
     principles  of  organic  EVOLUTION:  a  POPULATION,  leading  to  the
     possibility  of  RECOMBINATION  with  random  mating,  MUTATION   and
     SELECTION.   These  strategies  are  termed  PLUS  STRATEGY and COMMA
     STRATEGY, respectively: in the plus case, the parental GENERATION  is
     taken into account during selection, while in the comma case only the
     OFFSPRING undergoes selection, and the PARENTs die off. m (usually  a
     lowercase mu, denotes the population size, and l, usually a lowercase
     lambda denotes the number of offspring generated per generation).  Or
     to  put  it  in  an  utterly  insignificant  and  hopelessly outdated
     language:

	  (define (Evolution-strategy population)
	    (if (terminate? population)
	      population
	      (evolution-strategy
		(select
		  (cond (plus-strategy?
			  (union (mutate
				   (recombine population))
				 population))
			(comma-strategy?
			  (mutate
			    (recombine population))))))))

     However, dealing with ES is sometimes seen as "strong  tobacco,"  for
     it takes a decent amount of probability theory and applied STATISTICS
     to understand the inner workings of an ES, while it navigates through
     the  hyperspace  of  the  usually  n-dimensional  problem  space,  by
     throwing hyperelipses into the deep...

     Luckily, this medium doesn't allow for  much  mathematical  ballyhoo;
     the  author  therefore  has  to come up with a simple but brilliantly
     intriguing explanation to save the reader from falling asleep  during
     the rest of this section, so here we go:

     Imagine a black box. A large black box. As large as, say for example,
     a Coca-Cola vending machine. Now, [..] (to be continued)

     A single INDIVIDUAL of the ES' POPULATION consists of  the  following
     GENOTYPE representing a point in the SEARCH SPACE:

     OBJECT VARIABLES
	  Real-valued  x_i  have to be tuned by RECOMBINATION and MUTATION
	  such that an objective  function  reaches  its  global  optimum.
	  Referring   to   the  metaphor  mentioned  previously,  the  x_i
	  represent the regulators of the alien Coka-Cola vending machine.

     STRATEGY VARIABLEs
	  Real-valued  s_i  (usually denoted by a lowercase sigma) or mean
	  STEPSIZEs determine the mutability of the  x_i.  They  represent
	  the STANDARD DEVIATION of a  (0, s_i) GAUSSIAN DISTRIBUTION (GD)
	  being added to each x_i as  an  undirected  MUTATION.   With  an
	  "expectancy  value"  of  0  the  PARENTs will produce OFFSPRINGs
	  similar to themselves on average. In order to  make  a  doubling
	  and  a  halving  of  a stepsize equally probable, the s_i mutate
	  log-normally, distributed, i.e.   exp(GD),  from  GENERATION  to
	  generation.    These  stepsizes  hide  the  internal  model  the
	  POPULATION has made of its ENVIRONMENT, i.e.  a  SELF-ADAPTATION
	  of the stepsizes has replaced the exogenous control of the (1+1)
	  ES.

	  This concept works because SELECTION  sooner  or  later  prefers
	  those  INDIVIDUALs  having  built  a good model of the objective
	  function, thus  producing  better  OFFSPRING.   Hence,  learning
	  takes place on two levels: (1) at the genotypic, i.e. the object
	  and STRATEGY VARIABLE level and (2)  at  the  phenotypic  level,
	  i.e. the FITNESS level.

	  Depending  on  an  individual's  x_i,  the  resulting  objective
	  function value f(x), where x denotes  the  vector  of  objective
	  variables,  serves  as  the PHENOTYPE (FITNESS) in the SELECTION
	  step. In a PLUS STRATEGY, the m best of  all  (m+l)  INDIVIDUALs
	  survive to become the PARENTs of the next GENERATION.  Using the
	  comma variant, selection takes place only among the l OFFSPRING.
	  The   second   scheme  is  more  realistic  and  therefore  more
	  successful, because no individual  may  survive  forever,  which
	  could  at  least  theoretically  occur  using  the plus variant.
	  Untypical for conventional OPTIMIZATION algorithms and lavish at
	  first    sight,   a   COMMA   STRATEGY   allowing   intermediate
	  deterioration performs better! Only  by  forgetting  highly  fit
	  individuals  can  a  permanent  adaptation of the STEPSIZEs take
	  place and avoid long stagnation phases due to misadapted  s_i's.
	  This  means  that these individuals have built an internal model
	  that is no longer appropriate for  further  progress,  and  thus
	  should better be discarded.

	  By   choosing  a  certain  ratio  m/l,  one  can  determine  the
	  convergence property of the EVOLUTION STRATEGY: If one  wants  a
	  fast,  but  local  convergence,  one  should choose a small HARD
	  SELECTION, ratio, e.g.  (5,100),  but  looking  for  the  global
	  optimum, one should favour a softer SELECTION (15,100).

	  SELF-ADAPTATION  within  ESs  depends  on  the  following agents
	  (Schwefel 87):

     Randomness:
	  One cannot model MUTATION as a purely random process. This would
	  mean that a child is completely independent of its PARENTs.

     POPULATION
	  size:  The POPULATION has to be sufficiently large. Not only the
	  current best should be allowed to reproduce, but a set  of  good
	  INDIVIDUALs.    Biologists   have  coined  the  term  "requisite
	  variety" to mean the genetic  variety  necessary  to  prevent  a
	  SPECIES   from   becoming  poorer  and  poorer  genetically  and
	  eventually dying out.

     COOPERATION:
	  In order to exploit the effects of a POPULATION  (m  >  1),  the
	  INDIVIDUALs should recombine their knowledge with that of others
	  (cooperate)  because  one  cannot  expect   the   knowledge   to
	  accumulate in the best individual only.

     Deterioration:
	  In  order to allow better internal models (STEPSIZEs) to provide
	  better progress in the future, one should  accept  deterioration
	  from  one  GENERATION to the next. A limited life-span in nature
	  is not a sign of failure, but an important means of preventing a
	  SPECIES from freezing genetically.

	  ESs  prove  to  be  successful  when compared to other iterative
	  methods on a large number of test problems (Schwefel 77).   They
	  are  adaptable  to nearly all sorts of problems in OPTIMIZATION,
	  because they need very little  information  about  the  problem,
	  esp.  no  derivatives  of  the objective function. For a list of
	  some 300 applications of EAs, see the SyS-2/92 report (cf  Q14).
	  ESs   are  capable  of  solving  high  dimensional,  multimodal,
	  nonlinear  problems   subject   to   linear   and/or   nonlinear
	  constraints.   The  objective  function  can  also,  e.g. be the
	  result of a SIMULATION, it does not have to be given in a closed
	  form.   This  also holds for the constraints which may represent
	  the outcome of, e.g. a finite elements method  (FEM).  ESs  have
	  been  adapted  to VECTOR OPTIMIZATION problems (Kursawe 92), and
	  they can also serve as a heuristic for NP-complete combinatorial
	  problems like the TRAVELLING SALESMAN PROBLEM or problems with a
	  noisy or changing response surface.

	  References

	  Kursawe,  F.  (1992)   "   Evolution   strategies   for   vector
	  optimization",  Taipei, National Chiao Tung University, 187-193.

	  Kursawe, F. (1994) "  Evolution  strategies:  Simple  models  of
	  natural  processes?", Revue Internationale de Systemique, France
	  (to appear).

	  Rechenberg,   I.   (1973)   "Evolutionsstrategie:    Optimierung
	  technischer Systeme nach Prinzipien der biologischen Evolution",
	  Stuttgart: Fromman-Holzboog.

	  Schwefel,   H.-P.    (1977)    "Numerische    Optimierung    von
	  Computermodellen   mittels   der   Evolutionsstrategie",  Basel:
	  Birkhaeuser.

	  Schwefel, H.-P. (1987) "Collective  Phaenomena  in  Evolutionary
	  Systems",  31st  Annu.  Meet.  Inter'l  Soc.  for General System
	  Research, Budapest, 1025-1033.