Newsgroups: comp.ai.genetic,comp.answers,news.answers
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!cam-news-feed3.bbnplanet.com!news.bbnplanet.com!cam-news-hub1.bbnplanet.com!howland.erols.net!news-peer.gsl.net!news.gsl.net!news-lond.gsl.net!news.gsl.net!netcom.net.uk!nuntius.u-net.net!yama.mcc.ac.uk!loki.cf.ac.uk!scmdb
From: David.Beasley@cs.cf.ac.uk (David Beasley)
Subject: FAQ: comp.ai.genetic part 3/6 (A Guide to Frequently Asked Questions)
Summary: This is part 3 of a <trilogy> entitled "The Hitch-Hiker's Guide
     to Evolutionary Computation". A periodically published list of Frequently
     Asked Questions (and their answers) about Evolutionary Algorithms,
     Life and Everything. It should be read by anyone who whishes to post
     to the comp.ai.genetic newsgroup, preferably *before* posting.
Originator: scmdb@thrall.cs.cf.ac.uk
Sender: news@cf.ac.uk (Usenet News user)
Message-ID: <part3_850729073@cs.cf.ac.uk>
Supersedes: <part3_843226984@cs.cf.ac.uk>
Approved: news-answers-request@MIT.Edu
Date: Mon, 16 Dec 1996 09:38:04 GMT
Expires: Thu, 20 Mar 1997 09:37:53 GMT
X-Nntp-Posting-Host: thrall.cs.cf.ac.uk
References: <part2_850729073@cs.cf.ac.uk>
Organization: Posted through the Joint Cardiff Computing Service, Wales, UK
Followup-To: comp.ai.genetic
Lines: 943
Xref: glinda.oz.cs.cmu.edu comp.ai.genetic:10631 comp.answers:23102 news.answers:1783

Archive-name:   ai-faq/genetic/part3
Last-Modified:  12/20/96
Issue:          4.4

TABLE OF CONTENTS OF PART 3
     Q2: What applications of EAs are there?

     Q3: Who is concerned with EAs?

     Q4: How many EAs exist? Which?
     Q4.1: What about Alife systems, like Tierra and VENUS?

     Q5: What about all this Optimization stuff?

----------------------------------------------------------------------

Subject: Q2: What applications of EAs are there?

     In   principle,   EAs  can  compute  any  computable  function,  i.e.
     everything a normal digital computer can do.

     But EAs are especially badly suited for problems where efficient ways
     of  solving  them  are  already  known,  (unless  these  problems are
     intended to serve as benchmarks).  Special purpose  algorithms,  i.e.
     algorithms  that  have  a  certain amount of problem domain knowledge
     hard coded into them, will usually outperform EAs,  so  there  is  no
     black  magic  in EC.  EAs should be used when there is no other known
     problem solving strategy, and  the  problem  domain  is  NP-complete.
     That's  where  EAs  come  into  play: heuristically finding solutions
     where all else fails.

     Following  is  an  incomplete   (sic!)    list   of   successful   EA
     applications:

 BIOCOMPUTING
     Biocomputing, or Bioinformatics, is the field of biology dedicated to
     the automatic analysis of experimental data (mostly sequencing data).
     Several  approaches  to  specific  biocomputing  problems  have  been
     described that involve the use of GA,  GP  and  simulated  annealing.
     General  information  about biocomputing (software, databases, misc.)
     can be found on the server of the European Bioinformatics  Institute:
     http://www.ebi.ac.uk/ebi_home.html  ENCORE  has  a  good selection of
     pointers related to this subject.  VSCN provides  a  detailed  online
     course        on        bioinformatics:       http://www.techfak.uni-
     bielefeld.de/bcd/Curric/welcome.html

     There are three main  domains  to  which  GA  have  been  applied  in
     Bioinformatics: protein folding, RNA folding, sequence alignment.

     Protein Folding

     Proteins  are  one  of  the essential components of any form of life.
     They are made of twenty different types of amino acid.   These  amino
     acids  are  chained  together  in  order to form the protein that can
     contain from a few to several thousands  residues.  In  most  of  the
     cases,  the  properties and the function of a protein are a result of
     its three dimensional structure.  It seems that in  many  cases  this
     structure  is a direct consequence of the sequence. Unfortunately, it
     is still very difficult/impossible to deduce  the  three  dimensional
     structure,  knowing  only  the  sequence.  A part of the VSCN on-line
     bioinformatics course is dedicated to  the  use  of  GAs  in  Protein
     Folding  Prediction.  It  contains  an  extensive  bibliography and a
     detailed presentation of the subject with LOTS  of  explanations  and
     on-line     papers.     The     URL    is:    http://www.techfak.uni-
     bielefeld.de/bcd/Curric/ProtEn/contents.html

     Koza [KOZA92] gives one example of GP  applied  to  Protein  Folding.
     Davis  [DAVIS91]  gives  an example of DNA conformation prediction (a
     closely related problem) in his Handbook of GAs.

     RNA Folding

     Describing the secondary structure of an RNA molecule,  is  about  as
     hard  as  for  a  protein,  but describing the intermediate structure
     (secondary structure) is somehow easier  because  RNA  molecules  are
     using the same pairing rules as DNA, (Watson and Crick base pairing).
     There exist deterministic algorithms that given a set of  constraints
     (rules),  compute  the more stable structure, but: (a) their time and
     memory requirement increase quadratically or more with the length  of
     the sequences, and (b) they require simplified rules.  Lots of effort
     has recently been put into applying GAs to this problem, and  several
     papers  can  be  found (on-line if your institute subscribes to these
     journals):

     A genetic Algorithm Based Molecular Modelling Technique For RNA Stem-
     loop  Structures       H.  Ogata,  Y. Akiyama and M Kanehisa, Nucleic
     Acid Research,      1995, vol 23,3 419-426

     An Annealing Mutation Operator in the GA  for  RNA  folding       B.A
     Shapiro and J. C. Wu, CABIOS, 1996, vol 12, 3, 171-180

     The  computer  Simulation  of  RNA  Folding  Pathway  Using a Genetic
     Algorithm      A.P. Gultyaev, F.D.H van Batenburg and C. W. A.  Pleij
     in Journal      of Molecular Biology, 1995, vol 250 37-51

     Simulated  Annealing  has  also  been  applied  successfully  to this
     problem:

     Description of RNA folding by SA      M. Schmitz  and  G.  Steger  in
     Journal of Molecular Biology, 1995, 255, 245-266

     Sequence Alignments

     Sequence  Alignment  is  another important problem of Bioinformatics.
     The aim is to align together several related sequences (from  two  to
     hundreds)  given  a  cost  function.   For the most widely  used cost
     functions, the problem has been shown  to  be  NP-complete.   Several
     attempts have been made using SA:

     Multiple  Sequence Alignment Using SA      J. Kim, Sakti Pramanik and
     M.J. Chung, CABIOS, 1994, vol 10, 4, 419-426

     Multiple Sequence Alignment by Parallel SA      M. Isshikawa, T. Koya
     and al, CABIOS, 1993,vol 9, 3, 267-273

     SAM,  software  which uses Hidden Markov Models for Multiple Sequence
     Alignment, can use SA to train the model. Several  papers  have  been
     published  on  SAM.   The  software,  documentation  and an extensive
     bibliography           can           be           found           in:
     http://www.cse.ucsc.edu/research/compbio/sam.html

     More  recently,  various  software using different methods like Gibbs
     sampling or GAs has been developed:

     A Gibbs Sampling Strategy for Multiple Alignment C.E. Lawrence, S. F.
     Altschull and al, Science, October 1993, vol 262, 208-214

     SAGA:  Sequence  Alignment by Genetic Algorithm      C. Notredame and
     D.G. Higgins, Nucleic Acid Research, 1995, vol 24, 8,
      1515-1524

     A  beta release of SAGA (along with the paper) is  available  on  the
     European    Bioinformatics    Institute    anonymous    FTP   server:
     ftp.ebi.ac.uk/pub/software/unix/saga.tar.Z

 GAME PLAYING
     GAs can be used to  evolve  behaviors  for  playing  games.  Work  in
     evolutionary  GAME  THEORY  typically  surrounds  the  EVOLUTION of a
     POPULATION of players who meet randomly to play a game in which  they
     each  must  adopt  one  of  a limited number of moves. (Maynard-Smith
     1982).  Let's suppose it is just two moves,  X  and  Y.  The  players
     receive  a reward, analogous to Darwinian FITNESS, depending on which
     combination of moves occurs and which  move  they  adopted.  In  more
     complicated models there may be several players and several moves.

     The  players  iterate such a game a series of times, and then move on
     to a new partner. At the end of all such moves, the players will have
     a cumulative payoff, their fitness.  This fitness can then be used as
     a means of conducting something akin to Roulette-Wheel  SELECTION  to
     generate a new population.

     The  real  key  in  using  a  GA  is  to  come up with an encoding to
     represent player's strategies, one that is amenable to CROSSOVER  and
     to MUTATION.  possibilities are to suppose at each iteration a player
     adopts X with some probability (and Y with one minus such). A  player
     can  thus  be  represented  as  a  real  number,  or  a bit-string by
     interpreting the decimal value of the bit string as  the  inverse  of
     the probability.

     An  alternative  characterisation  is  to model the players as Finite
     State Machines, or Finite Automata (FA). These can be though of as  a
     simple  flow chart governing behaviour in the "next" play of the game
     depending upon previous plays. For example:

	  100 Play X
	  110 If opponent plays X go to 100
	  120 Play Y
	  130 If opponent plays X go to 100 else go to 120
     Represents a strategy that does whatever its opponent did  last,  and
     begins  by  playing  X,  known as "Tit-For-Tat." (Axelrod 1982). Such
     machines can readily be encoded as bit-strings. Consider the encoding
     "1  0  1  0 0 1" to represent TFT.  The first three bits, "1 0 1" are
     state 0. The first bit, "1" is interpreted as "Play  X."  The  second
     bit,  "0"  is interpreted as "if opponent plays X go to state 1," the
     third bit, "1", is interpreted as "if the opponent  plays  Y,  go  to
     state  1."   State 1 has a similar interpretation. Crossing over such
     bit-strings always yields valid strategies.

     SIMULATIONs in the Prisoner's dilemma have been  undertaken  (Axelrod
     1987, Fogel 1993, Miller 1989) of these machines.

     Alternative   representations  of  game  players  include  CLASSIFIER
     SYSTEMs (Marimon, McGrattan and Sargent 1990, [GOLD89]), and  Neural-
     networks  (Fogel and Harrald 1994), though not necessarily with a GA.
     (Fogel  1993),  and  Fogel  and  Harrald  1994  use  an  Evolutionary
     Program).

     Other methods of evolving a population can be found in Lindgren 1991,
     Glance and Huberman 1993 and elsewhere.

     References.

     Axelrod, R. (1987) ``The Evolution  of  Strategies  in  the  Repeated
     Prisoner's Dilemma,'' in [DAVIS91]
     Miller,  J.H.  (1989)  ``The  Coevolution of Automata in the Repeated
     Prisoner's Dilemma'' Santa Fe Institute Working Paper 89-003.

     Marimon, Ramon, Ellen McGrattan and Thomas J. Sargent (1990)  ``Money
     as  a  Medium of Exchange in an Economy with Artificially Intelligent
     Agents'' Journal of Economic Dynamics and Control 14, pp. 329--373.

     Maynard-Smith, (1982) Evolution and the Theory of Games, CUP.

     Lindgren, K. (1991) ``Evolutionary Phenomena in Simple Dynamics,'' in
     [ALIFEI].

     Holland, J.H and John Miller (1990) ``Artificially Adaptive Agents in
     Economic Theory,'' American Economic Review: Papers  and  Proceedings
     of  the  103rd  Annual Meeting of the American Economics Association:
     365--370.

     Huberman, Bernado,  and  Natalie  S.  Glance  (1993)  "Diversity  and
     Collective   Action"   in   H.   Haken   and   A.   Mikhailov  (eds.)
     Interdisciplinary Approaches to Nonlinear Systems, Springer.

     Fogel (1993) "Evolving Behavior in the Iterated  Prisoner's  Dilemma"
     Evolutionary Computation 1:1, 77-97

     Fogel,  D.B.  and  Harrald, P. (1994) ``Evolving Complex Behaviour in
     the Iterated Prisoner's Dilemma,'' Proceedings of the  Fourth  Annual
     Meetings of the Evolutionary Programming Society, L.J. Fogel and A.W.
     Sebald eds., World Science Press.

     Lindgren, K. and Nordahl, M.G.  "Cooperation and Community  Structure
     in Artificial Ecosystems", Artificial Life, vol 1:1&2, 15-38

     Stanley,  E.A.,  Ashlock,  D.  and  Tesfatsion,  L.  (1994) "Iterated
     Prisoners Dilemma with Choice and Refusal of Partners  in  [ALIFEIII]
     131-178

 JOB-SHOP SCHEDULING
     The  Job-Shop  Scheduling  Problem  (JSSP)  is  a  very difficult NP-
     complete problem which, so far, seems best addressed by sophisticated
     branch  and  bound  search  techniques.  GA researchers, however, are
     continuing to make  progress  on  it.   (Davis  85)  started  off  GA
     research  on  the  JSSP,  (Whitley  89)  reports  on  using  the edge
     RECOMBINATION operator (designed initially for the TSP) on JSSPs too.
     More  recent work includes (Nakano 91),(Yamada & Nakano 92), (Fang et
     al. 93).  The latter three  report  increasingly  better  results  on
     using  GAs on fairly large benchmark JSSPs (from Muth & Thompson 63);
     neither consistently outperform branch & bound search yet,  but  seem
     well  on  the  way.  A  crucial  aspect  of such work (as with any GA
     application) is the method used to  encode  schedules.  An  important
     aspect of some of the recent work on this is that better results have
     been obtained by rejecting the conventional wisdom  of  using  binary
     representations   (as  in  (Nakano  91))  in  favor  of  more  direct
     encodings. In (Yamada & Nakano 92), for example,  a  GENOME  directly
     encodes operation completion times, while in (Fang et al. 93) genomes
     represent implicit instructions for building a schedule. The  success
     of  these  latter techniques, especially since their applications are
     very important in industry, should eventually spawn  advances  in  GA
     theory.

     Concerning  the point of using GAs at all on hard job-shop scheduling
     problems, the same goes here as suggested  above  for  `Timetabling':
     The   GA   approach  enables  relatively  arbitrary  constraints  and
     objectives to be incorporated painlessly into a  single  OPTIMIZATION
     method.   It   is  unlikely  that  GAs  will  outperform  specialized
     knowledge-based  and/or  conventional  OR-based  approaches  to  such
     problems  in  terms  of  raw solution quality, however GAs offer much
     greater simplicity and flexibility, and so, for example, may  be  the
     best method for quick high-quality solutions, rather than finding the
     best possible solution at any cost. Also, of course,  hybrid  methods
     will  have a lot to offer, and GAs are far easier to parallelize than
     typical knowledge-based/OR methods.

     Similar to the JSSP is  the  Open  Shop  Scheduling  Problem  (OSSP).
     (Fang  et  al.  93) reports an initial attempt at using GAs for this.
     Ongoing results from the same source shows  reliable  achievement  of
     results  within  less than 0.23% of optimal on moderately large OSSPs
     (so far, up to 20x20), including an  improvement  on  the  previously
     best known solution for a benchmark 10x10 OSSP. A simpler form of job
     shop problem is the Flow-Shop Sequencing problem;  recent  successful
     work on applying GAs to this includes (Reeves 93)."

     Other scheduling problems

     In  contrast  to  job  shop  scheduling  some  maintenance scheduling
     problems consider which  activities  to  schedule  within  a  planned
     maintenance  period,  rather  than seeking to minimise the total time
     taken by the activities. The constraints on which parts may be  taken
     out  of  service  for  maintenance  at  particular  times may be very
     complex, particularly as they will in general interact. Some  initial
     work is given in (Langdon, 1995).

     References

     Davis,  L.  (1985)  "Job-Shop  Scheduling  with  Genetic Algorithms",
     [ICGA85], 136-140.

     Muth, J.F. & Thompson, G.L. (1963) "Industrial Scheduling".  Prentice
     Hall, Englewood Cliffs, NJ, 1963.

     Nakano,  R.  (1991)  "Conventional  Genetic  Algorithms  for Job-Shop
     Problems", [ICGA91], 474-479.

     Reeves, C.R. (1993) "A Genetic Algorithm  for  Flowshop  Sequencing",
     Coventry Polytechnic Working Paper, Coventry, UK.

     Whitley,  D.,  Starkweather,  T.  &  D'Ann  Fuquay (1989) "Scheduling
     Problems and  Traveling  Salesmen:  The  Genetic  Edge  Recombination
     Operator", [ICGA89], 133-140.

     Fang,  H.-L.,  Ross,  P.,  &  Corne  D.  (1993)  "A Promising Genetic
     Algorithm Approach to Job-Shop Scheduling, Rescheduling  &  Open-Shop
     Scheduling Problems", [ICGA93], 375-382.

     Yamada,  T.  &  Nakano,  R. (1992) "A Genetic Algorithm Applicable to
     Large-Scale Job-Shop Problems", [PPSN92], 281-290.

     Langdon, W.B. (1995) "Scheduling  Planned  Maintenance  of  the  (UK)
     National Grid", cs.ucl.ac.uk:/genetic/papers/grid_aisb-95.ps

 MANAGEMENT SCIENCES
     "Applications  of EA in management science and closely related fields
     like organizational ecology is a domain that has been covered by some
     EA  researchers - with considerable bias towards scheduling problems.
     Since I believe that EA have considerable potential for  applications
     outside   the   rather   narrow  domain  of  scheduling  and  related
     combinatorial problems, I started  collecting  references  about  the
     status  quo  of  EA-applications  in management science.  This report
     intends to make available my findings to  other  researchers  in  the
     field.  It  is  a  short  overview  and  lists some 230 references to
     current as well as finished research projects.  [..]

     "At the end of the paper, a questionnaire has been incorporated  that
     may be used for this purpose. Other comments are also appreciated."

     --- from the Introduction of (Nissen 93)

     References

     Nissen,  V. (1993) "Evolutionary Algorithms in Management Science: An
     Overview and List of References", Papers on Economics and  Evolution,
     edited  by the European Study Group for Evolutionary Economics.  This
     report     is     also     avail.     via     anon.      FTP     from
     ftp.gwdg.de:/pub/msdos/reports/wi/earef.eps

     Boulding,  K.E.  (1991) "What is evolutionary economics?", Journal of
     Evolutionary Economics, 1, 9-17.

 TIMETABLING
     This has been addressed quite successfully with GAs.  A  very  common
     manifestation  of this kind of problem is the timetabling of exams or
     classes in Universities, etc.

     The first application of GAs to the timetabling problem was to  build
     the  schedule  of  the  teachers  in  an  Italian  high  school.  The
     research, conducted at the Department of Electronics and  Information
     of Politecnico di Milano, Italy, showed that a GA was as good as Tabu
     Search, and better  than  simulated  annealing,  at  finding  teacher
     schedules  satisfying  a  number  of  hard and soft constraints.  The
     software package developed is now in current use in some high schools
     in Milano. (Colorni et al 1990)

     At   the   Department   of  Artificial  Intelligence,  University  of
     Edinburgh, timetabling the MSc exams is now done using a GA (Corne et
     al.  93,  Fang  92).  An  example  of  the use of GAs for timetabling
     classes is (Abramson & Abela 1991).

     In the exam timetabling case,  the  FITNESS  function  for  a  GENOME
     representing a timetable involves computing degrees of punishment for
     various problems with the timetable, such as  clashes,  instances  of
     students  having  to  take  consecutive  exams, instances of students
     having (eg) three or more exams in  one  day,  the  degree  to  which
     heavily-subscribed  exams  occur  late  in the timetable (which makes
     marking harder), overall length of timetable, etc. The modular nature
     of the fitness function has the key to the main potential strength of
     using GAs for this sort of thing as  opposed  to  using  conventional
     search  and/or  constraint  programming  methods. The power of the GA
     approach is the ease with which it  can  handle  arbitrary  kinds  of
     constraints  and  objectives;  all  such  things  can  be  handled as
     weighted components of the fitness function, making it easy to  adapt
     the  GA  to  the  particular  requirements  of  a  very wide range of
     possible overall objectives . Very few other timetabling methods, for
     example,  deal with such objectives at all, which shows how difficult
     it is (without  GAs)  to  graft  the  capacity  to  handle  arbitrary
     objectives  onto  the  basic "find shortest- length timetable with no
     clashes" requirement.  The  proper  way  to  weight/handle  different
     objectives  in  the  fitness  function  in relation to the general GA
     dynamics remains, however, an important research problem!

     GAs thus offer a combination of simplicity, flexibility & speed which
     competes  very  favorably  with other approaches, but are unlikely to
     outperform  knowledge-based  (etc)  methods  if  the  best   possible
     solution  is  required at any cost. Even then, however, hybridisation
     may yield the best of both worlds; also, the ease (if the hardware is
     available!)  of implementing GAs in parallel enhances the possibility
     of using them for good, fast solutions to very hard  timetabling  and
     similar problems.

     References
     Abramson & Abela (1991) "A Parallel Genetic Algorithm for Solving the
     School Timetabling Problem",  Technical  Report,  Division  of  I.T.,
     C.S.I.R.O,   April   1991.    (Division  of  Information  Technology,
     C.S.I.R.O., c/o Dept.  of  Communication  &  Electronic  Engineering,
     Royal  Melbourne  Institute  of  Technology,  PO BOX 2476V, Melbourne
     3001, Australia)

     Colorni A., M. Dorigo & V. Maniezzo (1990).  Genetic  Algorithms  And
     Highly  Constrained Problems: The Time-Table Case. Proceedings of the
     First International Workshop on Parallel Problem Solving from Nature,
     Dortmund,  Germany,  Lecture Notes in Computer Science 496, Springer-
     Verlag,                                                        55-59.
     http://iridia.ulb.ac.be/dorigo/dorigo/conferences/IC.01-PPSN1.ps.gz

     Colorni  A.,  M.  Dorigo & V. Maniezzo (1990).  Genetic Algorithms: A
     New Approach to the Time-Table Problem. NATO ASI  Series,  Vol.F  82,
     COMBINATORIAL  OPTIMIZATION,  (Ed.  M.Akguel  and  others), Springer-
     Verlag,                                                      235-239.
     http://iridia.ulb.ac.be/dorigo/dorigo/conferences/IC.02-NATOASI90.ps.gz

     Colorni A., M. Dorigo & V. Maniezzo (1990).  A Genetic  Algorithm  to
     Solve   the   Timetable   Problem.    Technical  Report  No.  90-060,
     Politecnico              di              Milano,               Italy.
     http://iridia.ulb.ac.be/dorigo/dorigo/tec.reps/TR.01-TTP.ps.gz

     Corne,  D. Fang, H.-L. & Mellish, C. (1993) "Solving the Modular Exam
     Scheduling Problem with Genetic  Algorithms".   Proc.  of  6th  Int'l
     Conf.  on  Industrial  and  Engineering  Applications  of  Artificial
     Intelligence & Expert Systems, ISAI.

     Fang,  H.-L.  (1992)  "Investigating   GAs   for   scheduling",   MSc
     Dissertation,   University   of   Edinburgh   Dept.   of   Artificial
     Intelligence, Edinburgh, UK.

------------------------------

Subject: Q3: Who is concerned with EAs?

     EVOLUTIONARY COMPUTATION attracts researchers  and  people  of  quite
     dissimilar  disciplines,  i.e.   EC  is  a interdisciplinary research
     field:

 Computer scientists
     Want to find out about the  properties  of  sub-symbolic  information
     processing  with  EAs  and  about learning, i.e.  adaptive systems in
     general.

     They  also  build  the  hardware  necessary  to  enable  future   EAs
     (precursors  are  already  beginning  to  emerge)  to huge real world
     problems, i.e. the term "massively parallel computation"  [HILLIS92],
     springs to mind.

 Engineers
     Of  many  kinds want to exploit the capabilities of EAs on many areas
     to solve their application, esp.  OPTIMIZATION problems.

 Roboticists
     Want to build MOBOTs (MOBile ROBOTs, i.e. R2D2's  and  #5's  cousins)
     that  navigate through uncertain ENVIRONMENTs, without using built-in
     "maps".  The MOBOTS thus have to adapt  to  their  surroundings,  and
     learn what they can do "move-through-door" and what they can't "move-
     through-wall" on their own by "trial-and-error".

 Cognitive scientists
     Might view CFS as a possible apparatus to describe models of thinking
     and cognitive systems.
 Physicists
     Use  EC  hardware, e.g. Hillis' (Thinking Machine Corp.'s) Connection
     Machine to model real  world  problems  which  include  thousands  of
     variables, that run "naturally" in parallel, and thus can be modelled
     more easily and esp.  "faster" on  a  parallel  machine,  than  on  a
     serial "PC" one.

 Biologists
     Are  finding  EAs  useful  when it comes to protein folding and other
     such bio-computational problems (see Q2).

     EAs can also be used to model the behaviour of  real  POPULATIONs  of
     organisms.   Some  biologists  are hostile to modeling, but an entire
     community of  Population  Biologists  arose  with  the  'evolutionary
     synthesis' of the 1930's created by the polymaths R.A. Fisher, J.B.S.
     Haldane, and S. Wright.  Wright's  SELECTION  in  small  populations,
     thereby  avoiding  local  optima)  is  of  current  interest  to both
     biologists and ECers -- populations are naturally parallel.

     A good exposition  of  current  population  Biology  modeling  is  J.
     Maynard Smith's text Evolutionary Genetics.  Richard Dawkin's Selfish
     Gene and Extended Phenotype are unparalleled (sic!) prose expositions
     of   evolutionary  processes.   Rob  Collins'  papers  are  excellent
     parallel GA models of evolutionary processes (available  in  [ICGA91]
     and by FTP from ftp.cognet.ucla.edu:/pub/alife/papers/ ).

     As  fundamental  motivation, consider Fisher's comment: "No practical
     biologist interested in (e.g.) sexual REPRODUCTION would  be  led  to
     work  out  the  detailed consequences experienced by organisms having
     three or more sexes; yet what else should [s/]he do if [s/]he  wishes
     to understand why the sexes are, in fact, always
      two?"   (Three  sexes  would  make  for even weirder grammar, [s/]he
     said...)

 Chemists
     And in particular biochemists and molecular chemists, are  interested
     in problems such as the conformational analysis of molecular clusters
     and related problems in molecular sciences.  The application  of  GAs
     to  molecular  systems has opened an interesting area of research and
     the number of chemists involved in it increases day-by-day.

     Some typical research topics include:

     o  protein   folding;   o   conformational   analysis   and    energy
	minimization; o docking algorithms for drug-design; o solvent site
	prediction in macromolecules;
     Several papers have been published in journals  such  as  Journal  of
     Computational Chemistry and Journal of Computer-Aided Design.

     Some  interesting  WWW  sites  related  to the applications of GAs to
     chemistry (or molecular science in general) include:

     o  http://isl.msu.edu/GA/projects/biochem/biochem.html about  GAs  in
	biochemistry  (water  site  prediction,  drug-design  and  protein
	folding);                                                        o
	http://www.tc.cornell.edu/Edu/SPUR/SPUR94/Main/John.html about the
	application of GAs to the search of conformational energy  minima;
	o  http://cmp.ameslab.gov/cmp/CMP_Theory/gsa/gen2.html  By using a
	GA in combiation with a Tight-binding model, David Deaven and Kai-
	Ming  Ho  founded  fullerene  cages  (including C60) starting from
	random coordinates.
     See also Q2 for applications in biocomputing.

 Philosophers
     and some other really curious people may also be interested in EC for
     various reasons.

------------------------------

Subject: Q4: How many EAs exist? Which?

 The All Stars
     There  are  currently  3  main  paradigms  in  EA  research:  GENETIC
     ALGORITHMs,  EVOLUTIONARY  PROGRAMMING,  and  EVOLUTION   STRATEGIEs.
     CLASSIFIER  SYSTEMs  and  GENETIC PROGRAMMING are OFFSPRING of the GA
     community.  Besides this  leading  crop,  there  are  numerous  other
     different  approaches, alongside hybrid experiments, i.e. there exist
     pieces of software residing in some researchers computers, that  have
     been  described  in papers in conference proceedings, and may someday
     prove useful on certain tasks. To stay in EA slang, we  should  think
     of  these  evolving  strands as BUILDING BLOCKs, that when recombined
     someday, will  produce  new  offspring  and  give  birth  to  new  EA
     paradigm(s).

 Promising Rookies
     As  far  as  "solving complex function and COMBINATORIAL OPTIMIZATION
     tasks" is concerned, Davis' work on real-valued  representations  and
     adaptive operators should be mentioned (Davis 89). Moreover Whitley's
     Genitor system incorporating ranking  and  "steady  state"  mechanism
     (Whitley    89),    Goldberg's   "messy   GAs",   involves   adaptive
     representations (Goldberg 91), and Eshelman's CHC algorithm (Eshelman
     91).   For  real  FUNCTION OPTIMIZATION, Differential EVOLUTION seems
     hard to beat in terms of convergence speed  as  well  as  simplicity:
     With just three control variables, tuning is particularly easy to do.

     For  "the  design  of  robust  learning  systems",  i.e.  the   field
     characterized  by  CFS, Holland's (1986) CLASSIFIER SYSTEM, with it's
     state-of-the-art implementation CFS-C  (Riolo  88),  we  should  note
     recent  developments  in  SAMUEL  (Grefenstette 89), GABIL (De Jong &
     Spears 91), and GIL (Janikow 91).

     References

     Davis,  L.  (1989)  "Adapting  operator  probabilities   in   genetic
     algorithms", [ICGA89], 60-69.

     De  Jong  K.A.  &  Spears  W. (1991) "Learning concept classification
     rules using genetic algorithms". Proc. 12th IJCAI,  651-656,  Sydney,
     Australia: Morgan Kaufmann.

     Dorigo  M.  &  E.  Sirtori (1991)."ALECSYS: A Parallel Laboratory for
     Learning Classifier Systems". Proceedings of the Fourth International
     Conference  on  Genetic  Algorithms, San Diego, California, R.K.Belew
     and L.B.Booker (Eds.), Morgan Kaufmann, 296-302.

     Dorigo M. (1995). "ALECSYS and the AutonoMouse: Learning to Control a
     Real  Robot by Distributed Classifier Systems". Machine Learning, 19,
     3, 209-240.

     Eshelman, L.J. et al. (1991)  "Preventing  premature  convergence  in
     genetic algorithms by preventing incest", [ICGA91], 115-122.

     Goldberg,  D. et al. (1991) "Don't worry, be messy", [ICGA91], 24-30.

     Grefenstette, J.J. (1989) "A system for learning  control  strategies
     with genetic algorithms", [ICGA89], 183-190.

     Holland,  J.H.  (1986)  "Escaping  brittleness:  The possibilities of
     general-purpose learning algorithms applied  to  parallel  rule-based
     systems".   In R. Michalski, J. Carbonell, T. Mitchell (eds), Machine
     Learning: An Artificial  Intelligence  Approach.  Los  Altos:  Morgan
     Kaufmann.
     Janikow   C.  (1991)  "Inductive  learning  of  decision  rules  from
     attribute-based examples:  A  knowledge-intensive  Genetic  Algorithm
     approach". TR91-030, The University of North Carolina at Chapel Hill,
     Dept. of Computer Science, Chapel Hill, NC.

     Riolo,  R.L.  (1988)  "CFS-C:  A  package   of   domain   independent
     subroutines  for  implementing classifier systems in arbitrary, user-
     defined  environments".   Logic  of  computers  group,  Division   of
     computer science and engineering, University of Michigan.

     Whitley,  D.  et  al.  (1989)  "The  GENITOR  algorithm and selection
     pressure: why rank-based allocation of reproductive trials is  best",
     [ICGA89], 116-121.

------------------------------

Subject: Q4.1: What about Alife systems, like Tierra and VENUS?

     None  of  these  are Evolutionary Algorithms, but all of them use the
     evolutionary metaphor as their "playing field".

 Tierra
     Synthetic organisms have been created based on a computer metaphor of
     organic  life in which CPU time is the ``energy'' resource and memory
     is the ``material'' resource.  Memory is organized into informational
     patterns  that  exploit  CPU  time  for  self-replication.   MUTATION
     generates new forms, and EVOLUTION proceeds by natural  SELECTION  as
     different GENOTYPEs compete for CPU time and memory space.

     Observation  of  nature  shows that evolution by natural selection is
     capable of both OPTIMIZATION and creativity.   Artificial  models  of
     evolution  have  demonstrated the optimizing ability of evolution, as
     exemplified by the field of GENETIC ALGORITHMs.  The creative aspects
     of evolution have been more elusive to model.  The difficulty derives
     in part from a tendency of models  to  specify  the  meaning  of  the
     ``genome''  of  the  evolving  entities, precluding new meanings from
     emerging.  I will present a natural model of evolution  demonstrating
     both  optimization  and  creativity,  in which the GENOME consists of
     sequences of executable machine code.

     From a single rudimentary ancestral ``creature'', very quickly  there
     evolve  parasites,  which  are  not  able  to  replicate in isolation
     because they lack a large portion  of  the  genome.   However,  these
     parasites  search  for the missing information, and if they locate it
     in a nearby creature, parasitize the information from the neighboring
     genome, thereby effecting their own replication.

     In  some  runs,  hosts  evolve immunity to attack by parasites.  When
     immune hosts appear, they often increase  in  frequency,  devastating
     the  parasite POPULATIONs.  In some runs where the community comes to
     be dominated by immune hosts, parasites evolve that are resistant  to
     immunity.

     Hosts  sometimes  evolve  a  response  to  parasites that goes beyond
     immunity,  to  actual  (facultative)  hyper-parasitism.   The  hyper-
     parasite  deceives  the  parasite  causing the parasite to devote its
     energetic resources to  replication  of  the  hyper-parastie  genome.
     This  drives the parasites to extinction.  Evolving in the absence of
     parasites,  hyper-parasites  completely   dominate   the   community,
     resulting  in  a relatively uniform community characterized by a high
     degree   of   relationship   between   INDIVIDUALs.    Under    these
     circumstances,  sociality evolves, in the form of creatures which can
     only replicate in aggregations.

     The cooperative behavior of the  social  hyper-parasites  makes  them
     vulnerable to a new class of parasites.  These cheaters, hyper-hyper-
     parasites, insert themselves between cooperating social  individuals,
     deceiving the social creatures, causing them to replicate the genomes
     of the cheaters.

     The only genetic change imposed on the simulator is random bit  flips
     in  the  machine  code  of the creatures.  However, it turns out that
     parasites  are  very  sloppy  replicators.   They  cause  significant
     RECOMBINATION  and  rearrangement  of  the genomes.  This spontaneous
     sexuality is a powerful force for evolutionary change in the  system.

     One  of the most interesting aspects of this instance of life is that
     the bulk of the evolution  is  based  on  adaptation  to  the  biotic
     ENVIRONMENT rather than the physical environment.  It is co-evolution
     that drives the system.

     --- "Tierra announcement" by Tom Ray (1991)

  How to get Tierra?
     The complete source code and documentation (but not  executables)  is
     available   by   anonymous   FTP   at:   tierra.slhs.udel.edu:/   and
     life.slhs.udel.edu:/ in the directories: almond/, beagle/, doc/,  and
     tierra/.

     If you do not have FTP access you may obtain everything on DOS disks.
     For details, write to: Virtual Life, 25631 Jorgensen Rd., Newman,  CA
     95360.

     References

     Ray, T. S. (1991)  "Is it alive, or is it GA?" in [ICGA91], 527--534.

     Ray, T. S. (1991)   "An  approach  to  the  synthesis  of  life."  in
     [ALIFEII], 371--408.

     Ray,  T.  S.   (1991)  "Population dynamics of digital organisms." in
     [ALIFEII].

     Ray,  T.  S.   (1991)   "Evolution  and   optimization   of   digital
     organisms."   Scientific  Excellence  in Supercomputing: The IBM 1990
     Contest Prize Papers, Eds. Keith R. Billingsley, Ed Derohanes, Hilton
     Brown,  III.  Athens, GA, 30602, The Baldwin Press, The University of
     Georgia.

     Ray, T. S.  (1992) "Evolution, ecology and  optimization  of  digital
     organisms."  Santa Fe Institute working paper 92-08-042.

     Ray, T. S.  "Evolution, complexity, entropy, and artificial reality."
     submitted Physica D. Avail. as tierra.slhs.udel.edu:/doc/PhysicaD.tex

     Ray,  T.  S.   (1993) "An evolutionary approach to synthetic biology,
     Zen and the art of creating life.  Artificial Life  1(1).  Avail.  as
     tierra.slhs.udel.edu:/doc/Zen.tex

 VENUS
     Steen  Rasmussen's  (et  al.) VENUS I+II "coreworlds" as described in
     [ALIFEII] and [LEVY92], are inspired  by  A.K.  Dewdney's  well-known
     article  (Dewdney  1984). Dewdney proposed a game called "Core Wars",
     in which hackers create computer programs that battle for control  of
     a  computer's "core" memory (Strack 93).  Since computer programs are
     just patterns of information, a successful program in  core  wars  is
     one that replicates its pattern within the memory, so that eventually
     most of the memory contains its  pattern  rather  than  that  of  the
     competing program.

     VENUS  is  a modification of Core Wars in which the Computer programs
     can mutate, thus the pseudo assembler code creatures of VENUS  evolve
     steadily.   Furthermore   each   memory   location  is  endowed  with
     "resources" which, like sunshine are added  at  a  steady  state.   A
     program  must  have  sufficient resources in the regions of memory it
     occupies in order to execute.   The  input  of  resources  determines
     whether  the  VENUS ecosystem is a "jungle" or a "desert."  In jungle
     ENVIRONMENTs, Rasmussen et al. observe the spontaneous  emergence  of
     primitive  "copy/split"  organisms  starting from (structured) random
     initial conditions.

     --- [ALIFEII], p.821

     Dewdney, A.K. (1984) "Computer Recreations: In the Game  called  Core
     War  Hostile Programs Engage in a Battle of Bits", Sci. Amer. 250(5),
     14-22.

     Farmer &  Belin  (1992)  "Artificial  Life:  The  Coming  Evolution",
     [ALIFEII], 815-840.

     Rasmussen,  et  al. (1990) "The Coreworld: Emergence and Evolution of
     Cooperative Structures in a  Computational  Chemistry",  [FORREST90],
     111-134.

     Rasmussen,   et   al.   (1992)  "Dynamics  of  Programmable  Matter",
     [ALIFEII], 211-254.

     Strack   (1993)   "Core   War   Frequently    Asked    Questions    (
     rec.games.corewar     FAQ)"     Avail.    by    anon.     FTP    from
     rtfm.mit.edu:/pub/usenet/news.answers/games/corewar-faq.Z

 PolyWorld
     Larry Yaeger's PolyWorld as described in [ALIFEIII] and  [LEVY92]  is
     available via anonymous FTP from ftp.apple.com:/pub/polyworld/

     "The  subdirectories in this "polyworld" area contain the source code
     for the PolyWorld ecological simulator, designed and written by Larry
     Yaeger, and Copyright 1990, 1991, 1992 by Apple Computer.

     PostScript  versions  of  my ARTIFICIAL LIFE III technical paper have
     now been added to the directory.  These should be directly  printable
     from most machines.  Because some unix systems' "lpr" commands cannot
     handle very large files (ours at least), I have split the paper  into
     Yaeger.ALife3.1.ps and Yaeger.ALife3.2.ps.  These files can be ftp-ed
     in "ascii" mode.  For unix users  I  have  also  included  compressed
     versions  of  both these files (indicated by the .Z suffix), but have
     left the uncompressed versions around for people connecting from non-
     unix  systems.   I  have  not  generated  PostScript  versions of the
     images, because they are color and the resulting files are  much  too
     large  to  store,  retrieve,  or  print.   Accordingly, though I have
     removed a Word-formatted version of the textual  body  of  the  paper
     that  used  to  be  here, I have left a Word-formatted version of the
     color images.  If you wish to acquire it, you will need  to  use  the
     binary transfer mode to move it to first your unix host and then to a
     Macintosh (unless Word on a PC can read it - I don't know),  and  you
     may  need to do something nasty like use ResEdit to set the file type
     and creator to match those of a standard Word document (Type =  WDBN,
     Creator = MSWD).  [..]"

     --- from the README by Larry Yaeger <larryy@apple.com>

 General Alife repositories?
     Also, all of the following FTP sites carry ALIFE related info:

     ftp.cognet.ucla.edu:/pub/alife/                                     ,
     life.anu.edu.au:/pub/complex_systems/alife/                         ,
     ftp.cogs.susx.ac.uk:/pub/reports/csrp/  ,  xyz.lanl.gov:/nlin-sys/  ,
     alife.santafe.edu:/pub/ .
------------------------------

Subject: Q5: What about all this Optimization stuff?

     Just think of an OPTIMIZATION problem as a black box.  A large  black
     box.  As  large as, for example, a Coca-Cola vending machine. Now, we
     don't know anything about the inner workings of this  box,  but  see,
     that  there  are some regulators to play with, and of course we know,
     that we want to have a bottle of the real thing...

     Putting this everyday problem into a mathematical model,  we  proceed
     as follows:

     (1) we  label all the regulators with x and a number starting from 1;
	 the result is a vector x, i.e.  (x_1,...,x_n),  where  n  is  the
	 number of visible regulators.

     (2) we must find an objective function, in this case it's obvious, we
	 want to get k bottles of the real thing, where k is equal  to  1.
	 [some  might  want  a  "greater or equal" here, but we restricted
	 ourselves to the visible regulators (we all know that sometimes a
	 "kick  in  the  right  place" gets use more than 1, but I have no
	 idea how to put this mathematically...)]

     (3) thus, in the language some mathematicians  prefer  to  speak  in:
	 f(x)  =  k  =  1. So, what we have here is a maximization problem
	 presented in a form we know from some  boring  calculus  lessons,
	 and   we   also   know  that  there  at  least  a  dozen  utterly
	 uninteresting techniques to solve problems presented this  way...

 What can we do in order to solve this problem?
     We  can  either try to gain more knowledge or exploit what we already
     know about the interior of the black box. If the  objective  function
     turns  out  to  be smooth and differentiable, analytical methods will
     produce the exact solution.

     If this turns out to be impossible, we  might  resort  to  the  brute
     force  method  of  enumerating the entire SEARCH SPACE.  But with the
     number of possibilities growing exponentially in  n,  the  number  of
     dimensions  (inputs),  this  method  becomes infeasible even for low-
     dimensional spaces.

     Consequently, mathematicians  have  developed  theories  for  certain
     kinds  of  problems  leading  to specialized OPTIMIZATION procedures.
     These  algorithms  perform  well  if  the  black  box  fulfils  their
     respective  prerequisites.   For example, Dantzig's simplex algorithm
     (Dantzig 66) probably  represents  the  best  known  multidimensional
     method capable of efficiently finding the global optimum of a linear,
     hence convex, objective function in a search space limited by  linear
     constraints.   (A  USENET  FAQ on linear programming is maintained by
     John W. Gregory of Cray Research, Inc.  Try  to  get  your  hands  on
     "linear-programming-faq"  (and  "nonlinear-programming-faq")  that is
     posted monthly to sci.op-research and is mostly interesting to read.)

     Gradient  strategies  are  no longer tied to these linear worlds, but
     they smooth their world by exploiting the objective function's  first
     partial  derivatives  one  has to supply in advance. Therefore, these
     algorithms rely on a locally linear internal model of the black  box.

     Newton   strategies   additionally   require   the   second   partial
     derivatives, thus building a quadratic internal model.  Quasi-Newton,
     conjugate  gradient  and  variable metric strategies approximate this
     information during the search.

     The deterministic  strategies  mentioned  so  far  cannot  cope  with
     deteriorations,  so  the search will stop if anticipated improvements
     no longer occur. In a multimodal ENVIRONMENT  these  algorithms  move
     "uphill"  from their respective starting points. Hence, they can only
     converge to the next local optimum.

     Newton-Raphson-methods might even diverge if  a  discrepancy  between
     their  internal assumptions and reality occurs.  But of course, these
     methods turn out to  be  superior  if  a  given  task  matches  their
     requirements.  Not relying on derivatives, polyeder strategy, pattern
     search and rotating coordinate search should also be  mentioned  here
     because  they  represent  robust  non-linear  optimization algorithms
     (Schwefel 81).

     Dealing with technical optimization problems, one will rarely be able
     to write down the objective function in a closed form.  We often need
     a SIMULATION model in order to grasp reality.  In general, one cannot
     even   expect   these   models   to  behave  smoothly.  Consequently,
     derivatives do not exist. That is why  optimization  algorithms  that
     can  successfully  deal  with  black  box-type  situations  have been
     developed. The increasing applicability is of course paid  for  by  a
     loss  of  "convergence  velocity,"  compared  to algorithms specially
     designed for the given problem.  Furthermore, the guarantee  to  find
     the global optimum no longer exists!

 But why turn to nature when looking for more powerful algorithms?
     In  the  attempt  to  create  tools for various purposes, mankind has
     copied, more often instinctively than geniously,  solutions  invented
     by  nature.  Nowadays, one can prove in some cases that certain forms
     or structures are not only well adapted to their ENVIRONMENT but have
     even reached the optimum (Rosen 67). This is due to the fact that the
     laws of nature have remained  stable  during  the  last  3.5  billion
     years.  For  instance,  at branching points the measured ratio of the
     diameters in a system of blood-vessels comes close to the theoretical
     optimum  provided  by  the laws of fluid dynamics (2^-1/3).  This, of
     course, only represents a  limited,  engineering  point  of  view  on
     nature. In general, nature performs adaptation, not optimization.

     The idea to imitate basic principles of natural processes for optimum
     seeking procedures emerged more than three decades  ago  (cf  Q10.3).
     Although  these  algorithms  have  proven  to  be  robust  and direct
     OPTIMIZATION tools, it is only in the last five years that they  have
     caught  the researchers' attention. This is due to the fact that many
     people still look at organic EVOLUTION as a giantsized game of  dice,
     thus  ignoring  the  fact  that  this  model of evolution cannot have
     worked: a human germ-cell comprises approximately 50,000 GENEs,  each
     of  which  consists  of about 300 triplets of nucleic bases. Although
     the four  existing  bases  only  encode  20  different  amino  acids,
     20^15,000,000,  ie  circa 10^19,500,000 different GENOTYPEs had to be
     tested in only circa 10^17 seconds, the age of our planet. So, simply
     rolling  the  dice  could  not have produced the diversity of today's
     complex living systems.

     Accordingly,  taking  random  samples   from   the   high-dimensional
     parameter  space  of an objective function in order to hit the global
     optimum must fail (Monte-Carlo search). But  by  looking  at  organic
     evolution  as  a  cumulative,  highly  parallel  sieving process, the
     results of which pass on slightly modified into the next  sieve,  the
     amazing   diversity   and  efficiency  on  earth  no  longer  appears
     miraculous. When building a model, the point is to isolate  the  main
     mechanisms  which  have  led  to  today's  world  and which have been
     subjected to evolution themselves.  Inevitably, nature  has  come  up
     with  a  mechanism  allowing  INDIVIDUALs  of one SPECIES to exchange
     parts of their genetic information (RECOMBINATION or CROSSOVER), thus
     being able to meet changing environmental conditions in a better way.

     Dantzig, G.B.  (1966)  "Lineare  Programmierung  und  Erweiterungen",
     Berlin: Springer. (Linear programming and extensions)
     Kursawe,  F.  (1994) " Evolution strategies: Simple models of natural
     processes?", Revue Internationale de Systemique, France (to  appear).

     Rosen,   R.  (1967)  "Optimality  Principles  in  Biologie",  London:
     Butterworth.

     Schwefel, H.-P. (1981) "Numerical Optimization of  Computer  Models",
     Chichester: Wiley.

------------------------------

     Copyright  (c) 1993-1996 by J. Heitkoetter and D. Beasley, all rights
     reserved.

     This FAQ may be posted to any USENET newsgroup, on-line  service,  or
     BBS  as  long  as  it  is  posted  in  its entirety and includes this
     copyright statement.  This FAQ may not be distributed  for  financial
     gain.   This  FAQ  may  not  be included in commercial collections or
     compilations without express permission from the author.

End of ai-faq/genetic/part3
***************************

