
Genetic Algorithms Digest   Thursday, August 1, 1991   Volume 5 : Issue 19

 - Send submissions to GA-List@AIC.NRL.NAVY.MIL
 - Send administrative requests to GA-List-Request@AIC.NRL.NAVY.MIL

Today's Topics:
	- Building Block Hypothesis Considered Harmful
	- Pattern Recognition and GAs, info wanted
	- Re: SPLICER
	- Re: Large TSP problems; ftp site info
	- grammatical inference and GAs; request for references

**********************************************************************

CALENDAR OF GA-RELATED ACTIVITIES: (with GA-List issue reference)

 IJCAI 91, International Joint Conference on AI, Sydney, AU   Aug 25-30, 1991
 First European Conference on Artificial Life (v5n10)         Dec 11-13, 1991
 Canadian AI Conference, Vancouver, (CFP 1/7)                 May 11-15, 1992
 10th National Conference on AI, San Jose, (CFP 1/15)         Jul 12-17, 1992
 ECAI 92, 10th European Conference on AI (v5n13)              Aug  3-7,  1992
 Parallel Problem Solving from Nature, Brussels, (CFP 4/15)   Sep 28-30, 1992

 (Send announcements of other activities to GA-List@aic.nrl.navy.mil)

**********************************************************************
----------------------------------------------------------------------

From: gref@AIC.NRL.Navy.Mil
Date: Thu, 1 Aug 91 11:14:09 EDT
Subject: Building Block Hypothesis Considered Harmful

   I thought ICGA-91 was a great success, and I found many of the
   papers very stimulating.  I didn't get a chance to visit the Theory
   Workshop since I was running my own at the time, but many of the theory
   papers I saw presented and those I read later reminded me of the old
   story:

      There once was a philosopher looking for his keys beneath a lamp post.
      A friend came along and started to help.  After a while, the friend
      asked, "Are you sure you lost your keys here?" The philosopher said,
      "Oh no, I lost them way over there," pointing off into the darkness.
      "Why are we looking here?"  asked the friend.  "Because the light is
      so much better here," said the philosopher.

   This fable seems to apply to most of the work on GA Theory.  At the
   heart of the problem is a proposition which I'll call:

   The Strong Building Block Hypothesis (SBBH): GAs proceed by finding
   low-order schemas with the best average payoff in each hyperplane
   partition and using these to build up more complete solutions.

   If you believe the SBBH, then you probably believe that functions for
   which the schemas associated with the optimum are the best schemas in
   their partitions ought to be easy for GAs.  For example, suppose the
   true optimum is 000 ... 0, and f(0#...#) > f(1#...#) and f(00#...#) is
   better than f(10#...#), f(01#...#), and f(11#...#), and so on.  Then the SBBH
   says that a GA should find f simple to optimize.  In his ICGA-91 paper,
   Stewart Wilson calls such functions "GA-easy" (though it's not clear
   that Stewart believes the SBBH).  Conversely, suppose the true optimum
   is 000 ... 0, and f(0#...#) < f(1#...#) and f(00#...#) is worse than
   f(10#...#), f(01#...#), and f(11#...#), and so on.  Then this function would be
   called "GA-Hard" (Bethke, 1981) or "Deceptive" (Goldberg, 1987).

   The SBBH is an intuitively appealing explanation for how GAs work, but it
   can't serve as the basis for a useful theory of GAs, since it is based
   on a static analysis of the payoff function rather than on the dynamic
   view of the payoff function that the GA takes.  Note that Holland's
   Schema Theorem refers only to the OBSERVED average payoff of schemas,
   where OBSERVED means "according to the current sample in the
   population."  The SBBH arises when we ignore this crucial feature of the
   Schema Theorem.  Unfortunately, small changes in a theorem generally
   have non-linear effects: the SBBH is false.  There are at least three
   factors that contribute to its failure in practice:

   1. Biased sampling due to previous convergence.
   2. Limited population size.
   3. Large variance within schemas.

   I think that the first factor is the most interesting, since it arises
   even with a huge population and even if the variance within individual
   schemas is not large (but greater than 0).  Once the GA begins to
   converge, even a little, it can no longer estimate the true average
   payoff of the schemas.  It can only estimate the CONDITIONAL average
   payoffs, conditioned on the converged alleles.  That is, after the very
   FIRST generation, the population represents a heavily biased sample of
   all schemas.  However, it is very hard to think about conditional
   estimates of payoffs, and so many theorists in effect limit their
   attention to what happens at generation 0, because that's where the
   light is.  This is a pretty uninteresting period in the lifetime of a
   GA.
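
   Here is a small sketch of this first factor (my toy illustration,
   assuming the plain payoff f(x) = x*x on a 10-bit encoding of [0 .. 1]):
   once the second bit has converged to 1, the sample average of
   1######### estimates the conditional quantity f(11########), not the
   true average:

        import random
        random.seed(1)

        def f(bits):                          # payoff: x*x on [0, 1]
            return (int(bits, 2) / 1023.0) ** 2

        # A population of 100 strings whose second bit has converged to 1;
        # the other nine positions are still random.
        pop = [''.join('1' if i == 1 else random.choice('01')
                       for i in range(10)) for _ in range(100)]

        sample = [f(b) for b in pop if b[0] == '1']
        print(sum(sample) / len(sample))      # about 0.77, the CONDITIONAL
                                              # average f(11########); the
                                              # true average is about 0.58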

   It is easy to define problems that are "Deceptive" in the sense implied
   by the SBBH, but are in fact easy for the GA to optimize.  Here is one
   example, designed just by considering the first factor above (no Walsh
   coefficients, etc.): Consider a 10-bit space representing the interval
   [0 .. 1] in binary encoding.  That is, 0000000000 represents 0.0 and
   1111111111 represents 1.0.  We want to maximize f.  Let f be defined:

	   f(x) = x*x, except for the following special cases:

	   f(0111111111) = 1.01
	   f(0011111111) = 1.02
	   f(0001111111) = 1.03
	   f(0000111111) = 1.04
	   f(0000011111) = 1.05
	   f(0000001111) = 1.06
	   f(0000000111) = 1.07
	   f(0000000011) = 1.08
	   f(0000000001) = 1.09
	   f(0000000000) = 1.10
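
   For anyone who wants to reproduce this, here is one way to code f in
   Python (the name deceptive_f is mine; the encoding maps a string b to
   x = int(b, 2)/1023, as defined above):

        # The ten special cases lie along the path of increasing leading
        # zeros, each slightly better than the last.
        SPECIALS = {'0' * k + '1' * (10 - k): 1.00 + 0.01 * k
                    for k in range(1, 11)}

        def deceptive_f(bits):
            if bits in SPECIALS:
                return SPECIALS[bits]
            return (int(bits, 2) / 1023.0) ** 2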

   This is nearly as "Deceptive" as you can get: an enumeration of all
   schema competitions shows that except for 0000000000 and 000000000#,
   every schema representing the optimum has a true average payoff less
   than that of the competing schema representing 1111111111.  Yet, my vanilla
   GENESIS program with population size 100 and the default parameter
   settings finds the optimum after a few thousand trials.
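
   The enumeration mentioned above is small enough to check directly.  The
   sketch below (reusing deceptive_f from the previous sketch) compares,
   for every nonempty set of defined positions, the all-zeros schema
   against the competing all-ones schema:

        n = 10
        strings = [format(i, '010b') for i in range(2 ** n)]

        # True average payoff of the schema with the given positions all
        # set to the given allele, '#' everywhere else.
        def avg(defined, allele):
            inst = [s for s in strings
                    if all(s[j] == allele for j in defined)]
            return sum(deceptive_f(s) for s in inst) / len(inst)

        wins = []
        for mask in range(1, 2 ** n):         # every nonempty position set
            defined = [j for j in range(n) if mask >> j & 1]
            if avg(defined, '0') > avg(defined, '1'):
                wins.append(defined)
        print(wins)    # only positions 0-8 and 0-9 win, i.e. the schemas
                       # 000000000# and 0000000000, as claimed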

   The explanation is that the special cases don't carry enough weight to
   prevent a rapid convergence toward the suboptimal 1111111111, but once
   this occurs, the special cases (introduced by mutation at rate 0.001)
   arise and propagate, one at a time.  This shows an extreme case of how
   the population convergence shifts the competition in favor of some
   schemas by eliminating their competitors.  (I'll admit that one change
   to GENESIS was necessary: I disabled the code that causes an abort after
   nearly converging.)
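
   The flip in the observed competition can be seen in miniature (again
   reusing deceptive_f): once the population has converged to 1111111111,
   a single mutation of the first bit introduces the special case
   0111111111, and the OBSERVED averages now favor the 0-schema even
   though the true averages (about 0.10 versus 0.58) favor 1#########:

        # Converged population plus one mutant along the all-zeros path:
        pop = ['1111111111'] * 99 + ['0111111111']

        zeros = [deceptive_f(s) for s in pop if s[0] == '0']
        ones  = [deceptive_f(s) for s in pop if s[0] == '1']
        print(sum(zeros) / len(zeros))    # 1.01: observed avg of 0#########
        print(sum(ones) / len(ones))      # 1.00: observed avg of 1#########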

   By the way, this example scales up nicely.  GENESIS solves the 20-bit
   version in less than 25000 trials.

   Let's discuss the interaction of the last two factors.  With a limited
   population size and large variance within the schemas, even the sampling
   in the initial, random population will produce errors in the estimate of
   each schema's payoff.

   Here is a simple function that uses this observation to demonstrate that
   "GA-easy" functions are not GA-optimizable in general: Consider a 10-bit
   space representing the interval [0 .. 1] in binary encoding.  Let f be
   defined:

	   f(x) = x*x if x > 0,
	   f(0) = 2048

   For any schema S such that the optimum is in S (that is, all the defined
   positions of S have value 0), f(S) > 2 (since the sum of the payoffs is
   at least 2048 and there are at most 1024 points in the hyperplane).  For
   any schema S such that the optimum is not in S, f(S) <= 1.  So this is
   "GA-easy", right?

   Run your favorite GA on f with a population of size 100.  If the optimum
   is not in the initial population, it will probably never be found. (Of
   course, it might be created by a lucky crossover or a very lucky
   multiple mutation.)  Why is this hard?  Because the best schemas have
   extremely high variance, so the GA never gets an accurate estimate of
   their true payoffs, not even in the initial random population.  Of
   course, this is a "needle-in-a-haystack" function, so we don't expect
   the GA to solve it on a regular basis.  But it does satisfy the above
   definition of "GA-easy", so we had better not use that definition.

   The disturbing thing is that most of what passes for GA Theory, going
   back to Bethke's thesis, assumes the SBBH as a starting point.  In other
   words, "GA Theory" deals with algorithms (if there are any) that satisfy
   the SBBH.  Unfortunately, the GA is not one of those algorithms.  Yet
   the theory continues to attract more and more researchers like the lamp
   light attracted the philosopher.  Too bad we lost our keys way over
   there in the dark.

------------------------------

From: Laurent Miclet <miclet@lannion.cnet.fr>
Date: 26 Jun 91 12:12:58+0200
Subject: Pattern Recognition and GAs, info wanted

   My interests are in Pattern Recognition
   (statistical, structural & knowledge-based)
   and of course learning, with applications to speech.
   I have no experience in genetic algorithms,
   except a few tries with the GENESIS software, some reading, and
   an increasing feeling that such methods could be a fine tool for
   complex Pattern Recognition problems.
   I would like to know of papers concerning
   the application of GAs to Pattern Recognition, or
   the names of other researchers working on that topic.
   Many thanks,

   Laurent Miclet
   Centre National d'Etude des Telecommunications
   Departement Recherche en Communication par la Parole
   BP 40    22301 LANNION Cedex FRANCE
   Tel : 33 96-05-28-93
   Fax : 33 96-05-35-30
   e-mail : miclet@lannion.cnet.fr

------------------------------

From: ntm1169@dsac.dla.mil (Mott Given)
Date: Mon, 8 Jul 91 12:18:09 EDT
Subject: Re: SPLICER

   Evidently, a price has not yet been set for ordering SPLICER, a genetic
   algorithm package that I mentioned recently in this newsletter.  You can
   get it free if you are a government agency or contractor.  If you are not,
   and want to be put on a mailing list to be notified when the cost
   information becomes available, please send a request to:
   service@cossack.cosmic.uga.edu.  Please do not ask me questions about
   it, as I am not an end user.

   Mott Given @ Defense Logistics Agency Systems Automation Center,
		DSAC-TMP, Bldg. 27-1, P.O. Box 1605, Columbus, OH 43216-5002
   INTERNET:  mgiven@dsac.dla.mil   UUCP: ...{osu-cis}!dsac!mgiven
   Phone:  614-238-9431  AUTOVON: 850-9431   FAX: 614-238-9928

------------------------------

From: levine@antares.mcs.anl.gov
Date: Mon, 8 Jul 91 15:05:23 CDT
Subject: Re: Large TSP problems; ftp site info

   Regarding the note from Heinz Muehlenbein in GA-List v5n18 about large
   TSP problems: readers of this list may be interested to know of the
   following ftp site. --dave levine

   There exists a publicly available collection of Travelling Salesman
   Problems, some with known optimal solutions, including many examples
   from the literature.  It is available via ftp as follows:


     % ftp titan.rice.edu             # or: ftp 128.42.1.30

     Login Userid : anonymous
	  Password: anonymous

     ftp> cd public
     ftp> type binary
     ftp> get tsplib.tar.Z
     ftp> quit

     % uncompress tsplib.tar.Z
     % tar xvf tsplib.tar

     *********************************
     This collection has been compiled by

			      Gerhard Reinelt
			  Institut fuer Mathematik
			   Universitaet Augsburg

------------------------------

From: giles@research.nj.nec.com	(Lee Giles)
Date: Wed, 17 Jul 91 15:55:45 EDT
Subject: grammatical inference and GAs; request for references

    I would be very interested in locating research that has used Genetic
    Algorithms for the problem of grammatical inference, i.e., inferring
    a grammar from samples of grammatical strings.  Any references or
    information would be greatly appreciated.  Thank you very much.

    C. Lee Giles
    NEC Research Institute
    4 Independence Way
    Princeton, NJ 08540
    USA

    Internet:   giles@research.nj.nec.com
	UUCP:   princeton!nec!giles
       PHONE:   (609) 951-2642
	 FAX:   (609) 951-2482

------------------------------
End of Genetic Algorithms Digest
******************************
