Newsgroups: comp.ai.genetic
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!news2.near.net!news.mathworks.com!udel!news.sprintlink.net!pipex!uknet!cix.compulink.co.uk!usenet
From: mark_a@cix.compulink.co.uk ("Mark Atkinson")
Subject: Re: [Q] Linear scaling
Message-ID: <D625Du.51y@cix.compulink.co.uk>
Organization: Compulink Information eXchange
References: <3krgd8$okl@obelix.cica.es>
Date: Sun, 26 Mar 1995 17:17:54 GMT
X-News-Software: Ameol/NT
Lines: 105

gordillo@obelix.cica.es (Francisco Gordillo Alvarez) writes:

>   I have a doubt concerning fitness scaling:

You're not the only one.


> In Goldberg's book "Genetic Algorithms in Search, Optimization and
> Machine Learning", page 77, he says:

IMHO, it is rather unfortunate that Goldberg's canonical introductory 
text promulgates Roulette Wheel selection and Fitness Scaling as good 
ideas.


>   [ speaking about linear scaling f'= a f + b ]
>      
> > "The coefficients a and b may be chosen in a number of ways; however, 
> > in all cases we want the average scaled fitness f'_{avg} to be equal
> > to the average raw fitness f_{avg} because subsequent rule selection
> > procedure will insure that each average population member contributes 
> > one expected offspring to the next generation"
>       
>  ** WHY? If we do all the calculations regarding rule selection with 
> the scaled fitness, the last part of the sentence can be fulfilled 
> without imposing f'_{avg}=f_{avg}, because with linear scaling, members 
> with raw fitness equal to f_{avg} will ALWAYS have a scaled fitness 
> equal to f'_{avg}.


The problems here stem from the basic approach, and assumptions, of 
roulette wheel selection.

Suppose we have 2 members of a population, m1 and m2, and a fitness 
function f(), and that f(m1)=10 and f(m2)=20.

What can we say about m1 and m2?  With roulette wheel selection, we say 
that m2 is twice as fit as m1, and should have double the chance to 
reproduce.  The basic premise is that the fitness function is linear, 
passes through the origin, and is invariant over the lifetime of the 
population.
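To make that premise concrete, here's a minimal sketch of RW selection 
(Python, purely illustrative; the function name is mine, not from any 
package mentioned below):

```python
import random

def roulette_select(population, fitnesses):
    # Each member gets a slice of the wheel proportional to its raw
    # fitness, so f(m2)=20 is picked twice as often as f(m1)=10.
    total = sum(fitnesses)
    pick = random.uniform(0, total)
    running = 0.0
    for member, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return member
    return population[-1]  # guard against floating-point round-off
```

Note that the selection pressure depends directly on the raw magnitudes 
of f() -- which is exactly the problem.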

On anything but the simplest toy functions, these assumptions are bogus, 
and basic RW selection leads to associated weaknesses.  One typical 
example is that if early on in the run a very fit individual arises by 
chance, the population will rapidly converge around this individual, 
losing diversity, and missing possible better solutions after the run has 
had a chance to get underway.

Enter "fitness scaling".  Fitness scaling is basically a hack added onto 
RW selection to try and compensate for these problems, by making a 
different set of bogus assumptions about what the fitness function means, 
and how it varies over time.
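For the record, a sketch of the linear scaling Goldberg describes, with 
a and b chosen so that f'_avg = f_avg and f'_max = C * f_avg (my own 
code and parameter names; C around 1.2-2 is the usual suggestion):

```python
def linear_scale(fitnesses, c_mult=2.0):
    # Linear scaling f' = a*f + b, with a and b chosen so that
    # f'_avg == f_avg and f'_max == c_mult * f_avg.
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    if f_max == f_avg:
        return list(fitnesses)  # flat population: nothing to scale
    a = (c_mult - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    return [a * f + b for f in fitnesses]
```

(As written this can drive weak members' scaled fitness negative, which 
is one of the extra wrinkles a real implementation has to patch over -- 
a hack on top of a hack.)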

While RW selection and FS can work on test functions, if you know enough 
about the behaviour of the fitness function to make the assumptions they 
require, you would likely achieve better results by using a more direct, 
if less robust, optimisation technique.

The GA is the optimiser of choice precisely when you don't (or can't) know 
enough about the problem to use a more specific technique.  The GA will 
search and recombine correlations and non-uniformities in the search 
space, using only the simplest and most elegant of operations, without 
any domain-specific knowledge.


So what to do?  Well what _can_ we say about f()?  Our only real safe 
assumption about f() in general, is that if f(m1) > f(m2), then m1 is 
fitter than m2 (minimally, "more often than not").

To this end, the selection method of choice, where strong assumptions 
about the nature of the fitness function cannot be made (ie almost 
always), is rank-based selection.

While there are several variants on implementing ranking, in general the 
procedure is to sort the population by fitness value, and use the 
ordering as the basis for reproduction, irrespective of the actual value 
of f(), which can be rather arbitrary, and implementation dependent.
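One such variant is linear ranking, which I'd sketch like this 
(illustrative Python; "bias" is my name for the expected number of 
copies the best member receives):

```python
import random

def rank_select(population, fitnesses, bias=1.5):
    # Linear ranking: sort by fitness and select on rank alone, so
    # only the ordering of f() matters, not its magnitude.
    ranked = [m for _, m in sorted(zip(fitnesses, population),
                                   key=lambda p: p[0])]
    n = len(ranked)
    if n == 1:
        return ranked[0]
    # weight for rank i (0 = worst) rises linearly from (2 - bias)
    # to bias; the weights sum to n, so weight == expected copies.
    weights = [(2 - bias) + 2 * (bias - 1) * i / (n - 1)
               for i in range(n)]
    return random.choices(ranked, weights=weights, k=1)[0]
```

Doubling every raw fitness value changes nothing here, which is the 
whole point.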

[Implementation note: you do not need to actually sort the entire 
population, which is rather slow (O(n log n)); you can use a linear-time 
partitioning algorithm, which delivers equivalent results (remember this 
is a stochastic process) in much less time.]
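A quickselect-style partition is one way to do this (a hypothetical 
sketch, again in Python): expected O(n), and it hands you the k fittest 
members without ordering the rest.

```python
import random

def quickselect_top(items, k, key):
    # Return the k fittest members in expected linear time by
    # partitioning around a random pivot rather than fully sorting.
    if k <= 0:
        return []
    if k >= len(items):
        return list(items)
    pivot = key(random.choice(items))
    above = [x for x in items if key(x) > pivot]
    equal = [x for x in items if key(x) == pivot]
    below = [x for x in items if key(x) < pivot]
    if k <= len(above):
        return quickselect_top(above, k, key)
    if k <= len(above) + len(equal):
        return above + equal[:k - len(above)]
    return (above + equal +
            quickselect_top(below, k - len(above) - len(equal), key))
```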

For implementations, you may want to look at:

GENEsYs, Thomas Baeck <baeck@ls11.informatik.uni-dortmund.de>
Genitor, Darrell Whitley <whitley@cs.colostate.edu>

GENEsYs is newer and more fully featured, BTW.  I've also found Whitley's 
papers to be good value.

See the comp.ai.genetic FAQ for more details.




-=Mark=-


==========================================================================
mark_a@cix.compulink.co.uk (Mark Atkinson)
"Today's Competition: Define 'Life' and 'Intelligence' using only Boolean
logic, in a thread of 20,000 posts or less."
==========================================================================
