Newsgroups: comp.ai.genetic
Path: cantaloupe.srv.cs.cmu.edu!das-news2.harvard.edu!oitnews.harvard.edu!purdue!lerc.nasa.gov!magnus.acs.ohio-state.edu!math.ohio-state.edu!news.cyberstore.ca!skypoint.com!umn.edu!news
From: "James Albert Larson" <larso171@maroon.tc.umn.edu>
Subject: Re: [Q] What is stochastic universal sampling?
To: jkelley@chop.ugcs.caltech.edu
Message-ID: <20035.larso171@maroon.tc.umn.edu>
X-Minuet-Version: Minuet1.0_Beta_16
Sender: news@news.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: dialup-2-119.gw.umn.edu
X-Popmail-Charset: English
Organization: University of Minnesota, Twin Cities
Date: Wed, 30 Aug 1995 07:01:12 GMT
Lines: 106

Jonathan,

This is what Michalewicz says about Stochastic Universal Sampling (SUS)
in his book, "Genetic Algorithms + Data Structures = Evolution 
Programs", 2nd edition, 1994, p. 57:
    
    This method uses a single wheel spin.  This wheel, which is 
constructed in the standard way (Chapter 2), is spun with a 
    number of equally spaced markers equal to the population size as 
    opposed to a single one.

He describes it as a "new improved method" (in comparison to other
selection schemes such as stochastic remainder sampling), and he uses
it as the basis of his modGA, which in turn is the basis of other
algorithms such as Genocop.  (I looked at Genocop2.0, and it doesn't
look like stochastic universal sampling to me; if I recall correctly,
it was the basic SGA-style wheel spin for each selection.)

Anyway, here is what I think SUS is, by example:
                 
Say the pop size is 5 and fitness values are 10,30,15,35,10
Sum of fitnesses is 10+30+15+35+10 = 100.

Construct a roulette wheel in the traditional manner: individual 1 gets
10/100 = 10% of the roulette wheel, individual 2 gets 30% of the
roulette wheel space, etc.  (Fitnesses are shown in the boxes; below
the boxes, at the box boundaries, are the cumulative fitnesses
normalized to sum to 1.0.  Above the boxes is the number of the
"individual" owning the space):

       1         2         3          4         5
   |  10  |     30     |  15   |     35      | 10  |
   0     .1           .4      .55           .9     1.0
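In code, the box boundaries above are just the normalized cumulative
fitnesses.  Here is a small Python sketch (the variable names are my
own, not from any particular GA code):

```python
# Build the wheel boundaries for the example (a sketch; names are mine).
fitnesses = [10, 30, 15, 35, 10]
total = sum(fitnesses)

boundaries = []   # cumulative fitnesses, normalized to sum to 1.0
running = 0.0
for f in fitnesses:
    running += f / total
    boundaries.append(running)
# boundaries is now (up to float rounding) [.1, .4, .55, .9, 1.0],
# i.e. the box boundaries shown in the diagram.
```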

Now the simplest roulette wheel method (not the SUS method) spins
the wheel 5 times.  If the 5 spins produce the 5 random numbers
.7, .3, .6, .8, .2, then individuals 4,2,4,4,2 are selected.  This
is terrible: an example of stochastic error, where the number of
times an individual is selected can be far greater than its
expected number of selections.  The expected number of times
each individual should appear in the new population (or in the mating
pool, or whatever) is:

#1: 0.5, #2: 1.5, #3: 0.75, #4: 1.75, #5: 0.5

i.e. the expected number of appearances of indiv #1 in the new popn is 
0.5, the expected number of appearances of indiv #2 in the new popn is 
1.5, etc.
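Both the naive one-spin-per-selection method and the expected counts
can be sketched in a few lines of Python (function and variable names
are my own):

```python
# A sketch of the one-spin-per-selection roulette wheel described above.
import bisect

fitnesses = [10, 30, 15, 35, 10]
total = sum(fitnesses)
boundaries = []   # cumulative fitnesses normalized to sum to 1.0
running = 0.0
for f in fitnesses:
    running += f / total
    boundaries.append(running)

def spin(r):
    """Return the 1-based index of the individual owning point r."""
    return bisect.bisect_right(boundaries, r) + 1

# The five spins from the example select individuals 4,2,4,4,2:
picks = [spin(r) for r in [0.7, 0.3, 0.6, 0.8, 0.2]]

# Expected appearances of each individual in a new population of the
# same size (popsize * fitness / total):
expected = [len(fitnesses) * f / total for f in fitnesses]
# expected == [0.5, 1.5, 0.75, 1.75, 0.5]
```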

Goldberg has some techniques for greatly reducing stochastic sampling
error; one that comes to mind is called something like the remainder
sampling method (I'm doing this from memory).  The upshot of this
method is that individual #1 will appear in the new population only 0
or 1 times (never 2 or more times), individual #2 will appear in the
new population 1 or 2 times (never 0 times, and never 3 or more times),
etc.
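Since the post is recalling this from memory, here is my own hedged
sketch of remainder sampling as I understand it (integer parts taken
deterministically, fractional parts resolved stochastically) -- not
necessarily Goldberg's exact formulation:

```python
import math
import random

def remainder_sampling(fitnesses, rng=random.random):
    """Remainder stochastic sampling (a sketch): each individual gets
    floor(expected) copies deterministically; the fractional parts are
    resolved by biased coin flips, with at most one extra copy per
    individual, until the pool is full."""
    n = len(fitnesses)
    total = sum(fitnesses)
    expected = [n * f / total for f in fitnesses]
    pool = []
    fractions = []
    for i, e in enumerate(expected):
        pool.extend([i] * math.floor(e))      # guaranteed copies
        fractions.append(e - math.floor(e))   # chance of one extra copy
    i = 0
    while len(pool) < n:                      # fill the remaining slots
        if rng() < fractions[i]:
            pool.append(i)
            fractions[i] = 0.0                # never a second extra copy
        i = (i + 1) % n
    return pool
```

With the example fitnesses, individuals #2 and #4 always get their one
guaranteed copy, and no individual can exceed its expectation by more
than one.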

The Goldberg methods do the job well, but they aren't as elegant as
SUS.  In the SUS method you put 5 (i.e. popsize) equally spaced
markers around but outside the roulette wheel (on the boundary
part that doesn't spin).  Thus markers are spaced 0.2 units apart (on
a sum-to-one normalized scale).  You spin the wheel once, by generating
a random 0 to 1 real number, to set the offset of the first marker.
Say the random number turned out to be .3.  Then the first marker is
placed .3*.2 = .06 to the right of the 0 point.  The other markers are
each 0.2 to the right of the previous marker.  So after the single
wheel spin, the 5 markers end up at .06, .26, .46, .66, and .86, as shown
below by the "^" marks:

       1         2         3          4         5
   |  10  |     30     |  15   |     35      | 10  |
   0     .1           .4      .55           .9     1.0
       ^        ^         ^        ^       ^         <-markers
      .06      .26       .46      .66     .86 

The result is that #1 is selected once, #2 is selected once, #3 is
selected once, #4 is selected twice, and #5 is not selected.  This
meets the criterion that the number of times an individual is selected
is close (within 1) to the expected number of times the individual is
selected.
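Putting the marker idea into code, here is a hedged Python sketch of
SUS as described above (my own function names, not taken from
Michalewicz or Goldberg).  Forcing the spin to .3 reproduces the
example exactly:

```python
import random

def sus(fitnesses, n, rng=random.random):
    """Stochastic universal sampling: one spin, n equally spaced
    markers.  Returns the 0-based indices of the n selections.
    A sketch of the method described above, not any particular code."""
    total = sum(fitnesses)
    spacing = 1.0 / n
    start = rng() * spacing            # the single wheel spin
    markers = [start + i * spacing for i in range(n)]
    selected = []
    cum = 0.0                          # left edge of individual i's box
    i = 0
    for m in markers:                  # markers are sorted: one pass
        while cum + fitnesses[i] / total < m:
            cum += fitnesses[i] / total
            i += 1
        selected.append(i)
    return selected

# Forcing the spin to .3 puts markers at .06, .26, .46, .66, .86, which
# select individuals 1,2,3,4,4 (0-based: 0,1,2,3,3), as in the example.
example = sus([10, 30, 15, 35, 10], 5, rng=lambda: 0.3)
```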

The above was assuming the 100% replacement model -- you want to make
5 selections to replace (well actually the children of the 5 selected
individuals replace) the 5 individuals in the population.  If, on
the other hand, you want to replace 60% of the population, then you 
would do the same as above, but use 3 equally spaced markers
spaced 0.333 apart, and with an offset randomly between 0 and 0.333.
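The marker placement for the 60% case works out like this (a small
illustrative snippet; the seed is only there to make it repeatable):

```python
import random

# Marker placement for the 60%-replacement case: 3 markers spaced 1/3
# apart, offset by a random amount in [0, 1/3).
random.seed(1)   # seed only so this illustration is repeatable
n_select = 3
spacing = 1.0 / n_select
offset = random.random() * spacing
markers = [offset + i * spacing for i in range(n_select)]
# All three markers fall in [0, 1), one per third of the wheel.
```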

The main drawback of the method is that one needs to know how many 
selections one is going to make in advance.  For example, in the 60%
model we know we WANT to select 3 individuals, whose children are to
replace 3 members of the population.  But what if a child produced is
bad (as occurs in some genetic operator methods in Genocop), or if
a child is identical to some individual already in the population (some
methods, to maintain diversity, require that all individuals in the 
population be unique)?  It could happen that, say, 4 selections
have to be made to produce the 3 good children that are desired.
But since we only have 3 equally spaced markers, we must do something
extra to produce the 4th selection.  It's not a devastating problem to 
overcome, but it makes the method less clean and elegant than it
appears on the surface.

Good luck with it.  

Jim Larson

