		Welcome to the wonderful world of Geppetto!  

Geppetto is an environment for creating and evolving programs
semi-automatically.  Using its own variables (and data types), along with
operators supplied by you, Geppetto uses genetic and Darwinian operations
to evolve randomly created parse trees into programs which solve the task
you've selected.  It comes with a simple driver which takes care of some
of the messy details and allows details of a single run to be customized
from the command line.

This user's guide assumes that you're familiar with Genetic Programming,
either from John R. Koza's "Genetic Programming: On the Programming of
Computers by Means of Natural Selection" (MIT Press, ISBN 0-262-11170-5)
or from the numerous papers available on ftp.cc.utexas.edu in
/pub/genetic-programming/papers.


		A Sample Application: Symbolic Regression

This application, described in Appendix B of Genetic Programming, involves
discovering the polynomial function (X*X)/2.  We'll walk through the process
of building an application in roughly the same order used by Koza.


Step 1: The Set of Terminals

The first major step defined by Koza is to identify the set of 
terminals. For this problem, we obviously need a variable and a source for 
a bunch of floating point numbers.

We start by defining a C variable:

	float x;

A Geppetto variable can then be created from this with the variableCreate() 
procedure. This procedure takes three arguments (a 'datatype', a 'char' 
pointer to the variable's name, and a pointer to the actual C variable) and 
returns a pointer to a 'variable' object. The code to create a Geppetto 
variable is: 

	variableCreate(dtFloat, "X", &x);

For this example, the Geppetto variable name is the same as the C variable
name, apart from case.  It could just as easily have been "foo" or
"MyVariable".  If you're planning on using checkpoints, you should avoid
spaces in the variable name.

Note also that the variableCreate() procedure can be used to define a 
variable for any of the data types supported by Geppetto. All the basic C
types are supported, plus a few extra types.  The Geppetto data types
and their C equivalents are:

	dtVoid		void
	dtBoolean	bool		(Provided via a 'typedef')
	dtShort		short
	dtInteger	int
	dtLong		long
	dtFloat		float
	dtDouble	double
	dtError		errorCode	(Provided via a 'typedef')
	dtList		resultList *	(Provided via a Geppetto structure)
	dtBlob		blob *		(Explained below)

To create an integer variable, simply define a C variable (with 
something like 'int i;') and use it to create a Geppetto variable: 

	variableCreate(dtInteger, "i", &i);

Geppetto programs use 'constantSrc' objects to supply random constants. To 
create a source of floating-point constants, we use the floatSrcCreate() 
procedure, which is a friendlier interface to the more versatile 
constantSrcCreate() procedure. floatSrcCreate() takes two arguments, the low 
end of the random number range and the high end. The following code creates 
an object which generates random floating-point numbers between -5.0 and 
5.0: 

	floatSrcCreate(-5.0, 5.0);

Setting both arguments to 0 will produce the full range of random values
(every positive integer or a floating-point number between 0.0 and 1.0).
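
To make the two range arguments concrete, here's a rough sketch of how a
float source might turn them into a value.  This is an assumption about the
behaviour, not Geppetto's actual code: sketchRandomFloat() is a hypothetical
helper, and the standard rand() function stands in for whatever generator
Geppetto really uses.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for what a float constant source might do;
 * not part of the Geppetto library.
 * Returns a value in [low, high]; if both bounds are 0, it falls back
 * to the range [0.0, 1.0] described above.
 */
float
sketchRandomFloat(float low, float high)
{
	float unit;

	/* scale rand() into [0.0, 1.0] */
	unit = (float)rand() / (float)RAND_MAX;

	if (low == 0 && high == 0)
		return(unit);
	return(low + unit * (high - low));
}
```

Calling sketchRandomFloat(-5.0, 5.0) repeatedly should give values spread
across the same range as the floatSrcCreate() call above.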

Both of these terminals need to be put into a list. Geppetto supplies a set 
of 'objectList' procedures for this purpose. We first create a list which 
will hold two objects: 

	objectList *list;
	
	list = objectListCreate(2);

and then add the two terminals to the list with the objectListAdd() 
procedure, which takes two arguments: a pointer to the 'objectList' and a 
pointer to the 'object' to add to the list. Every Geppetto object can be 
cast to and from this 'object' type. 

Since we're using the simple driver provided with Geppetto, we need to 
package this all up in a procedure, which we'll call srTerminals(). The 
finished procedure looks like this: 

	objectList *
	srTerminals()
	{
		objectList *list;
	
		list = objectListCreate(2);
		objectListAdd(list, variableCreate(dtFloat, "X", &x));
		objectListAdd(list, floatSrcCreate(-5.0, 5.0));
		return(list);
	}


Step 2: The Set of Functions

The second major step involves defining the function set for the problem. 
For this problem, the function set consists of the four arithmetic 
operators. 

Since Koza uses Lisp, he can use the actual Lisp '+', '-' and '*' 
operators, though he has to provide a protected division operator. Since 
Geppetto uses a compiled language, we must write code for these operators. 

Every object in a Geppetto program, when evaluated, returns a pointer to a 
'result' object. The simplest form of Geppetto operator takes as arguments 
a list of these pointers and a pointer to an application-specific structure 
(which can be ignored for this problem.) 

Here's how '+' would be written as a Geppetto operator:

	result *
	opAdd(argv, envp)
	const result **argv;
	void *envp;
	{
		float fval;
	
		fval = resultFloat(argv[0]) + resultFloat(argv[1]);
		return(resultCreate(dtFloat, fval));
	}

This procedure converts its two arguments to floating-point values using 
resultFloat(), adds them together and stores this new value in 'fval'. It 
then creates a new floating-point 'result' from 'fval' using 
resultCreate(). The intermediate value could be eliminated to create an 
even more terse version of the operator: 

	result * 
	opAdd(argv, envp)
	const result **argv;
	void *envp;
	{ 
		return(resultCreate(dtFloat, resultFloat(argv[0]) + 
				resultFloat(argv[1])));
	}

The '-' and '*' operators are almost exactly the same as opAdd(), differing 
only in the name and operation performed. Division, however, must be 
protected just as in the Lisp version.  Here's the code to implement 
division:

	result * 
	opDivide(argv, envp)
	const result **argv;
	void *envp;
	{
		float fval;
	
		if (resultFloat(argv[1]) == 0)
			fval = 1;
		else
			fval = resultFloat(argv[0]) / resultFloat(argv[1]);
		return(resultCreate(dtFloat, fval));
	}
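
The protection rule can be tried out on its own, without any Geppetto
types.  Here's a minimal sketch in plain C.  protectedDivide() is a
hypothetical helper, written with an ANSI prototype, which mirrors the logic
of opDivide() rather than replacing it.

```c
/*
 * Hypothetical plain-C version of the protected division rule:
 * a zero denominator yields 1 instead of a divide-by-zero.
 */
float
protectedDivide(float numerator, float denominator)
{
	if (denominator == 0)
		return(1);
	return(numerator / denominator);
}
```

With this sketch, protectedDivide(6, 3) gives 2 and protectedDivide(6, 0)
gives 1.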

The newly created operators now have to be added to the function list.  
As in the terminal list, we'll create an 'objectList' and add four 
'operator' objects to it.  These objects are created with the 
simpleOperatorSrcCreate() procedure, which takes as arguments a pointer to
the name which Geppetto will use to refer to the operator, a pointer to the
operator and the number of arguments to the operator.  Here's the function
list code:

	objectList *
	srFunctions()
	{
		objectList *list;
	
		list = objectListCreate(4);
		objectListAdd(list,
			      simpleOperatorSrcCreate("+", opAdd, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("-", opSubtract, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("*", opMultiply, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("/", opDivide, 2));
		return(list);
	}


Step 3: The Fitness Measure

The symbolic regression problem uses ten points regularly spaced 
between 0.0 and 1.0 as its measure of fitness.

These numbers need to be stored somewhere, so we'll create an array to 
hold both the fitness cases and the correct answers, using a C structure:

	#define NUMBER_OF_FITNESS_CASES	10
	struct {
		float x;
		float answer;
	} fitnessCase[NUMBER_OF_FITNESS_CASES];

The actual array initialization code could be done like this:

	int i;
	
	/* initialize fitness cases */
	for (i = 0; i < NUMBER_OF_FITNESS_CASES; i++) {
		x = (float )i / NUMBER_OF_FITNESS_CASES;
		fitnessCase[i].x = x;
		fitnessCase[i].answer = (x*x)/2;
	}

(The 'x' used in this array initialization code is the floating-point variable
'x' declared back in step 1.)
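
One detail is worth checking when spacing the cases: in C, dividing one
'int' by another discards the fraction, so the index has to be converted to
floating point before the division.  The sketch below is illustrative only.
caseValue() and truncatedCaseValue() are hypothetical names and aren't part
of sr.c.

```c
#define NUMBER_OF_FITNESS_CASES	10

/* converts the index first, so case 5 becomes 0.5 */
float
caseValue(int i)
{
	return((float)i / NUMBER_OF_FITNESS_CASES);
}

/* integer division: every index below 10 becomes 0 */
float
truncatedCaseValue(int i)
{
	return(i / NUMBER_OF_FITNESS_CASES);
}
```

Only the first form gives ten distinct points between 0.0 and 1.0.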

At a minimum, an application should provide two fitness-related procedures: 
one to set up each fitness case for every program evaluated and a second to 
rate the 'result' obtained by evaluating each program.

The first procedure is fairly trivial to write:

	void *
	srCaseInitialize(popNum, fc)
	int popNum;
	int fc;
	{
		x = fitnessCase[fc].x;
		return(0);
	}

This procedure is passed the index for this population (which will always
be zero unless your application is using coevolution) as well as the number
of the fitness case to set up.  It sets our C variable to the appropriate
value.  Since this application isn't complex enough to need an application-
specific environment, it passes back a null pointer.

The second fitness-related procedure is a bit more complicated.  It accepts
six parameters: a pointer to the 'result' returned by the current program,
the number of the fitness case which the program just evaluated, pointers 
to the hits, raw fitness and standardized fitness variables for this 
program and a pointer to the application-specific environment.  The bare
bones of the second procedure look like this:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
	}

A program scores a hit in the symbolic regression problem if it's within 
.01 of the correct answer.  For this procedure, we'll use resultFloat() to
get the floating-point value from the 'result', subtract it from the correct
answer in the fitnessCase array and take the absolute value of that using
the fabs() function.  This value can then be used to see if we got a hit:

	float diff;
	
	/* compute difference between program result and actual answer */
	diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);

	/* see if we got a hit */
	if (diff < 0.01)
		*hitp += 1;

The raw fitness for this problem is the sum of all of the differences,
so we'll add the value computed above:

	/* set raw fitness */
	*rawp += diff;

We'll also want to set the standardized fitness after the final fitness
case.  For this problem, standardized fitness is the same as raw fitness:

	/* set standardized fitness after final fitness case */
	if (fc == NUMBER_OF_FITNESS_CASES-1)
		*stdp = *rawp;

The finished case fitness routine looks like this:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
		float diff;
	
		/* compute difference between result and answer */
		diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);
	
		/* see if we got a hit */
		if (diff < 0.01)
			*hitp += 1;
	
		/* set raw fitness */
		*rawp += diff;
	
		/* set standardized fitness after final fitness case */
		if (fc == NUMBER_OF_FITNESS_CASES-1)
			*stdp = *rawp;
	}

Everything else regarding fitness is automatically taken care of by Geppetto.
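
To get a feel for the numbers this routine produces, the scoring can be
imitated in plain C without the Geppetto types.  This is only a sketch:
scoreCandidate(), candidateSquare() and candidateTarget() are hypothetical
names, and the loop stands in for the driver calling the case fitness routine
once per fitness case.

```c
#include <math.h>

#define NUMBER_OF_FITNESS_CASES	10

/* candidate program (* X X) */
float
candidateSquare(float x)
{
	return(x * x);
}

/* candidate program (/ (* X X) 2), the target itself */
float
candidateTarget(float x)
{
	return((x * x) / 2);
}

/*
 * Runs 'candidate' over the ten fitness cases and accumulates hits
 * and raw fitness the same way srCaseFitness() does.
 */
void
scoreCandidate(float (*candidate)(float), int *hitp, double *rawp)
{
	int fc;
	float x, diff;

	*hitp = 0;
	*rawp = 0;
	for (fc = 0; fc < NUMBER_OF_FITNESS_CASES; fc++) {
		x = (float)fc / NUMBER_OF_FITNESS_CASES;
		diff = fabs(candidate(x) - (x * x) / 2);
		if (diff < 0.01)
			*hitp += 1;
		*rawp += diff;
	}
}
```

Under these assumptions, "(* X X)" scores 2 hits (at X = 0.0 and X = 0.1)
with a raw fitness of about 1.425, while the target program scores all 10
hits with a raw fitness of zero, or very nearly so.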


Step 4: The Criterion for Designating a Result and Terminating a Run

(Koza does step 5 before step 4, but it's more natural in Geppetto
to do them in this order.)

Geppetto will automatically terminate a run after the specified number of 
generations.  You might, however, wish to terminate the run when the best 
program for a generation meets certain criteria.

For the symbolic regression problem, we'll terminate the run if a program 
has a hit for every fitness case:

	int
	srTerminateRun(popNum, hits, raw, std)
	int popNum;
	int hits;
	double raw, std;
	{
		return(hits == NUMBER_OF_FITNESS_CASES);
	}


Step 5: The Parameters and Variables for Controlling the Run

At a minimum, you'll need to tell Geppetto the number of fitness cases for
your application and point it at the application's terminal and function
lists.  You'll also want to let Geppetto know how to evaluate the fitness of
each case.  You'll also probably want to tell it how to set up each fitness
case and whether or not you want to prematurely terminate the run.

All of these things are done using the 'population' object, which holds
all the programs and the information on how to manipulate them.

There's also a 'global' object which holds information like the maximum
number of generations, the random number seed, and other global information.

The simple driver expects your program to contain an application-specific
initialization procedure named appInitialize(), to which it passes a 'global'
object pointer, a 'population' object pointer and an integer.  You can use
these variables to set up the defaults for your application.

Here's what the initialization function for this application would look like:

	void
	appInitialize(gp, pop, popNum)
	void *gp;
	population *pop;
	int popNum;
	{
		int i;
		objectList *tList, *fList;
	
		/* initialize fitness cases */
		for (i = 0; i < NUMBER_OF_FITNESS_CASES; i++) {
			x = (float )i / NUMBER_OF_FITNESS_CASES;
			fitnessCase[i].x = x;
			fitnessCase[i].answer = (x*x)/2;
		}
	
		/* set global variables */
		globalSetGenerations(gp, 30);
	
		/* build terminal and function lists */
		tList = srTerminals();
		fList = srFunctions();
	
		/* set app-specific variables */
		populationSetSize(pop, 200);
		populationSetFitnessCases(pop, NUMBER_OF_FITNESS_CASES);
		populationSetTerminalList(pop, tList);
		populationSetFunctionList(pop, fList);

		/* set app-specific functions */
		populationCaseInitializeFunc(pop, srCaseInitialize);
		populationCaseFitnessFunc(pop, srCaseFitness);
		populationTerminateRunFunc(pop, srTerminateRun);
	}

The first block of code is the fitness case initialization described above.

The second block tells the 'global' object to only run for a maximum of 30
generations.

The third block creates the terminal and function lists using the
procedures written in steps 1 and 2.

The fourth block tells the 'population' object to only create a population
of 200 programs.  It also tells it how many fitness cases this application
uses and points it at the terminal and function lists.

The final block points the 'population' object at the procedures we created
in step 3.

If you're using the simple driver and a sensible operating system, most of
the other parameters and variables can be set at run-time using the
command-line arguments documented below.

The full source code listing for this symbolic regression problem can be
found in sr.c.  This directory also contains a Makefile, which will build
both 'sr' and a debugging version 'debugsr' (explained below).

There are a couple of differences between the code described above and the
code in sr.c.

First, there are a couple of #include lines at the top of the file.
Just as any program which uses any floating-point function must include the
<math.h> file, Geppetto applications using the simple driver should include
"geppetto.h".  This file then #includes everything needed by most Geppetto
applications.

There's also some code between an '#ifdef DEBUG' and an '#endif' statement
at the bottom of the file.  This is an even simpler driver routine that, when
compiled with -DDEBUG and run, will evaluate the "(/ (* X X) 2)" program.
You can also enter programs at the command-line (make sure to enclose each
separate program in quotes).  To see how "(* X X)" would do, run:

	./debugsr "(* X X)"

You can enter as many of these as you'd like (or as many as your shell will
allow).  This is handy for making sure your operators are really doing what
you think they're doing.


			More on the 'population' object

As was mentioned before, the 'population' object can be used to customize
the way Geppetto executes your application.  Here are the procedures
provided to set the functions called by this object:

	void populationSetCaseInitializeFunc(population *pop, funcptr)
		points the 'population' object at the procedure called to
		set up each fitness case.  The bare bones of this procedure
		look like this:

			void *
			myCaseInit(popNum, caseNum)
			int popNum;	/* index for this population */
			int caseNum;	/* index for this fitness case */
			{
			}

		The returned value is an application-defined structure
		which is passed to each operator and to the next two
		procedures (if they're defined).

		If you don't set this pointer, it will be ignored and
		a NULL pointer will be passed to the operators and the
		next two procedures.

	void populationSetCaseTerminateFunc(population *pop, funcptr)
		points the 'population' object at the procedure called each
		time the program is executed.  The bare bones of this
		procedure look like this:

			int
			myCaseTerminate(rp, envp, caseNum)
			result *rp;	/* result returned during this loop */
			void *envp;	/* pointer returned by 'myCaseInit' */
			int caseNum;
			{
			}

		If this procedure returns 0, execution is terminated and
		the case fitness function is called.  Otherwise, the program
		is repeatedly evaluated until the maximum number of loops
		has been reached or an error is returned as a 'result'.

		If you don't set this pointer, each program will only be
		evaluated once a generation.

	void populationSetCaseFitnessFunc(population *pop, funcptr)
		points the 'population' object at the procedure called to
		analyze the 'result' returned by the program for each
		fitness case.  The bare bones of this procedure look like
		this:

			void
			myCaseFitness(rp, caseNum, hitp, rawp, stdp, envp)
			result *rp;	/* result returned by the program */
			int caseNum;	/* index for this fitness case */
			int *hitp;	/* pointer to program's hit counter */
			double *rawp;	/* pointer to program's raw fitness */
			double *stdp;	/* pointer to program's std fitness */
			void *envp;	/* pointer returned by 'myCaseInit' */
			{
			}

		If you don't set this pointer, no fitness values will be set
		and your application becomes essentially a random search.

	void populationSetEvalCleanupFunc(population *pop, funcptr)
		points the 'population' object at the procedure called to
		do any clean-up required after all the programs in this
		generation have been evaluated.  The bare bones of this
		procedure look like this:

			void
			myCleanup(envp)
			void *envp;
			{
			}

		If you don't set this pointer, it will be ignored.

	void populationSetTerminateRunFunc(population *pop, funcptr)
		points the 'population' object at the procedure called to
		see if this run may be terminated early.  The bare bones of
		this procedure look like this:

			int
			myTerminateRun(popNum, hits, rawFitness, stdFitness)
			int popNum;
			int hits;
			double rawFitness;
			double stdFitness;
			{
			}

		The values passed are from the program to which your case
		fitness function awarded the best standardized fitness for
		this generation.

		If you don't set this pointer, it will be ignored and the
		process will continue to the maximum number of generations.

	void populationSetDestructorFunc(population *pop, funcptr)
		points the 'population' object at the procedure called to
		clean up before the application is terminated and execution
		ceases.  The bare bones of this procedure look like:

			void 
			myDestructor()
			{
			}

		If you don't set this pointer, it will be ignored.

There are three variables which *must* be set for Geppetto to work.
They are:

	void populationSetFitnessCases(population *pop, int cases)
		sets the number of times the program is run and your case
		fitness routine is called

	void populationSetTerminalList(population *pop, objectList *tl)
		sets the terminal list for your application

	void populationSetFunctionList(population *pop, objectList *fl)
		sets the function list for your application

There are also a number of variables which are set to reasonable defaults
but may be changed using the following procedures:

	void populationSetAlwaysEvaluateMode(population *pop)
		turns on 'alwaysEvaluate' mode.  If a program is an exact
		copy of a program from a previous generation, it will
		produce the same results (given the same inputs) and thus
		finish with the same number of hits and standardized
		fitness, so by default Geppetto does not re-evaluate
		these exact copies.  In 'alwaysEvaluate' mode, it
		evaluates them anyway.

	void populationClearAlwaysEvaluateMode(population *pop)
		turns off 'alwaysEvaluate' mode.  This is the default.

	void populationSetBreedPercentage(population *pop,
	  programBreedType pbt, double pct)
		sets the percentage chance that the specified breeding
		method will be used to create program(s) for the next
		generation.  'pbt' can be any of pbtReproduction,
		pbtXoverInternal, pbtXoverTerminal, pbtXoverAny or
		pbtMutation.

		By default, pbtReproduction is set to 0.095,
		pbtXoverInternal is 0.695 and pbtXoverAny is set to
		0.195.  This only adds up to 0.985, so the remaining
		0.015 is assigned to the last empty percentage on the
		list (currently pbtMutation).

	void populationSetInitialTreeDepth(population *pop, int d)
		sets the maximum depth for programs in the initial
		population.  It defaults to 6 nodes and is increased
		as necessary in order to ensure that every member of
		the initial population is unique.

	void populationMaximumLoops(population *pop, int l)
		sets the maximum number of times a single program will
		be evaluated in a single generation and, by default,
		is set to 1000.

	void populationMaximumMutationDepth(population *pop, int d)
		sets the maximum depth for any subtree created during
		the mutation operation.  The default is 4 nodes.

	void populationMaximumTreeDepth(population *pop, int d)
		sets the maximum depth for any program.  The default is
		17 nodes.

	void populationSetOverselectionPercentage(population *pop, double p)
		if the parent selection method is set to
		psmGreedyOverselection, this value will be used to
		determine the percentage above which programs have a
		much greater chance of being selected.

	void populationSetParentSelectionMethod(population *pop,
	  parentSelectionMethod psm)
		sets the method for selecting the parent(s) of a new
		program to one of psmFitnessProportionate,
		psmGreedyOverselection or psmTournament.  The default
		is psmFitnessProportionate.

	void populationSetParsimony(population *pop, double p)
		sets the parsimony value which, if not equal to 0.0, is
		multiplied by the total number of nodes in a program and
		added to its standardized fitness.  Large numbers
		encourage smaller programs, negative numbers encourage
		HUGE programs.  This is 0.0 by default.

	void populationSetSize(population *pop, int p)
		sets the number of programs in a population.  This is
		initially 500.

	void populationSetProgramCreationMethod(population *pop,
	  programCreationMethod pcm)
		sets the method for creating programs to the method
		specified by 'pcm', which must be either pcmFull (for
		fully filled parse trees), pcmGrow (for trees with random
		shapes) or pcmRamped (which uses Koza's "ramped half-and-
		half" method).  pcmRamped is the default.

	void populationSetReturnTypes(population *pop, datatype mask)
		sets the list of data types which a valid program can
		return.  This is explained more fully below.

	void populationSetTournamentRounds(population *pop, int r)
		if the parent selection method is set to psmTournament,
		this will be used to determine the number of programs
		which will be compared to determine a winner.  It
		defaults to 3 rounds.
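
Three of the settings above involve a little arithmetic which is easy to
check in plain C.  The helpers below are hypothetical and aren't part of the
Geppetto library.  They follow the descriptions above, with one added
assumption: as in Koza's book, a lower standardized fitness is taken to be
better, and the caller is assumed to have already picked the tournament
entrants at random.

```c
/*
 * Hypothetical helpers illustrating three of the settings above;
 * none of these are part of the Geppetto library.
 */

/* share left over for the last unset breeding method */
double
leftoverBreedShare(const double *shares, int count)
{
	double total = 0.0;
	int i;

	for (i = 0; i < count; i++)
		total += shares[i];
	return(1.0 - total);
}

/* standardized fitness after the parsimony term is added */
double
parsimonyFitness(double stdFitness, double parsimony, int nodes)
{
	return(stdFitness + parsimony * nodes);
}

/*
 * Tournament among 'rounds' already-chosen programs: 'picks' holds
 * their indices and the lowest standardized fitness wins.
 */
int
tournamentWinner(const double *stdFitness, const int *picks, int rounds)
{
	int i, best;

	best = picks[0];
	for (i = 1; i < rounds; i++) {
		if (stdFitness[picks[i]] < stdFitness[best])
			best = picks[i];
	}
	return(best);
}
```

With the default shares of 0.095, 0.695 and 0.195, leftoverBreedShare()
returns about 0.015.  A parsimony value of 0.01 applied to a 20-node program
adds 0.2 to its standardized fitness.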


			More on the 'global' object

The 'global' object contains information which isn't really population-
specific.  Here are the procedures provided to set the values held by
this object:

	void globalSetCheckpointFrequency(global *gp, int f)
		sets the number of generations which will pass before
		a new checkpoint file is written.  By default, this is 0,
		meaning no checkpoint file will be written.

	void globalSetCheckpointName(global *gp, char *n)
		sets the name of the file to which checkpoint information
		will be written.

	void globalSetCountUniqueMode(global *gp)
		turns on 'countUnique' mode.  If this mode is turned on,
		Geppetto will gather and print statistics about the
		number of unique programs in each generation.  It's a
		useful metric but time-consuming to gather.

	void globalClearCountUniqueMode(global *gp)
		turns off 'countUnique' mode.  This is the default.

	void globalSetGenerations(global *gp, int g)
		sets the total number of generations, which is initially
		set to 50.

	void globalSetRandomNumberSeed(global *gp, int s)
		sets the random number seed. By default, it's set to the
		value returned by the time() function 'exclusive-or'ed
		with the value returned by getpid(), and thus is almost
		certainly unique for consecutive runs.

	void globalSetVerboseMode(global *gp)
		turns on 'verbose' mode, which causes Geppetto to print
		EVERY program and its associated statistics before the
		usual generation statistics.

	void globalClearVerboseMode(global *gp)
		turns off 'verbose' mode.  This is the default.
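
The default seed described above can be imitated in a few lines.  This is a
sketch of the idea rather than Geppetto's own code: mixSeed() and
sketchDefaultSeed() are hypothetical names, and getpid() assumes a POSIX
system.

```c
#include <time.h>
#include <unistd.h>

/* combines a timestamp and a process id with exclusive-or */
long
mixSeed(long now, long pid)
{
	return(now ^ pid);
}

/* hypothetical equivalent of the default seed described above */
long
sketchDefaultSeed(void)
{
	return(mixSeed((long)time(NULL), (long)getpid()));
}
```

If you want to repeat a run, note the seed it used and pass the same value
to globalSetRandomNumberSeed().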


			The Simple Geppetto Driver

The 'main.o' simple driver accepts a bewildering number of command-line
arguments which allow you to customize each run.  Here's a brief description
of each:

	-a	turns on 'alwaysEvaluate' mode

	-A	turns off 'alwaysEvaluate' mode

	-c	sets the creation method.  Specifying '-c full' ('-c f'
		will also work) sets it to pcmFull, '-c grow' sets it to
		pcmGrow, and '-c ramped' sets it to pcmRamped.

	-d	sets the maximum tree depth

	-f	sets the name of the checkpoint file

	-F	reads information from this checkpoint file.  Any arguments
		set prior to this will be replaced by the values in this
		file and any arguments set after this will override values
		set in the checkpoint file.

	-g	sets the maximum number of generations.

	-i	sets the initial tree depth.

	-m	sets the maximum size of a subtree created during the
		mutation operation.

	-o	sets the percentage subject to Greedy Overselection.

	-p	sets the population size.

	-r	sets the random number seed.

	-s	sets the selection method.  Specifying '-s f' sets this
		to psmFitnessProportionate, '-s g' specifies
		psmGreedyOverselection and '-s t' specifies psmTournament.

	-t	sets the number of rounds in Tournament selection.

	-u	turns on 'countUnique' mode.

	-U	turns off 'countUnique' mode.

	-v	turns on 'verbose' mode.

	-V	turns off 'verbose' mode.

	-x	sets the checkpoint frequency.

	-z	sets the parsimony value.



			Data Types in Geppetto

As was mentioned above, Geppetto supports typed constants and variables.
The basic ones are implemented using the standard C types and thus the
usual rules hold:
  * dtShort is smaller than (or possibly the same size as) dtInteger,
    which is smaller than (or possibly the same size as) dtLong
  * dtFloat is single-precision floating-point while dtDouble is
    double-precision floating-point.

Geppetto supports boolean values, which are either TRUE (non-zero) or FALSE
(0), using 'dtBoolean' inside Geppetto objects and 'bool' as the C type
(just as 'int' is the C type for 'dtInteger'.)

There is also the 'dtList' type, which holds a list of 'result's.  Its
C type is 'resultList *'.  There are a few procedures to manipulate
objects of this type:

	resultList *resultListCreate(int len)
		creates a list and preallocates 'len' elements

	int resultListAdd(resultList *list, result *elem)
		adds 'elem' to 'list', returns 0 if 'elem' was added

	int resultListLength(resultList *list)
		returns the number of elements in 'list'

	result *resultListEntry(resultList *list, int i)
		returns element 'i' from the list

	void resultListFree(resultList *list)
		frees memory used by 'list'

If you need a different datatype, there's the Blob type.  There are
placeholders in the Geppetto library which can be replaced by your own
routines if you need some specialized data type.  Routines which need to be
replaced:

	blob *blobCreate(datatype dtype)
		creates a new blob of memory, of type 'dtype'

	blob *blobCopy(blob *bbp, datatype dtype)
		creates an exact copy of the 'dtype' blob pointed to by 'bbp'

	int blobCompare(blob *bbp1, blob *bbp2, datatype dtype)
		returns 1 if 'bbp1' and 'bbp2' are identical, 0 otherwise

	int blobToString(blob *bbp, datatype dtype, charString *cstr)
		converts a blob of 'dtype' memory to a text representation
		which can be written to a checkpoint file

	blob *blobParse(const char **cpp, datatype *dtp)
		converts a pointer to a character string (*cpp) to a blob
		of memory, updating *cpp to point to the character *after*
		the string representing the blob and updating dtp to the
		correct datatype.

	void blobFree(blob *bbp, datatype dtype)
		frees blob of 'dtype' memory pointed to by 'bbp'


		Manipulating 'result' objects in Geppetto

'result' structures can contain any of the above data types.  Here are
the procedures provided by Geppetto to manipulate them:

	datatype objectDataType(result *rp)
		returns the data type of this 'result' (and can be
		used on any object)

	bool resultIsVoid(result *rp)
	bool resultIsBoolean(result *rp)
	bool resultIsShort(result *rp)
	bool resultIsInteger(result *rp)
	bool resultIsLong(result *rp)
	bool resultIsFloat(result *rp)
	bool resultIsDouble(result *rp)
	bool resultIsList(result *rp)
	bool resultIsBlobPtr(result *rp)
	bool resultIsError(result *rp)
		All these procedures return TRUE if the result is of the
		appropriate type, FALSE otherwise.

	bool resultSameType(result *r1, result *r2)
		returns TRUE if both results have the same type

	bool resultBoolean(result *rp)
	short resultShort(result *rp)
	int resultInteger(result *rp)
	long resultLong(result *rp)
	float resultFloat(result *rp)
	double resultDouble(result *rp)
	resultList *resultListPtr(result *rp)
	blob *resultBlobPtr(result *rp)
		All these procedures return the requested value from the
		'result'.  Be careful when using them, because they don't
		do any type checking and will happily return the first
		two or four bytes of a 'double' as a 'short' if that's what
		you ask for.

If you're doing GP the way it's done in Koza's book, you'll only ever use a
single type and you won't need to worry about how you use that last group
of procedures.  If you mix even two different types, you need to carefully
check the data type of each 'result' before trying to grab its value.
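
The reason for that care is easier to see with a toy version of a tagged
value.  The sketch below is not Geppetto's 'result' structure.  sketchResult,
sketchIsFloat() and sketchFloatOr() are hypothetical names used only to show
the check-before-read pattern.

```c
/* toy tagged value; NOT the real Geppetto 'result' layout */
typedef enum { skShort, skFloat, skDouble } sketchType;

typedef struct {
	sketchType type;
	union {
		short sval;
		float fval;
		double dval;
	} v;
} sketchResult;

/* returns non-zero if the value currently holds a float */
int
sketchIsFloat(const sketchResult *rp)
{
	return(rp->type == skFloat);
}

/*
 * Reads the float only after checking the tag; otherwise returns
 * the caller's fallback instead of reinterpreting the bytes.
 */
float
sketchFloatOr(const sketchResult *rp, float fallback)
{
	if (!sketchIsFloat(rp))
		return(fallback);
	return(rp->v.fval);
}
```

With the real library, the equivalent habit is to call resultIsFloat(rp)
before calling resultFloat(rp).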

	void resultSetVoid(result *rp)
		Change this 'result' to a Void result

	void resultSetBoolean(result *rp, bool val)
	void resultSetShort(result *rp, short val)
	void resultSetInteger(result *rp, int val)
	void resultSetLong(result *rp, long val)
	void resultSetFloat(result *rp, float val)
	void resultSetDouble(result *rp, double val)
	void resultSetListPtr(result *rp, resultList *valp)
	void resultSetBlobPtr(result *rp, blob *valp, datatype dtype)
		These procedures all change the 'result' pointed to by
		"rp" to the appropriate 'datatype' and assign the
		value to it.  You should NEVER use these functions on
		a 'result' which currently holds a ListPtr or BlobPtr,
		since the old list or blob will not be freed and you'll
		have a potentially disastrous memory leak.
		


			The 'Error' data type

Koza's protected division operator returns 1 if a divide-by-zero error
occurs.  This works passably for research, but if you'd like to evolve
working programs, it'd be nice to eliminate these incorrect programs
when possible.  To do this, we can return an error:

	result * 
	opDivide(argv, envp)
	const result **argv;
	void *envp;
	{
		/* bomb on divide-by-zero errors */
		if (resultFloat(argv[1]) == 0)
			return(resultCreate(dtError, ErrorDivideByZero));

		return(resultCreate(dtFloat, resultFloat(argv[0]) /
							resultFloat(argv[1])));
	}

If a simple operator like opDivide() returns an error, Geppetto stops
evaluating the program and returns the error as the program's 'result'.
This means that you don't need to worry about receiving an Error in your
simple operators.

We will need to check for errors in our CaseFitness() routine.  I usually
give erroneous programs a high standardized fitness to emphasize their
unfitness.  Here's a new version of the symbolic regression case fitness
routine which penalizes erroneous programs:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
		float diff;
	
		if (resultIsError(rp) || *stdp == HUGE_VAL) {
	
			/* set standardized fitness to a big number */
			*stdp = HUGE_VAL;
		} else {

			/* compute difference between result and answer */
			diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);
	
			/* see if we got a hit */
			if (diff < 0.01)
				*hitp += 1;
	
			/* set raw fitness */
			*rawp += diff;
		}
		
		/* set standardized fitness after final fitness case,
		 * unless an error has already pinned it at HUGE_VAL */
		if (fc == NUMBER_OF_FITNESS_CASES-1 && *stdp != HUGE_VAL)
			*stdp = *rawp;
	}

This sets the standardized fitness to HUGE_VAL, the largest value a double
can hold (positive infinity on most systems), as soon as any fitness case
produces an error.  (Once that has happened, the routine also has to make
sure that later fitness cases don't disturb the penalty, which is why it
checks for '*stdp == HUGE_VAL' on entry.)
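
The same bookkeeping can be tried outside Geppetto.  The sketch below is a
stand-alone illustration, not Geppetto code: accumulate_case(), score() and
NUM_CASES are names made up for this example, and the 'is_error' flag merely
stands in for a call to resultIsError().

```c
#include <math.h>

#define NUM_CASES 4	/* hypothetical number of fitness cases */

/*
 * Fold one fitness case into the running totals.  'is_error' stands in
 * for resultIsError(); 'diff' is the absolute error for this case.
 */
void
accumulate_case(int fc, int is_error, double diff, double *rawp, double *stdp)
{
	if (is_error || *stdp == HUGE_VAL) {
		/* once penalized, always penalized */
		*stdp = HUGE_VAL;
		return;
	}

	*rawp += diff;

	/* only an error-free run gets a real standardized fitness */
	if (fc == NUM_CASES - 1)
		*stdp = *rawp;
}

/* run all cases; errors[i] is nonzero if case i produced an Error */
double
score(const int *errors, const double *diffs)
{
	double raw = 0.0, std = 0.0;
	int fc;

	for (fc = 0; fc < NUM_CASES; fc++)
		accumulate_case(fc, errors[fc], diffs[fc], &raw, &std);
	return std;
}
```

A run with no errors comes back with the sum of the differences; a run with
an error in any case, including the last one, comes back as HUGE_VAL.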

The code in opDivide() won't compile as written, because 'ErrorDivideByZero'
isn't defined.  Currently, Geppetto only knows about a single errorCode,
ErrorBadDataType, which is used internally.  You can use this everywhere
if you don't care about anything but the fact that there was an error.

If you want to be able to distinguish the different errors, you can
define your own errors.

To create a new error, add lines like these somewhere at the top of your
program:

	#define ErrorDivideByZero	(ErrorUserDefined+0)
	const char *MsgDivideByZero =	"Divide by Zero";

The second error code would be 'ErrorUserDefined+1' and so on.

In your appInitialize() function, you then add this line:

	errorCodeSetMessage(ErrorDivideByZero, MsgDivideByZero);

This allows Geppetto to use the string when describing the error.
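
To see how codes offset from a base value can be tied to messages, here is
a stand-alone sketch of a small message table.  It is NOT how Geppetto
implements errorCodeSetMessage(); MyErrorUserDefined, my_set_message() and
my_get_message() are invented for this example, and the base value of 100
is arbitrary.

```c
#include <stddef.h>

#define MyErrorUserDefined	100	/* hypothetical first user code */
#define MAX_USER_ERRORS		8	/* hypothetical table size */

#define MyErrorDivideByZero	(MyErrorUserDefined+0)
#define MyErrorOverflow		(MyErrorUserDefined+1)

static const char *userMessage[MAX_USER_ERRORS];

/* register a message; returns 0 on success, -1 if code is out of range */
int
my_set_message(int code, const char *msg)
{
	int slot = code - MyErrorUserDefined;

	if (slot < 0 || slot >= MAX_USER_ERRORS)
		return -1;
	userMessage[slot] = msg;
	return 0;
}

/* look up a message; unregistered codes get a generic string */
const char *
my_get_message(int code)
{
	int slot = code - MyErrorUserDefined;

	if (slot < 0 || slot >= MAX_USER_ERRORS || userMessage[slot] == NULL)
		return "Unknown error";
	return userMessage[slot];
}
```

Note the parentheses around each code definition; without them an
expression like 'MyErrorOverflow * 2' would expand to something unexpected.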


			Complex Operators

You may have wondered why I keep saying "simple operator".  That's because
there are two different types of operators, simple and complex.

A simple operator, as we've already seen, is given a list of pointers to
'result's and a pointer to the application-specific data, does something
with them, and returns a new 'result'.

A complex operator looks a great deal like main() in a Unix C program.
There are three arguments, an argument count, a list of pointers and a
pointer to the application-specific data.  The complex operator's list
of pointers is a list of 'object's rather than a list of 'result's.

'object's are Geppetto's internal representation of a program and need
to be evaluated using objectEval() in order to get a pointer to a 'result'.
Thus a complex opAdd() might look like this:

	result *
	opAdd(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1;
		result *answer;
	
		/* evaluate args, abort on error */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* get the answer */
		answer = resultCreate(dtFloat, resultFloat(r0) + 
				      resultFloat(r1));

		/* clean up and return */
		resultFree(r0);
		resultFree(r1);
		return(answer);
	}

The first thing you probably noticed about this complex operator is that
we've got to do all our own memory deallocation using the resultFree()
procedure.  Since operators are evaluated hundreds or thousands of times
in a single generation, a memory leak here can be disastrous.

We also have to check for errors ourselves whenever other operators might
return them.
It's obviously MUCH more convenient (and probably safer) to implement
opAdd() as a simple operator.

A better example of a complex operator is the IfGreaterThan operator:

	result *
	opIfGreaterThan(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1, *answer;
		bool cond;
	
		/* evaluate the first two arguments */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* compare the first two results */
		cond = resultFloat(r0) > resultFloat(r1);
		resultFree(r0);
		resultFree(r1);

		/* evaluate the appropriate branch */
		if (cond)
			return(objectEval(argv[2], envp));
		else
			return(objectEval(argv[3], envp));
	}

This still has all the memory management problems, but the last four lines
are the key to why you'd want to use complex operators.  Complex operators
don't get their arguments handed to them already evaluated, so they can
evaluate only when absolutely necessary.  The IfGreaterThan operator only
evaluates one of the last two subtrees.
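
The effect is easy to demonstrate outside Geppetto.  The sketch below uses
a made-up 'node' structure and evalNode() function (neither is part of
Geppetto) as stand-ins for 'object' and objectEval(), and counts how many
nodes actually get evaluated.

```c
#include <stddef.h>

/* hypothetical stand-in for a Geppetto 'object' */
typedef struct node {
	enum { CONSTANT, IF_GREATER } kind;
	float value;			/* used by CONSTANT */
	struct node *arg[4];		/* used by IF_GREATER */
} node;

int evalCount = 0;	/* how many nodes have been evaluated so far */

float
evalNode(const node *np)
{
	evalCount++;
	if (np->kind == CONSTANT)
		return np->value;

	/* evaluate both sides of the comparison, then exactly one branch */
	if (evalNode(np->arg[0]) > evalNode(np->arg[1]))
		return evalNode(np->arg[2]);
	return evalNode(np->arg[3]);
}
```

For an IF_GREATER node with four constant arguments, evalCount goes up by
four (the node itself, the two compared values and one branch) rather than
five; the branch not taken is never touched.  With large subtrees in place
of the constants, the savings grow accordingly.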

Just as we created a source for simple operators with the
simpleOperatorSrcCreate() procedure, we create a source for complex
operators with complexOperatorSrcCreate().  Here's what the opIfGreaterThan
source looks like:

	operatorSrc *osp;

	osp = complexOperatorSrcCreate("ifGreaterThan", opIfGreaterThan,
				       dtAll, 4, 4,
				       dtAll, dtAll, dtAll, dtAll);

Just as before, we need to give Geppetto a char string to refer to the
operator and a pointer to the operator.  For complex operators, we
also need to tell Geppetto all the possible types that can be returned,
the minimum and maximum number of arguments to this operator, and the
data types of each argument.

(The 'dtAll' datatype is a wildcard matching every datatype.)


		Variable Number of Arguments

Specifying the minimum and maximum number of arguments means that a single
operator can support multiple argument lengths.  For instance, rather than
having a List2 and List3 type as Koza does in one of his examples, we
could write a generic opList() operator:

	result *
	opList(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		int i;
		result *rp = 0;
	
		for (i = 0; i < argc; i++) {

			/* free previous result */
			if (rp)
				resultFree(rp);

			/* evaluate this argument, abort on error */
			rp = objectEval(argv[i], envp);
			if (resultIsError(rp))
				return(rp);
		}

		/* return final result */
		return(rp);
	}

We then create a source with a variable number of arguments:

	osp = complexOperatorSrcCreate("List", opList, dtAll,
				       2, 3, dtAll, dtAll, dtAll);

When Geppetto creates a new program, it will randomly choose a number of
arguments between the minimum and maximum (2 and 3 in this case).
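
Picking the count amounts to drawing an integer from an inclusive range.
A minimal sketch, assuming nothing about Geppetto's own random number
source (chooseArgCount() is a made-up name, and rand() is used only for
illustration):

```c
#include <stdlib.h>

/* pick an argument count in [minArgs, maxArgs], inclusive */
int
chooseArgCount(int minArgs, int maxArgs)
{
	return minArgs + rand() % (maxArgs - minArgs + 1);
}
```

With a minimum of 2 and a maximum of 3 this returns either 2 or 3; when the
minimum and maximum are equal, as for ifGreaterThan, it always returns that
one value.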


		Using Multiple Data Types in a Program

Geppetto isn't restricted to a single type of data in a program.  It's
possible to evolve a program which provides both an integer and a floating
point input to a black box function and have that function return a boolean
value back to the case fitness function.

The list of terminals could be just a source of random integers and a
source of random floating-point numbers and would be pretty close to
srTerminals().

For operators, we'll have our black box function and the usual arithmetic
operators, written to handle both integer *and* floating-point math.  Here's
the addition operator (the rest are left as an exercise for the reader):

	result *
	opAdd(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1;
		result *answer;
	
		/* evaluate args, abort on error */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* if we're not doing mixed math... */
		if (resultSameType(r0, r1)) {

			/* things are pretty easy */
			if (resultIsInteger(r0))
				answer = resultCreate(dtInteger,
						      resultInteger(r0) +
						      resultInteger(r1));
			else
				answer = resultCreate(dtFloat,
						      resultFloat(r0) +
						      resultFloat(r1));
		} else {

			/* make sure the integer is last */
			if (resultIsInteger(r0)) {
				answer = r0;
				r0 = r1;
				r1 = answer;
			}

			/* do addition */
			answer = resultCreate(dtFloat, resultFloat(r0) +
					      (float )resultInteger(r1));
		}

		/* clean up and return */
		resultFree(r0);
		resultFree(r1);
		return(answer);
	}

This version handles integer, floating-point and mixed math.  For
convenience, we'll avoid the mixed math.   We can create the function list
like this:

	objectList *
	bbFunctions()
	{
		objectList *list;
	
		list = objectListCreate(9);
		objectListAdd(list, complexOperatorSrcCreate("+", opAdd,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("-", opSubtract,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("*", opMultiply, 
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("/", opDivide,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("+", opAdd,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("-", opSubtract,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("*", opMultiply, 
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("/", opDivide,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("BB", opBlackBox,
			      dtBoolean, 2, 2, dtInteger, dtFloat));
		return(list);
	}

The final step is to tell Geppetto which of the possible data types
(dtBoolean, dtInteger or dtFloat) should actually be returned to the case
fitness routine.  In the appInitialize() routine, we need to add this line:

	populationSetReturnTypes(pop, dtBoolean);

Now, since the black box function is the only one that returns a boolean
value, it will always be selected as the root of the parse tree.
Only integer constants and operators will be connected to the first argument
and only floating-point constants and operators to the second.

Using this same method we could have Geppetto choose between several black
box functions.  We'd set up opBlackBox0(), opBlackBox1(), etc. and create
sources for "BB0", "BB1", etc. which take an integer and a floating-point
number as arguments and return a boolean value.  When Geppetto builds the
parse trees, it will choose one of the black box functions for the root of
the tree and fill the rest out as usual.  The usual GP methods would then
select the black box function which is most fit for the application.

If we'd wanted to allow either dtBoolean results OR dtInteger results,
we could have given the 'population' object a list of these data types ORed
together:

	populationSetReturnTypes(pop, dtBoolean|dtInteger);

Geppetto would build parse trees using operators returning either type as
the root.
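
ORing types together suggests that each datatype behaves like one bit in a
mask.  The stand-alone sketch below assumes that representation; the
myDt... constants and typeAllowed() are invented for this example, and
their values need not match Geppetto's.

```c
/* hypothetical one-bit-per-type encoding */
#define myDtBoolean	0x01
#define myDtInteger	0x02
#define myDtFloat	0x04
#define myDtAll		(myDtBoolean|myDtInteger|myDtFloat)

/*
 * Nonzero if an operator returning 'opType' may fill a slot which
 * accepts any of the types in the mask 'wanted'.
 */
int
typeAllowed(int opType, int wanted)
{
	return (opType & wanted) != 0;
}
```

Under this scheme a root slot wanting 'myDtBoolean|myDtInteger' accepts
operators returning either type and rejects one returning myDtFloat, while
a wildcard like myDtAll accepts all three.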


			Aliases for Data Types

Suppose that instead of a boolean value, the black box function returned
an integer.  If we try to set the return type to 'dtInteger', we'll
end up with a few programs which don't include our black box function
at all!

To handle this problem, we need to create a data type alias that Geppetto
can use to keep the value returned by the black box function distinct from
the dtIntegers returned by the arithmetic functions.

To create this datatype alias, we first add a #define somewhere at the top of
our program, redefining one of the user-defined types to a more descriptive
name:

	#define dtBlackBox	dtUserDef0

Then, in our appInitialize() procedure, we tell Geppetto the actual type
that this alias represents:

	datatypeMakeAlias(dtBlackBox, dtInteger);

Finally, our operator source creation will look like this:

	osp = complexOperatorSrcCreate("blackBox", opBlackBox, dtBlackBox,
				       2, 2, dtInteger, dtFloat);

Geppetto thinks that dtBlackBox is different from dtInteger when it builds
parse trees, but all the 'result' procedures will see the actual type.
resultIsInteger() would return TRUE on a 'result' with a datatype of
dtBlackBox.
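
One way to picture an alias is as an entry in a lookup table which maps
each type to the type that actually stores its value.  This is only a
sketch of the idea, not Geppetto's implementation; every name in it
(exInteger, exMakeAlias(), exSameSlot(), exIsInteger() and so on) is made
up for the example.

```c
/* hypothetical type codes */
enum { exInteger, exFloat, exUserDef0, EX_NUM_TYPES };

/* aliasOf[t] is the actual type behind t; initially each type is itself */
static int aliasOf[EX_NUM_TYPES] = { exInteger, exFloat, exUserDef0 };

/* declare that 'alias' is stored as 'actual' */
void
exMakeAlias(int alias, int actual)
{
	aliasOf[alias] = actual;
}

/* tree building compares the declared types, so aliases stay distinct */
int
exSameSlot(int a, int b)
{
	return a == b;
}

/* value checks look through the alias to the actual type */
int
exIsInteger(int t)
{
	return aliasOf[t] == exInteger;
}
```

After exMakeAlias(exUserDef0, exInteger), exIsInteger(exUserDef0) is true
even though exSameSlot(exUserDef0, exInteger) is still false, which is the
combination the black box example relies on.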


			Problems?  Suggestions?

If you've found a bug in Geppetto or there's an application which Geppetto
isn't quite able to handle, send me E-mail at 'dglo@CS.Berkeley.EDU' and
I'll see what I can do.  I'm doing this in my free time, though, so I
might not be able to help you immediately ...

The Geppetto Genetic Programming System is Copyright (C) 1993 by Dave Glowacki

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting
documentation.  This software is provided "as is" without express or
implied warranty.
