		Welcome to the wonderful world of Geppetto!  

Geppetto is an environment for creating and evolving programs
semi-automatically.  Using its own variables (and data types), along with
operators supplied by you, Geppetto uses genetic and Darwinian operations
to evolve randomly created parse trees into programs which solve the task
you've selected.  It comes with a simple driver which takes care of some
of the messy details and allows details of a single run to be customized
from the command line.

This user's guide assumes that you're familiar with Genetic Programming,
either from John R. Koza's "Genetic Programming: On the Programming of
Computers by Means of Natural Selection" (MIT Press, ISBN 0-262-11170-5)
or from the numerous papers available on ftp.cc.utexas.edu in
/pub/genetic-programming/papers.


		A Sample Application: Symbolic Regression

This application, described in Appendix B of Genetic Programming, involves
discovering the polynomial function (X*X)/2.  We'll walk through the process
of building an application in roughly the same order used by Koza.


Step 1: The Set of Terminals

The first major step defined by Koza is to identify the set of 
terminals. For this problem, we obviously need a variable and a source for 
a bunch of floating point numbers.

We start by defining a C variable:

	float x;

A Geppetto variable can then be created from this with the variableCreate() 
procedure. This procedure takes three arguments (a 'datatype', a 'char' 
pointer to the variable's name, and a pointer to the actual C variable) and 
returns a pointer to a 'variable' object. The code to create a Geppetto 
variable is: 

	variableCreate(dtFloat, "X", &x);

For this example, the Geppetto variable name matches the C variable name
(apart from case). It could just as easily have been "foo" or "MyVariable".  If you're
planning on using checkpoints, you should avoid spaces in the variable name.

Note also that the variableCreate() procedure can be used to define a 
variable for any of the data types supported by Geppetto. All the basic C
types are supported, plus a few extra types.  The Geppetto data types
and their C equivalents are:

	dtVoid		void
	dtBoolean	bool		(Provided via a 'typedef')
	dtShort		short
	dtInteger	int
	dtLong		long
	dtFloat		float
	dtDouble	double
	dtError		errorCode	(Provided via a 'typedef')
	dtList		resultList *	(Provided via a Geppetto structure)
	dtBlob		blob *		(Explained below)

To create an integer variable, simply define a C variable (with 
something like 'int i;') and use it to create a Geppetto variable: 

	variableCreate(dtInteger, "i", &i);

Geppetto programs use 'constantSrc' objects to supply random constants. To 
create a source of floating-point constants, we use the floatSrcCreate() 
procedure, which is a friendlier interface to the more versatile 
constantSrcCreate() procedure. floatSrcCreate() takes two arguments, the low 
end of the random number range and the high end. The following code creates 
an object which generates random floating-point numbers between -5.0 and 
5.0: 

	floatSrcCreate(-5.0, 5.0);

Setting both arguments to 0 will produce the full range of random values
(any positive integer, or a floating-point number between 0.0 and 1.0).
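
As a rough illustration of what a ranged constant source has to do, the
sketch below draws a pseudo-random float from a closed range using the
standard rand() function.  It is not Geppetto's implementation, and
randomFloatInRange() is a name made up for this example.

```c
#include <stdlib.h>

/* Hypothetical helper (not part of Geppetto): return a pseudo-random
 * float in the closed range [low, high], using the C library's rand(). */
float
randomFloatInRange(float low, float high)
{
	/* scale rand() into [0.0, 1.0] */
	double unit = (double)rand() / (double)RAND_MAX;

	return (float)(low + unit * (high - low));
}
```

Calling randomFloatInRange(-5.0, 5.0) repeatedly gives values spread over
the same range as the floatSrcCreate() call above.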

Both of these terminals need to be put into a list. Geppetto supplies a set 
of 'objectList' procedures for this purpose. We first create a list which 
will hold two objects: 

	objectList *list;
	
	list = objectListCreate(2);

and then add the two terminals to the list with the objectListAdd() 
procedure, which takes two arguments: a pointer to the 'objectList' and a 
pointer to the 'object' to add to the list. Every Geppetto object can be 
cast to and from this 'object' type. 

Since we're using the simple driver provided with Geppetto, we need to 
package this all up in a procedure that returns the list, which we'll call
srTerminals(). The finished procedure looks like this: 

	objectList *
	srTerminals()
	{
		objectList *list;
	
		list = objectListCreate(2);
		objectListAdd(list, variableCreate(dtFloat, "X", &x));
		objectListAdd(list, floatSrcCreate(-5.0, 5.0));
		return(list);
	}


Step 2: The Set of Functions

The second major step involves defining the function set for the problem. 
For this problem, the function set consists of the four arithmetic 
operators. 

Since Koza uses Lisp, he can use the actual Lisp '+', '-' and '*' 
operators, though he has to provide a protected division operator. Since 
Geppetto uses a compiled language, we must write code for these operators. 

Every object in a Geppetto program, when evaluated, returns a pointer to a 
'result' object. The simplest form of Geppetto operator takes as arguments 
a list of these pointers and a pointer to an application-specific structure 
(which can be ignored for this problem.) 

Here's how '+' would be written as a Geppetto operator:

	result *
	opAdd(argv, envp)
	const result **argv;
	void *envp;
	{
		float fval;
	
		fval = resultFloat(argv[0]) + resultFloat(argv[1]);
		return(resultCreate(dtFloat, fval));
	}

This procedure converts its two arguments to floating-point values using 
resultFloat(), adds them together and stores this new value in 'fval'. It 
then creates a new floating-point 'result' from 'fval' using 
resultCreate(). The intermediate value could be eliminated to create an 
even more terse version of the operator: 

	result * 
	opAdd(argv, envp)
	const result **argv;
	void *envp;
	{ 
		return(resultCreate(dtFloat, resultFloat(argv[0]) + 
				resultFloat(argv[1])));
	}

The '-' and '*' operators are almost exactly the same as opAdd(), differing 
only in the name and operation performed. Division, however, must be 
protected just as in the Lisp version.  Here's the code to implement 
division:

	result * 
	opDivide(argv, envp)
	const result **argv;
	void *envp;
	{
		float fval;
	
		if (resultFloat(argv[1]) == 0)
			fval = 1;
		else
			fval = resultFloat(argv[0]) / resultFloat(argv[1]);
		return(resultCreate(dtFloat, fval));
	}

The newly created operators now have to be added to the function list.  
As in the terminal list, we'll create an 'objectList' and add four 
'operator' objects to it.  These objects are created with the 
simpleOperatorSrcCreate() procedure, which takes as arguments a pointer to
the name which Geppetto will use to refer to the operator, a pointer to the
operator and the number of arguments to the operator.  Here's the function
list code:

	objectList *
	srFunctions()
	{
		objectList *list;
	
		list = objectListCreate(4);
		objectListAdd(list,
			      simpleOperatorSrcCreate("+", opAdd, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("-", opSubtract, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("*", opMultiply, 2));
		objectListAdd(list,
			      simpleOperatorSrcCreate("/", opDivide, 2));
		return(list);
	}
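
Each entry in the function list ties together a name, a C function and an
argument count.  The standalone sketch below shows that same idea with
plain floats and an ordinary array, so it can be compiled and tried
without Geppetto.  The opEntry type, the functionSet table and
findOperator() are all invented for this example and say nothing about
how Geppetto stores its operators internally.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one function-set entry: a printable name,
 * the C function that implements it, and its argument count. */
typedef struct {
	const char *name;
	float (*func)(float, float);
	int arity;
} opEntry;

static float addFloats(float a, float b) { return a + b; }
static float subtractFloats(float a, float b) { return a - b; }
static float multiplyFloats(float a, float b) { return a * b; }

/* protected division: a zero divisor yields 1 */
static float divideFloats(float a, float b) { return (b == 0) ? 1 : a / b; }

static const opEntry functionSet[] = {
	{ "+", addFloats, 2 },
	{ "-", subtractFloats, 2 },
	{ "*", multiplyFloats, 2 },
	{ "/", divideFloats, 2 },
};

/* Return the entry called 'name', or NULL if there is none. */
const opEntry *
findOperator(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(functionSet) / sizeof(functionSet[0]); i++) {
		if (strcmp(functionSet[i].name, name) == 0)
			return &functionSet[i];
	}
	return NULL;
}
```

Looking up "/" and calling its function with a zero divisor is a quick
way to confirm that the protection rule behaves as intended.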


Step 3: The Fitness Measure

The symbolic regression problem uses ten fitness cases: values of X
regularly spaced between 0.0 and 1.0.

These numbers need to be stored somewhere, so we'll create an array to 
hold both the fitness cases and the correct answers, using a C structure:

	#define NUMBER_OF_FITNESS_CASES	10
	struct {
		float x;
		float answer;
	} fitnessCase[NUMBER_OF_FITNESS_CASES];

The actual array initialization code could be done like this:

	int i;
	
	/* initialize fitness cases */
	for (i = 0; i < NUMBER_OF_FITNESS_CASES; i++) {
		x = (float)i / NUMBER_OF_FITNESS_CASES;
		fitnessCase[i].x = x;
		fitnessCase[i].answer = (x*x)/2;
	}

(The 'x' used in this array initialization code is the floating-point variable
'x' declared back in step 1.)
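
One C detail is worth calling out when computing the fitness case inputs:
dividing one int by another truncates, so the loop index has to be
converted to floating point before the division.  A minimal sketch
(caseInput() is a made-up helper, not part of Geppetto or sr.c):

```c
/* Hypothetical helper: the x value for fitness case 'i' out of 'count'
 * evenly spaced cases starting at 0.0.  The cast forces floating-point
 * division; with two int operands, i / count would truncate to 0 for
 * every i smaller than count. */
float
caseInput(int i, int count)
{
	return (float)i / count;
}
```

With ten cases, caseInput(5, 10) is 0.5, whereas the integer expression
5 / 10 is 0.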

At a minimum, an application should provide two fitness-related procedures: 
one to set up each fitness case for every program evaluated and a second to 
rate the 'result' obtained by evaluating each program.

The first procedure is fairly trivial to write:

	void *
	srCaseInitialize(fc)
	int fc;
	{
		x = fitnessCase[fc].x;
		return(0);
	}

This procedure is passed the number of the fitness case to set up.  It 
sets our C variable to the appropriate value.  Since this application isn't
complex enough to need an application-specific environment, it passes back
a null pointer.

The second fitness-related procedure is a bit more complicated.  It accepts
six parameters: a pointer to the 'result' returned by the current program,
the number of the fitness case which the program just evaluated, pointers 
to the hits, raw fitness and standardized fitness variables for this 
program and a pointer to the application-specific environment.  The bare
bones of the second procedure look like this:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
	}

A program scores a hit in the symbolic regression problem if it's within 
.01 of the correct answer.  For this procedure, we'll use resultFloat() to
get the floating-point value from the 'result', subtract it from the correct
answer in the fitnessCase array and take the absolute value of that using
the fabs() function.  This value can then be used to see if we got a hit:

	float diff;
	
	/* compute difference between program result and actual answer */
	diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);

	/* see if we got a hit */
	if (diff < 0.01)
		*hitp += 1;

The raw fitness for this problem is the sum of all of the differences,
so we'll add the value computed above:

	/* set raw fitness */
	*rawp += diff;

We'll also want to set the standardized fitness after the final fitness
case.  For this problem, standardized fitness is the same as raw fitness:

	/* set standardized fitness after final fitness case */
	if (fc == NUMBER_OF_FITNESS_CASES-1)
		*stdp = *rawp;

The finished case fitness routine looks like this:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
		float diff;
	
		/* compute difference between result and answer */
		diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);
	
		/* see if we got a hit */
		if (diff < 0.01)
			*hitp += 1;
	
		/* set raw fitness */
		*rawp += diff;
	
		/* set standardized fitness after final fitness case */
		if (fc == NUMBER_OF_FITNESS_CASES-1)
			*stdp = *rawp;
	}

Everything else regarding fitness is automatically taken care of by Geppetto.
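
To get a feel for how hits and raw fitness behave, here is a minimal
standalone sketch that scores ordinary C functions instead of evolved
programs.  It doesn't use Geppetto at all: scoreCandidate(), exactTarget()
and squareOnly() are names invented for this example, and the fitness
cases are assumed to be the ten points 0.0, 0.1, ... 0.9.

```c
#define CASES 10

/* Score 'candidate' against the target x*x/2 at CASES points spaced
 * 0.1 apart starting at 0.0.  Stores the summed absolute error in
 * *rawp and returns the number of hits (errors under 0.01). */
int
scoreCandidate(float (*candidate)(float), double *rawp)
{
	int i, hits = 0;

	*rawp = 0.0;
	for (i = 0; i < CASES; i++) {
		float x = (float)i / CASES;
		float diff = candidate(x) - (x * x) / 2;

		/* absolute value, done by hand to avoid needing <math.h> */
		if (diff < 0)
			diff = -diff;

		if (diff < 0.01)
			hits++;
		*rawp += diff;
	}
	return hits;
}

/* two sample candidates: the exact target and a near miss */
float exactTarget(float x) { return (x * x) / 2; }
float squareOnly(float x) { return x * x; }
```

Scoring exactTarget should give a hit on every case and a raw fitness of
essentially zero, while squareOnly only scores hits for the smallest
values of x and accumulates a much larger raw fitness.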


Step 4: The Criterion for Designating a Result and Terminating a Run

(Koza does step 5 before step 4, but it's more natural in Geppetto
to do them in this order.)

Geppetto will automatically terminate a run after the specified number of 
generations.  You might, however, wish to terminate the run when the best 
program for a generation meets certain criteria.

For the symbolic regression problem, we'll terminate the run if a program 
has a hit for every fitness case:

	int
	srTerminateRun(hits, raw, std)
	int hits;
	double raw, std;
	{
		return(hits == NUMBER_OF_FITNESS_CASES);
	}


Step 5: The Parameters and Variables for Controlling the Run

At a minimum, you'll need to tell Geppetto the number of fitness cases for
your application, point it at the application's terminal and function
lists, and let it know how to evaluate the fitness of each case.  You'll
probably also want to tell it how to set up each fitness case and whether
or not you want to terminate the run early.

All of these things are done using the 'interface' object, so named because
it's the interface between Geppetto and your application.

The simple driver provided with Geppetto passes an 'interface' object
pointer to a procedure in your application named appInitialize(), from
which your application-specific initialization is done.  Your application
MUST have a procedure named appInitialize(), but that's the only hard-coded
name required by either Geppetto or the simple driver.

Here's what the initialization function for this application would look like:

	void
	appInitialize(ip)
	interface *ip;
	{
		int i;
		objectList *tList, *fList;
	
		/* initialize fitness cases */
		for (i = 0; i < NUMBER_OF_FITNESS_CASES; i++) {
			x = (float )i / NUMBER_OF_FITNESS_CASES;
			fitnessCase[i].x = x;
			fitnessCase[i].answer = (x*x)/2;
		}
	
		/* set global variables */
		generations = 30;
		populationSize = 200;
	
		/* build terminal and function lists */
		tList = srTerminals();
		fList = srFunctions();
	
		/* set app-specific variables */
		interfaceSetFitnessCases(ip, NUMBER_OF_FITNESS_CASES);
		interfaceSetTerminalList(ip, tList);
		interfaceSetFunctionList(ip, fList);

		/* set app-specific functions */
		interfaceCaseInitializeFunc(ip, srCaseInitialize);
		interfaceCaseFitnessFunc(ip, srCaseFitness);
		interfaceTerminateRunFunc(ip, srTerminateRun);
	}

The first block of code is the fitness case initialization described above.

The second block tells Geppetto to only create a population of 200 programs
and to only run for a maximum of 30 generations.  To do this it uses some
global variables defined in the simple driver.

The third block creates the terminal and function lists using the
procedures written in steps 1 and 2.

The fourth block tells the 'interface' object how many fitness cases this
application uses and sends it pointers to the terminal and function lists.

The final block points the 'interface' object at the procedures we created
in step 3.

If you're using the simple driver and a sensible operating system, most of
the other parameters and variables can be set at run-time using the
command-line arguments documented below.

The full source code listing for this symbolic regression problem can be
found in sr.c.  This directory also contains a Makefile, which will build
both 'sr' and a debugging version 'debugsr' (explained below).

There are a couple of differences between the code described above and the
code in sr.c.

First, there are a couple of #include lines at the top of the file.
Just as any program which uses any floating-point function must include the
<math.h> file, Geppetto applications using the simple driver should include
"geppetto.h".  This file then #includes everything needed by most Geppetto
applications.

There's also some code between an '#ifdef DEBUG' and an '#endif' statement
at the bottom of the file.  This is an even simpler driver routine that, when
compiled with -DDEBUG and run, will evaluate the "(/ (* X X) 2)" program.
You can also enter programs at the command-line (make sure to enclose each
separate program in quotes).  To see how "(* X X)" would do, run:

	./debugsr "(* X X)"

You can enter as many of these as you'd like (or as many as your shell will
allow).  This is handy for making sure your operators are really doing what
you think they're doing.
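
If the prefix notation is unfamiliar, the following sketch shows one way
such a program string could be evaluated for a given X.  It is not
Geppetto's parser (which isn't shown in this guide); evalExpression() and
its helpers are invented here, and they only understand the four
arithmetic operators, X and numeric constants.

```c
#include <ctype.h>
#include <stdlib.h>

/* Illustrative evaluator for prefix expressions such as "(/ (* X X) 2)".
 * This is NOT Geppetto's parser: it only knows the binary operators
 * + - * /, the variable X and numeric constants, and it does no error
 * checking on malformed input. */

static const char *
skipSpaces(const char *cp)
{
	while (isspace((unsigned char)*cp))
		cp++;
	return cp;
}

/* Evaluate the expression starting at *cpp with X bound to 'x',
 * leaving *cpp just past the text that was consumed. */
static double
evalNode(const char **cpp, double x)
{
	const char *cp = skipSpaces(*cpp);
	double value;

	if (*cp == '(') {
		char op;
		double left, right;

		cp = skipSpaces(cp + 1);
		op = *cp++;		/* operator symbol */
		left = evalNode(&cp, x);
		right = evalNode(&cp, x);
		cp = skipSpaces(cp);
		if (*cp == ')')
			cp++;

		switch (op) {
		case '+': value = left + right; break;
		case '-': value = left - right; break;
		case '*': value = left * right; break;
		default:  /* treated as '/', protected against zero */
			value = (right == 0) ? 1 : left / right;
			break;
		}
	} else if (*cp == 'X') {
		value = x;
		cp++;
	} else {
		char *end;

		value = strtod(cp, &end);	/* numeric constant */
		cp = end;
	}

	*cpp = cp;
	return value;
}

double
evalExpression(const char *text, double x)
{
	return evalNode(&text, x);
}
```

Under these assumptions, evalExpression("(/ (* X X) 2)", 3.0) computes
(3*3)/2, which is 4.5.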


			More on the 'interface' object

As was mentioned before, the 'interface' object is the interface between
Geppetto and your code.  Here are the procedures provided to set up this
object:

	void interfaceSetFitnessCases(interface *ip, int cases)
		sets the number of times the program is run and your case
		fitness routine is called

	void interfaceSetTerminalList(interface *ip, objectList *tl)
		sets the terminal list for your application

	void interfaceSetFunctionList(interface *ip, objectList *fl)
		sets the function list for your application

	void interfaceSetReturnTypes(interface *ip, datatype mask)
		sets the list of data types which a valid program can
		return.  This is explained more fully below.

	void interfaceCaseInitializeFunc(interface *ip, funcptr)
		points the 'interface' object at the procedure called to
		set up each fitness case.  The bare bones of this procedure
		look like this:

			void *
			myCaseInit(caseNum)
			int caseNum;	/* index for this fitness case */
			{
			}

		The returned value is an application-defined structure
		which is passed to each operator and to the next two
		procedures (if they're defined).

		If you don't set this pointer, it will be ignored and
		a NULL pointer will be passed to the operators and the
		next two procedures.

	void interfaceCaseTerminateFunc(interface *ip, funcptr)
		points the 'interface' object at the procedure called each
		time the program is executed.  The bare bones of this
		procedure look like this:

			int
			myCaseTerminate(rp, envp, caseNum)
			result *rp;	/* result returned during this loop */
			void *envp;	/* pointer returned by 'myCaseInit' */
			int caseNum;
			{
			}

		If this procedure returns 0, execution is terminated and
		the case fitness function is called.  Otherwise, the program
		is repeatedly evaluated until the maximum number of loops
		has been reached or an error is returned as a 'result'.

		If you don't set this pointer, each program will only be
		evaluated once a generation.

	void interfaceCaseFitnessFunc(interface *ip, funcptr)
		points the 'interface' object at the procedure called to
		analyze the 'result' returned by the program for each
		fitness case.  The bare bones of this procedure look like
		this:

			void
			myCaseFitness(rp, caseNum, hitp, rawp, stdp, envp)
			result *rp;	/* result returned by the program */
			int caseNum;	/* index for this fitness case */
			int *hitp;	/* pointer to program's hit counter */
			double *rawp;	/* pointer to program's raw fitness */
			double *stdp;	/* pointer to program's std fitness */
			void *envp;	/* pointer returned by 'myCaseInit' */
			{
			}

		If you don't set this pointer, no fitness values will be set
		and your application becomes essentially a random search.

	void interfaceEvalCleanup(interface *ip, funcptr)
		points the 'interface' object at the procedure called to
		do any clean-up required after all the programs in this
		generation have been evaluated.  The bare bones of this
		procedure look like this:

			void
			myCleanup(envp)
			void *envp;
			{
			}

		If you don't set this pointer, it will be ignored.

	void interfaceTerminateRun(interface *ip, funcptr)
		points the 'interface' object at the procedure called to
		see if this run may be terminated early.  The bare bones of
		this procedure look like this:

			int
			myTerminateRun(hits, rawFitness, stdFitness)
			int hits;
			double rawFitness;
			double stdFitness;
			{
			}

		The values passed are from the program to which your case
		fitness function awarded the best standardized fitness for
		this generation.

		If you don't set this pointer, it will be ignored and the
		process will continue to the maximum number of generations.

	void interfaceDestructor(interface *ip, funcptr)
		points the 'interface' object at the procedure called to
		clean up before the application is terminated and execution
		ceases.

		If you don't set this pointer, it will be ignored.


			The Simple Geppetto Driver

The 'main.o' simple driver accepts a bewildering number of command-line
arguments which allow you to customize each run.  Here's a brief description
of each:

	-a	Geppetto tries to avoid whatever work it can.  If a program
		is an exact copy of a program from a previous generation,
		it will produce the same results (given the same inputs) and
		thus finish with the same number of hits and standardized
		fitness.  This argument tells Geppetto not to evaluate exact
		copies (and it's actually the default, so you don't really
		need to specify this.)

	-A	This does the opposite of the '-a' flag, forcing programs
		to always be evaluated.  It's useful when the fitness cases
		change from generation to generation.

	-c	This argument sets the creation method.  Specifying
		'-c full' ('-c f' will also work) creates fully filled
		parse trees, '-c grow' creates trees of random shapes, and
		'-c ramped' creates trees using Koza's "ramped
		half-and-half" method.  The "ramped" method is the default.

	-d	This argument sets the maximum depth any evolved program may
		reach.  The default is 17 nodes.

	-f	This argument sets the name of the file to which checkpoint
		information will be written.  See '-x' for information
		about the frequency with which this is done.

	-F	This is the name of an existing checkpoint file, from
		which a population is read.  Any arguments set prior to this
		will be replaced by the values in this file and any
		arguments set after this will override values set in the
		checkpoint file.

	-g	This sets the maximum number of generations.  Since Geppetto
		is written in C and naturally counts from 0, this is one
		less than Koza's G and by default is set to 50.

	-i	This is the maximum depth for the initial population.  It
		defaults to 6, but is increased as necessary in order to
		ensure that every member of the initial population is unique.

	-m	This is the maximum size of a subtree created during the
		mutation operation.  It's set to 4 initially.

	-o	This value is only used if you're using Greedy Overselection
		(see '-s'.)  It's normally computed from the population size
		before any breeding takes place, but you can set it to a
		higher or lower value using this flag.

	-p	This sets the population size.  It defaults to 500
		individuals.

	-r	This sets the random number seed. By default, it's set to
		the value returned by the time() function 'exclusive-or'ed
		with the value returned by getpid(), and thus is almost
		certainly unique for consecutive runs.

	-s	This sets the selection method.  Specifying '-s f'
		tells Geppetto to use the Fitness-Proportionate method
		and is the default.  You can also use '-s g' for Greedy
		Overselection or '-s t' for Tournament selection.

	-t	This value is only used if you're using the Tournament
		method of selection (see '-s' above.)  It defaults to 3
		rounds.

	-u	This flag sets the "uniqueStats" flag, which tells Geppetto
		to gather and print statistics about the number of unique
		programs in each generation.  It's a useful metric but it's
		time-consuming.

	-U	This flag turns off the "uniqueStats" mode and is the
		default.

	-v	This sets the "verbose" flag, which causes Geppetto to print
		out statistics for EVERY program before the normal
		generation statistics.

	-V	This turns off the "verbose" flag and is the default.

	-x	This is the number of generations which will pass before
		a new checkpoint file is created.  By default, this is set
		to 0, meaning no checkpoint files will be written.

	-z	This is a floating-point value, and sets the parsimony value.
		This value is multiplied by the total number of nodes in
		a program and added to its final standardized fitness.
		Large numbers encourage smaller programs, negative numbers
		encourage HUGE programs.  It's set to 0.0 by default.
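
The arithmetic behind the '-z' parsimony value is simple enough to write
down.  The sketch below follows the description above; applyParsimony()
is a made-up name, and Geppetto's own code for this step isn't shown in
this guide.

```c
/* Hypothetical helper: apply a parsimony penalty as the '-z' text
 * describes it, adding (parsimony * node count) to the standardized
 * fitness.  A high standardized fitness marks an unfit program, so a
 * positive parsimony value penalizes larger programs. */
double
applyParsimony(double stdFitness, double parsimony, int nodeCount)
{
	return stdFitness + parsimony * nodeCount;
}
```

With a parsimony of 0.5, a 4-node program whose standardized fitness was
1.0 ends up at 3.0, while a parsimony of 0.0 leaves every program's
fitness unchanged.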

There are also a number of global variables which may be set from within
an application's appInitialize() function (as is done in sr.c).  If you
set one of these values inside your appInitialize() procedure, it will
override the default value.  Arguments specified from the command-line
will still override these values, however.

	int alwaysEval
		is used by the '-a' and '-A' arguments and may be set to 0
		to turn off "alwaysEval" mode.

	programCreationMethod creator
		is used by the '-c' argument and may be set to pcmFull,
		pcmGrow or pcmRamped

	int maxTotalDepth
		is used by the '-d' argument.

	char *checkptName
		is used by the '-f' and '-F' arguments.

	int generations
		is used by the '-g' argument.

	int initialDepth
		is used by the '-i' argument.

	int maxMutateDepth
		is used by the '-m' argument.

	int populationSize
		is used by the '-p' argument.

	int randomSeed
		is used by the '-r' argument.

	parentSelectionMethod selector
		is used by the '-s' argument and may be set to
		psmFitnessProportionate, psmGreedyOverselection or
		psmTournament.

	int countUnique
		is used by the '-u' and '-U' arguments and may be set to 0
		to turn off "countUnique" mode.

	int verbose
		is used by the '-v' and '-V' arguments and may be set to 0
		to turn off "verbose" mode, or any other value to turn it on.

	int checkptFreq
		is used by the '-x' argument.

	double parsimony
		is used by the '-z' argument.

The following values can't be set from the command-line, but can be altered
in appInitialize():

	double pctReproduce
		This is the probability that a program will be copied
		directly to the next generation and is the same as Koza's
		p-sub-r variable.  This should be a value between 0.0
		and 1.0 and is set by default to 0.095.

	double pctXoverInternal
		This is the probability that crossover is performed at
		the internal nodes of two parents.  It defaults to 0.695.

	double pctXoverAny
		This is the probability that crossover is performed at
		any node in the parents.  It defaults to 0.195.

	int maxLoops
		This is the maximum number of times a single program will
		be evaluated in a single generation and defaults to 1000.

Note that the probability of mutation is set to the remainder, 1.0 -
(pctReproduce + pctXoverInternal + pctXoverAny), and thus is 0.015 if none
of the other three probabilities are changed.
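
To make the "remainder" rule concrete, here is a minimal sketch of how
four breeding operations could share a single random roll.  This is only
one plausible arrangement, not Geppetto's actual code; the breedOp type
and both function names are invented for the example.

```c
typedef enum {
	OP_REPRODUCE,
	OP_XOVER_INTERNAL,
	OP_XOVER_ANY,
	OP_MUTATE
} breedOp;

/* Mutation gets whatever probability the other three leave over. */
double
mutationProbability(double pctReproduce, double pctXoverInternal,
		    double pctXoverAny)
{
	return 1.0 - (pctReproduce + pctXoverInternal + pctXoverAny);
}

/* Map a uniform random 'roll' in [0.0, 1.0) onto one of the four
 * operations by walking the cumulative probabilities. */
breedOp
chooseOperation(double roll, double pctReproduce, double pctXoverInternal,
		double pctXoverAny)
{
	if (roll < pctReproduce)
		return OP_REPRODUCE;
	roll -= pctReproduce;
	if (roll < pctXoverInternal)
		return OP_XOVER_INTERNAL;
	roll -= pctXoverInternal;
	if (roll < pctXoverAny)
		return OP_XOVER_ANY;
	return OP_MUTATE;
}
```

If the three probabilities you set add up to more than 1.0, the remainder
goes negative and, in this sketch at least, mutation would never be
chosen, so it's worth checking the sum whenever you change one of them.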


			Data Types in Geppetto

As was mentioned above, Geppetto supports typed constants and variables.
The basic ones are implemented using the standard C types and thus the
usual rules hold:
  * dtShort is smaller than (or possibly the same size as) dtInteger,
    which is smaller than (or possibly the same size as) dtLong
  * dtFloat is single-precision floating-point while dtDouble is
    double-precision floating-point.

Geppetto supports boolean values, which are either TRUE (non-zero) or FALSE
(0), using 'dtBoolean' inside Geppetto objects and 'bool' as the
C type (just as 'int' is the C type for 'dtInteger'.)

There is also the 'dtList' type, which holds a list of 'result's.  Its
C type is 'resultList *'.  There are a few procedures to manipulate
objects of this type:

	resultList *resultListCreate(int len)
		creates a list and preallocates 'len' elements

	int resultListAdd(resultList *list, result *elem)
		adds 'elem' to 'list', returns 0 if 'elem' was added

	int resultListLength(resultList *list)
		returns the number of elements in 'list'

	result *resultListEntry(resultList *list, int i)
		returns element 'i' from the list

	void resultListFree(resultList *list)
		frees memory used by 'list'

If you need a different datatype, there's the Blob type.  There are
placeholders in the Geppetto library which can be replaced by your own
routines if you need some specialized data type.  Routines which need to be
replaced:

	blob *blobCreate(datatype dtype)
		creates a new blob of memory, of type 'dtype'

	blob *blobCopy(blob *bbp, datatype dtype)
		creates an exact copy of the 'dtype' blob pointed to by 'bbp'

	int blobCompare(blob *bbp1, blob *bbp2, datatype dtype)
		returns 1 if 'bbp1' and 'bbp2' are identical, 0 otherwise

	int blobToString(blob *bbp, datatype dtype, charString *cstr)
		converts a blob of 'dtype' memory to a text representation
		which can be written to a checkpoint file

	blob *blobParse(const char **cpp, datatype *dtp)
		converts a pointer to a character string (*cpp) to a blob
		of memory, updating *cpp to point to the character *after*
		the string representing the blob and updating dtp to the
		correct datatype.

	void blobFree(blob *bbp, datatype dtype)
		frees blob of 'dtype' memory pointed to by 'bbp'


		Manipulating 'result' objects in Geppetto

'result' structures can contain any of the above data types.  Here are
the procedures provided by Geppetto to manipulate them:

	datatype resultDataType(result *rp)
		returns the data type of this 'result'

	bool resultIsBoolean(result *rp)
	bool resultIsShort(result *rp)
	bool resultIsInteger(result *rp)
	bool resultIsLong(result *rp)
	bool resultIsFloat(result *rp)
	bool resultIsDouble(result *rp)
	bool resultIsList(result *rp)
	bool resultIsBlobPtr(result *rp)
	bool resultIsError(result *rp)
		All these procedures return TRUE if the result is of the
		appropriate type, FALSE otherwise.

	bool resultSameType(result *r1, result *r2)
		returns TRUE if both results have the same type

	bool resultBoolean(result *rp)
	short resultShort(result *rp)
	int resultInteger(result *rp)
	long resultLong(result *rp)
	float resultFloat(result *rp)
	double resultDouble(result *rp)
	resultList *resultListPtr(result *rp)
	blob *resultBlobPtr(result *rp)
		All these procedures return the requested value from the
		'result'.  Be careful when using them, because they don't
		do any type checking and will happily return the first
		two or four bytes of a 'double' as a 'short' if that's what
		you ask for.

If you're doing GP the way it's done in Koza's book, you'll only ever use a
single type and you won't need to worry about how you use that last group
of procedures.  If you mix even two different types, you need to carefully
check the data type of each 'result' before trying to grab its value.
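
The "check before you grab" habit looks something like the sketch below.
It uses a made-up tagged structure rather than Geppetto's 'result' (whose
layout isn't shown in this guide), so taggedValue and getFloatChecked()
are purely illustrative.

```c
/* Hypothetical tagged value, loosely modeled on the idea of a typed
 * 'result'.  Every name here is invented for the example. */
typedef enum { TAG_INTEGER, TAG_FLOAT } valueTag;

typedef struct {
	valueTag tag;
	union {
		int i;
		float f;
	} u;
} taggedValue;

/* Checked accessor: store the float in *out and return 1 only when the
 * value really holds a float; otherwise return 0 and leave *out alone. */
int
getFloatChecked(const taggedValue *vp, float *out)
{
	if (vp->tag != TAG_FLOAT)
		return 0;
	*out = vp->u.f;
	return 1;
}
```

In a Geppetto operator the same pattern would presumably be a call to
resultIsFloat() before resultFloat(), with some fallback (or an Error
result) when the check fails.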

	void resultSetVoid(result *rp)
		Change this 'result' to a Void result

	void resultSetBoolean(result *rp, bool val)
	void resultSetShort(result *rp, short val)
	void resultSetInteger(result *rp, int val)
	void resultSetLong(result *rp, long val)
	void resultSetFloat(result *rp, float val)
	void resultSetDouble(result *rp, double val)
	void resultSetListPtr(result *rp, resultList *valp)
	void resultSetBlobPtr(result *rp, blob *valp, datatype dtype)
		These procedures all change the 'result' pointed to by
		"rp" to the appropriate 'datatype' and assign the
		value to it.


			The 'Error' data type

Koza's protected division operator returns 1 if a divide-by-zero error
occurs.  This works passably for research, but if you'd like to evolve
working programs, it'd be nice to eliminate these incorrect programs
when possible.  To do this, we can return an error:

	result * 
	opDivide(argv, envp)
	const result **argv;
	void *envp;
	{
		/* bomb on divide-by-zero errors */
		if (resultFloat(argv[1]) == 0)
			return(resultCreate(dtError, ErrorDivideByZero));

		return(resultCreate(dtFloat, resultFloat(argv[0]) /
							resultFloat(argv[1])));
	}

If a simple operator like opDivide() returns an error, Geppetto stops
evaluating the program and returns the error as the program's 'result'.
This means that you don't need to worry about receiving an Error in your
simple operators.

We will need to check for errors in our CaseFitness() routine.  I usually
give erroneous programs a high standardized fitness to emphasize their
unfitness.  Here's a new version of the symbolic regression case fitness
routine which penalizes erroneous programs:

	void
	srCaseFitness(rp, fc, hitp, rawp, stdp, envp)
	result *rp;
	int fc;
	int *hitp;
	double *rawp;
	double *stdp;
	void *envp;
	{
		float diff;
	
		if (resultIsError(rp) || *stdp == HUGE_VAL) {
	
			/* set standardized fitness to a big number */
			*stdp = HUGE_VAL;
		} else {

			/* compute difference between result and answer */
			diff = fabs(resultFloat(rp) - fitnessCase[fc].answer);
	
			/* see if we got a hit */
			if (diff < 0.01)
				*hitp += 1;
	
			/* set raw fitness */
			*rawp += diff;
		}
		
		/* set standardized fitness after final fitness case,
		 * unless an error has already pinned it at HUGE_VAL */
		if (fc == NUMBER_OF_FITNESS_CASES-1 && *stdp != HUGE_VAL)
			*stdp = *rawp;
	}

This sets the standardized fitness to the largest floating-point number
possible if there is ever an error.  (Since standardized fitness is as
big as it can ever be, it also needs to make sure that no more values
get added to the standardized fitness.)
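
The "sticky" behaviour of the penalty is easy to test on its own.  The
sketch below is a standalone illustration with an invented helper,
accumulateFitness(), rather than a piece of Geppetto.  Note that on
IEEE 754 systems HUGE_VAL is positive infinity.

```c
#include <math.h>

/* Hypothetical helper: fold one fitness case into a running
 * standardized fitness.  Once an error has been seen the total is
 * pinned at HUGE_VAL and later cases can no longer change it. */
double
accumulateFitness(double stdFitness, int isError, double diff)
{
	if (isError || stdFitness == HUGE_VAL)
		return HUGE_VAL;
	return stdFitness + diff;
}
```

Feeding this a run of cases with one error in the middle leaves the total
at HUGE_VAL no matter how well the remaining cases go.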

The code in opDivide() won't compile as written, because 'ErrorDivideByZero'
isn't defined.  Currently, Geppetto only knows about a single errorCode,
ErrorBadDataType, which is used internally.  You can use this everywhere
if you don't care about anything but the fact that there was an error.

If you want to be able to distinguish the different errors, you can
define your own errors.

To create a new error, add lines like these somewhere at the top of your
program:

	#define ErrorDivideByZero	(ErrorUserDefined+0)
	const char *MsgDivideByZero =	"Divide by Zero";

The second error code would be 'ErrorUserDefined+1' and so on.

In your appInitialize() function, you then add this line:

	errorCodeSetMessage(ErrorDivideByZero, MsgDivideByZero);

This allows Geppetto to use the string when describing the error.
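
To show why numbering your errors upward from ErrorUserDefined keeps them
from colliding, here's a toy message table.  It is only a guess at the
general idea, not Geppetto's actual implementation: the numeric values of
ErrorBadDataType and ErrorUserDefined, the ErrorOverflow code, MAX_ERRORS,
setMessage() and getMessage() are all assumptions invented for this
sketch.

```c
#include <stddef.h>

/* Stand-in values; the real ones come from Geppetto's headers. */
#define ErrorBadDataType	0
#define ErrorUserDefined	1

#define ErrorDivideByZero	(ErrorUserDefined+0)
#define ErrorOverflow		(ErrorUserDefined+1)	/* hypothetical */

#define MAX_ERRORS		8	/* hypothetical table size */

static const char *errorMessage[MAX_ERRORS];

/* toy counterpart of errorCodeSetMessage(); returns 0 on success */
int
setMessage(int code, const char *msg)
{
	if (code < 0 || code >= MAX_ERRORS)
		return -1;
	errorMessage[code] = msg;
	return 0;
}

/* look up the text for a code, with a fallback for unknown codes */
const char *
getMessage(int code)
{
	if (code < 0 || code >= MAX_ERRORS || errorMessage[code] == NULL)
		return "Unknown error";
	return errorMessage[code];
}
```

Each code gets its own slot, so a code with no registered message (like
ErrorOverflow here) simply falls back to the generic text.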


			Complex Operators

You may have wondered why I keep saying "simple operator".  That's because
there are two different types of operators, simple and complex.

A simple operator, as we've already seen, is given a list of pointers to
'result's and a pointer to the application-specific data, does something
with them, and returns a new 'result'.

A complex operator looks a great deal like main() in a Unix C program.
There are three arguments, an argument count, a list of pointers and a
pointer to the application-specific data.  The complex operator's list
of pointers is a list of 'object's rather than a list of 'result's.

'object's are Geppetto's internal representation of a program and need
to be evaluated using objectEval() in order to get a pointer to a 'result'.
Thus a complex opAdd() might look like this:

	result *
	opAdd(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1;
		result *answer;
	
		/* evaluate args, abort on error */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* get the answer */
		answer = resultCreate(dtFloat, resultFloat(r0) + 
				      resultFloat(r1));

		/* clean up and return */
		resultFree(r0);
		resultFree(r1);
		return(answer);
	}

The first thing you probably noticed about this complex operator is that
we've got to do all our own memory deallocation using the resultFree()
procedure.  Since operators are evaluated hundreds or thousands of times
in a single generation, a memory leak here can be disastrous.

We also have to check each argument for errors ourselves, since Geppetto
only does that automatically for simple operators.  It's obviously MUCH
more convenient (and probably safer) to implement opAdd() as a simple
operator.

A better example of a complex operator is the IfGreaterThan operator:

	result *
	opIfGreaterThan(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1, *answer;
		bool cond;
	
		/* evaluate the first two arguments */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* compare the first two results */
		cond = resultFloat(r0) > resultFloat(r1);
		resultFree(r0);
		resultFree(r1);

		/* evaluate the appropriate branch */
		if (cond)
			return(objectEval(argv[2], envp));
		else
			return(objectEval(argv[3], envp));
	}

This still has all the memory management problems but the last four lines
are the key to why you'd want to use complex operators.  Complex operators
don't get their arguments handed to them already evaluated, so they can
evaluate only when absolutely necessary.  The IfGreaterThan operator only
evaluates one of the last two subtrees.
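
The difference is easy to see in a toy model which counts how often each
subtree gets evaluated.  Nothing here is Geppetto code: the 'node' type,
nodeEval() and the two IfGreaterThan variants are stand-ins invented for
this sketch, with constant leaves in place of real subtrees.

```c
/* toy subtree: a constant which counts its own evaluations */
typedef struct {
	double value;
	int evalCount;
} node;

double
nodeEval(node *np)
{
	np->evalCount++;
	return np->value;
}

/* eager: all four arguments are evaluated before the choice is made,
 * which is what a simple operator would see */
double
eagerIfGreaterThan(node *argv[4])
{
	double a = nodeEval(argv[0]);
	double b = nodeEval(argv[1]);
	double t = nodeEval(argv[2]);
	double f = nodeEval(argv[3]);

	return (a > b) ? t : f;
}

/* lazy: only the chosen branch is evaluated, as in opIfGreaterThan() */
double
lazyIfGreaterThan(node *argv[4])
{
	double a = nodeEval(argv[0]);
	double b = nodeEval(argv[1]);

	return (a > b) ? nodeEval(argv[2]) : nodeEval(argv[3]);
}
```

Both versions return the same answer, but after the lazy one the
evalCount of the branch that wasn't chosen is still zero.  If that branch
were a large subtree (or one with side effects), the savings add up
quickly.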

Just as we created a source for simple operators with the
simpleOperatorSrcCreate() procedure, we create a source for complex
operators with complexOperatorSrcCreate().  Here's what the opIfGreaterThan
source looks like:

	operatorSrc *osp;

	osp = complexOperatorSrcCreate("ifGreaterThan", opIfGreaterThan,
				       dtAll, 4, 4,
				       dtAll, dtAll, dtAll, dtAll);

Just as before, we need to give Geppetto a char string to refer to the
operator and a pointer to the operator.  For complex operators, we
also need to tell Geppetto all the possible types that can be returned,
the minimum and maximum number of arguments to this operator, and the
data types of each argument.

(The 'dtAll' datatype is a wildcard matching every datatype.)


		Variable Number of Arguments

Specifying the minimum and maximum number of arguments means that a single
operator can support multiple argument lengths.  For instance, rather than
having a List2 and List3 type as Koza does in one of his examples, we
could write a generic opList() operator:

	result *
	opList(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		int i;
		result *rp = 0;
	
		for (i = 0; i < argc; i++) {

			/* free previous result */
			if (rp)
				resultFree(rp);

			/* evaluate this argument */
			rp = objectEval(argv[i], envp);
		}

		/* return final result */
		return(rp);
	}

We then create a source with a variable number of arguments:

	osp = complexOperatorSrcCreate("List", opList, dtAll,
				       2, 3, dtAll, dtAll, dtAll);

When Geppetto creates a new program, it will randomly choose a number of
arguments between the minimum and maximum (2 and 3 in this case).


		Using Multiple Data Types in a Program

Geppetto isn't restricted to a single type of data in a program.  It's
possible to evolve a program which provides both an integer and a floating
point input to a black box function and have that function return a boolean
value back to the case fitness function.

The list of terminals could be just a source of random integers and a
source of random floating-point numbers and would be pretty close to
srTerminals().

For operators, we'll have our black box function and the usual arithmetic
operators, written to handle both integer *and* floating-point math.  Here's
the addition operator (the rest are left as an exercise for the reader):

	result *
	opAdd(argc, argv, envp)
	int argc;
	object *argv;
	void *envp;
	{
		result *r0, *r1;
		result *answer;
	
		/* evaluate args, abort on error */
		r0 = objectEval(argv[0], envp);
		if (resultIsError(r0))
			return(r0);
		r1 = objectEval(argv[1], envp);
		if (resultIsError(r1)) {
			resultFree(r0);
			return(r1);
		}
	
		/* if we're not doing mixed math... */
		if (resultSameType(r0, r1)) {

			/* things are pretty easy */
			if (resultIsInteger(r0))
				answer = resultCreate(dtInteger,
						      resultInteger(r0) +
						      resultInteger(r1));
			else
				answer = resultCreate(dtFloat,
						      resultFloat(r0) +
						      resultFloat(r1));
		} else {

			/* make sure the integer is last */
			if (resultIsInteger(r0)) {
				answer = r0;
				r0 = r1;
				r1 = answer;
			}

			/* do addition */
			answer = resultCreate(dtFloat, resultFloat(r0) +
					      (float)resultInteger(r1));
		}

		/* clean up and return */
		resultFree(r0);
		resultFree(r1);
		return(answer);
	}

This version handles integer, floating-point and mixed math.  To keep
things simple, though, we'll only create sources which never mix the two
types.  We can create the function list like this:

	objectList *
	bbFunctions()
	{
		objectList *list;
	
		list = objectListCreate(9);
		objectListAdd(list, complexOperatorSrcCreate("+", opAdd,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("-", opSubtract,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("*", opMultiply, 
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("/", opDivide,
			      dtInteger, 2, 2, dtInteger, dtInteger));
		objectListAdd(list, complexOperatorSrcCreate("+", opAdd,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("-", opSubtract,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("*", opMultiply, 
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("/", opDivide,
			      dtFloat, 2, 2, dtFloat, dtFloat));
		objectListAdd(list, complexOperatorSrcCreate("BB", opBlackBox,
			      dtBoolean, 2, 2, dtInteger, dtFloat));
		return(list);
	}

The final step is to tell Geppetto which of the possible data types
(dtBoolean, dtInteger or dtFloat) should actually be returned to the case
fitness routine.  In the appInitialize() routine, we need to add this line:

	interfaceSetReturnTypes(ip, dtBoolean);

Now, since the black box function is the only one that returns a boolean
value, it will always be selected as the root of the parse tree.
Only integer constants and operators will be connected to the first argument
and only floating-point constants and operators to the second.

Using this same method we could have Geppetto choose between several black
box functions.  We'd set up opBlackBox0(), opBlackBox1(), etc. and create
sources for "BB0", "BB1", etc. which take an integer and a floating-point
number as arguments and return a boolean value.  When Geppetto builds the
parse trees, it will choose one of the black box functions for the root of
the tree and fill the rest out as usual.  The usual GP methods would then
select the black box function which is most fit for the application.

If we'd wanted to allow either dtBoolean results OR dtInteger results,
we could have given the 'interface' object a list of these data types ORed
together:

	interfaceSetReturnTypes(ip, dtBoolean|dtInteger);

Geppetto would build parse trees using operators returning either type as
the root.
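
Since the data types can be ORed together, it's natural to think of each
one as a single bit in a mask.  Here's a minimal sketch of how a root
check could work under that assumption.  The bit values below are
stand-ins (the real ones come from Geppetto's headers) and canBeRoot() is
a hypothetical helper, not a Geppetto procedure.

```c
/* Stand-in bit values; Geppetto's headers define the real ones. */
#define dtBoolean	0x01
#define dtInteger	0x02
#define dtFloat		0x04

/* returns nonzero if an operator returning 'opType' may be the root
 * when the interface accepts the types in the 'allowed' mask */
int
canBeRoot(int opType, int allowed)
{
	return (opType & allowed) != 0;
}
```

With an allowed mask of dtBoolean|dtInteger, operators returning either of
those types pass the check and operators returning only dtFloat do not.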


			Aliases for Data Types

Suppose that instead of a boolean value, the black box function returned
an integer.  If we try to set the return type to 'dtInteger', the
arithmetic operators (which also return dtInteger) become candidates for
the root of the tree, so we'll end up with some programs which don't
include our black box function at all!

To handle this problem, we need to create a data type alias that Geppetto
can use to keep the value returned by the black box function distinct from
the dtIntegers returned by the arithmetic functions.

To create this datatype alias, we first add a #define somewhere at the top of
our program, redefining one of the user-defined types to a more descriptive
name:

	#define dtBlackBox	dtUserDef0

Then, in our appInitialize() procedure, we tell Geppetto the actual type
that this alias represents:

	datatypeMakeAlias(dtBlackBox, dtInteger);

Finally, our operator source creation will look like this:

	osp = complexOperatorSrcCreate("blackBox", opBlackBox, dtBlackBox,
				       2, 2, dtInteger, dtFloat);

Geppetto treats dtBlackBox as different from dtInteger when it builds
parse trees, but all the 'result' procedures will see the actual type.
resultIsInteger() would return TRUE on a 'result' with a datatype of
dtBlackBox.
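
Here's a toy model of those two views of an alias.  It's a sketch of the
idea only and makes no claim about how Geppetto stores aliases: the
numeric values, the single alias slot, and the makeAlias(), actualType(),
sameSlotType() and isInteger() helpers are all invented for this
example.

```c
/* Stand-in values; the real ones come from Geppetto's headers. */
#define dtInteger	0x02
#define dtFloat		0x04
#define dtUserDef0	0x100
#define dtBlackBox	dtUserDef0

/* one alias slot is enough for this sketch */
static int aliasName = 0;
static int aliasActual = 0;

/* toy counterpart of datatypeMakeAlias() */
void
makeAlias(int alias, int actual)
{
	aliasName = alias;
	aliasActual = actual;
}

/* resolve an alias to the type it stands for */
int
actualType(int dt)
{
	return (aliasName != 0 && dt == aliasName) ? aliasActual : dt;
}

/* tree building compares declared types, so an alias stays distinct */
int
sameSlotType(int a, int b)
{
	return a == b;
}

/* the 'result' side looks through the alias to the actual type */
int
isInteger(int dt)
{
	return actualType(dt) == dtInteger;
}
```

After makeAlias(dtBlackBox, dtInteger), sameSlotType(dtBlackBox, dtInteger)
is still false, which keeps the arithmetic operators out of the root,
while isInteger(dtBlackBox) is true.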


			Problems?  Suggestions?

If you've found a bug in Geppetto or there's an application which Geppetto
isn't quite able to handle, send me E-mail at 'dglo@cs.Berkeley.EDU' and
I'll see what I can do.  I'm doing this in my free time, though, so I
might not be able to help you immediately ...

The Geppetto Genetic Programming System is Copyright (C) 1993 by Dave Glowacki

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided
that the above copyright notice appear in all copies and that both that
copyright notice and this permission notice appear in supporting
documentation.  This software is provided "as is" without express or
implied warranty.
