It should be clear from the previous section that the MBPR results were being confused by (arguably even obscured by) another effect: introns. This paper has suggested that introns can affect performance in a very negative way. The purpose of this section is to summarize the experimental evidence for introns and to present a partial explanation for their effects. We should mention here that our definition of intron is a behavioral definition. That is, we do not care where the genotype/phenotype line is drawn; an uncalled subtree is an intron, as is a subtree that is called and whose result is ignored.
Before we explain our findings, however, we will summarize what we believe to be the contemporary view of introns in the GP field. Introns have recently been often described as a beneficial aspect of the genetic search that serves at least two important functions. The first ``benefit'' of introns is often stated to be their ability to protect the program of which they are a part from ``destructive'' crossover. The second ``benefit'' of introns is to allow the population as a whole to protect and thereby preserve the best ``building blocks'' in the population. Efforts to insert extra introns into the population as a search aid [Nordin and Banzhaf1995] indicate the current favorable standing of introns in GP.
Many researchers (e.g., [Nordin and Banzhaf1995, McPhee and Miller1995, Rosca and Ballard1995]) have argued that introns are helpful, in that they prevent destructive crossover. However, their argument assumes that the best of the current generation are in the right part of the search space (i.e, that destructive crossover is necessarily bad). While destructive crossover may be bad for natural evolution, the goal of GP is often optimization of a single program, not the survival of any particular individual of individuals.
There are at least three types of introns associated with functions. There are also introns associated with useless terminals, but since this paper had only one experiment with a change in the terminal set ((READ) returning the value of memory for N=1), we will concentrate on the following three function intron types:
Table 5 summarizes the effect that adding inert versions of READ and WRITE has on the computational effort required to solve various problems. In all of the runs in table 5, READ is useless (because it simply passes its one argument up the tree) (causing Local introns). WRITE actually creates an intron by ignoring its first argument and passing its second argument along (causing Hierarchical introns). Measured by either computational effort or by the likelihood that a perfect solution will be found in the required number of generations, the addition of this useless function and this intron causing function have a very strong negative impact on the search process. Since these two functions are not actually acting as memory conduits, the extra computational expense must be caused by the intron effects.
Table 5: Result Comparisons
In table 5:2 we see that when READ becomes a 0 argument function (i.e. a terminal) and WRITE becomes a single argument function (in the N = 1 case), and so both cause only local introns, the performance degradation remains. However, when we remove READ entirely and are left with only WRITE-ANSWER (causing only local introns) (N=0) the performance degradation is lessened, although still negative. Further, with only WRITE-ANSWER we no longer have a function that explicitly causes whole subtrees to become introns; WRITE-ANSWER is itself simply an intron since it does nothing useful. This suggests that the addition of one useless function can have a detrimental effect on the GP search process.
The zero arity READ (table 5:2) returns a zero since it has no value to pass along. Since the presence of (READ) in Reg (1) is the only thing that distinguished it from the Reg (0) runs, the presence of zero as a terminal obviously has a large negative effect. The reason for this can again be explained by intron effects. With random ephemeral constants and X, making an intron is possible using the algebraic primitives, but not as easy as when 0 is also readily available. It is the difference between the likelihood of (* Y 0) and (* Y (- Z Z)). This may not seem like a large difference, but it has a large effect in this case, as shown in the difference between table 5:2 and the first line of table 5:3.
Table 5:3 shows the effect of the useless function (WRITE-ANSWER) as the maximum size allowed for a tree is raised from 1000 to 2000, 5000, and then 10000. In all three cases, the expense of local introns has been reduced to a modest cost. The conclusion is that for populations with sufficiently large members, this particular intron effect (addition of inert functions) ceases to be expensive. This says something important about tree size. The prevailing wisdom seems to be that as long as the tree size maximum is large enough for a non-compacted solution to fit, the search will go fine. Table 5:3 implies that the necessary tree size ceiling is also a function of the ease with which the functions can form introns.
It is important to notice that the conclusion of the previous paragraph is made in the context of N=0. By increasing the maximum tree size, we alleviate this intron ``cramping effect''. In the case where N>0 and/or MBPR is used, the intron expense incurred from the search degradation effect dominates (as entire subtrees can be intron-ized). In this, the N>0, case, increasing the maximum tree size limit has little effect, as is shown in table 4. Now we have an answer to the question ``Why won't the `capping procedure' (proposed in section 3) guarantee a minimal cost increase for MBPR?'' The answer is, the 'capping procedure' presupposes a correct solution to cap. The search degradation caused by introns appears to seriously impede the evolution of such solutions.