In general, function composition outperforms the baseline learning algorithm by an amount that depends on the complexity of the learning problem. In the robot navigation domain, when the goal was moved, the speed up increased with the number of rooms and decreased with the number of paths to the goal. The configurations with five rooms and a single path to the goal gave a speed up of 60, against an average speed up of 40. Configurations with only three rooms had the least speed up, but the relative simplicity of the problem was not the only cause.
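To make figures such as "a speed up of 60" concrete, the following is a minimal sketch of one way a speed up could be computed from two learning curves. It assumes, purely for illustration, that speed up is the ratio of the number of learning steps each method needs to bring its error down to a chosen threshold; this section does not state the definition used in the experiments. The function names, the threshold and the curve values are all hypothetical.

```python
from typing import Optional, Sequence


def steps_to_threshold(errors: Sequence[float], threshold: float) -> Optional[int]:
    """Return the number of steps taken until the error first reaches the threshold.

    Returns None if the curve never reaches the threshold.
    """
    for index, error in enumerate(errors):
        if error <= threshold:
            return index + 1  # count steps from 1
    return None


def speed_up(
    baseline_errors: Sequence[float],
    composed_errors: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Ratio of baseline steps to composed steps (assumed definition, see lead-in)."""
    baseline_steps = steps_to_threshold(baseline_errors, threshold)
    composed_steps = steps_to_threshold(composed_errors, threshold)
    if baseline_steps is None or composed_steps is None:
        return None
    return baseline_steps / composed_steps


# Made-up curves for illustration only; these are not experimental data.
baseline_curve = [1.0, 0.8, 0.6, 0.4, 0.2, 0.1]
composed_curve = [0.5, 0.1, 0.05, 0.05, 0.05, 0.05]
ratio = speed_up(baseline_curve, composed_curve, threshold=0.1)
```

Under this assumed definition, a ratio above 1 means the composed function reached the threshold in fewer steps than the baseline. Note that the result depends on the threshold chosen when the two curves cross.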
The top of Figure 35 shows the average of four learning curves for the three-room configurations. The bottom of Figure 35 shows one of the configurations that produced these curves. Not only is it one of the easiest tasks in the experimental set for the baseline algorithm, but the case base also contains no solution for the lowest room, because there are no isomorphic subgraphs of this form. Rather than abandon composition, the system introduces a constant value function for this room. This room represents almost half the state space, so much additional learning is required. As the top of Figure 35 shows, there is significant speed up initially. Further refinement reduces the advantage, and for a short while the baseline algorithm is better. Later, however, function composition regains the upper hand and converges more quickly than the baseline algorithm towards the asymptotic value.
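The fallback to a constant value function can be illustrated with a minimal sketch. It assumes a grid-world state space split into named rooms, and it replaces the subgraph-isomorphism retrieval described in the text with a simple lookup by room name. The names `compose_value_function`, `case_base` and `default_value`, the room layout and the numeric values are all hypothetical and are not taken from the system itself.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cell = Tuple[int, int]
ValueFn = Callable[[Cell], float]


def compose_value_function(
    rooms: Dict[str, List[Cell]],
    case_base: Dict[str, ValueFn],
    default_value: float = 0.0,
) -> Dict[Cell, float]:
    """Build an initial value table room by room.

    Rooms with an entry in the (hypothetical) case base use the retrieved
    function; rooms without a match receive a constant value.
    """
    values: Dict[Cell, float] = {}
    for room_name, cells in rooms.items():
        # Stand-in for retrieval by subgraph matching: a plain name lookup.
        retrieved: Optional[ValueFn] = case_base.get(room_name)
        for cell in cells:
            if retrieved is not None:
                values[cell] = retrieved(cell)
            else:
                # No matching case: fall back to a constant placeholder,
                # leaving later reinforcement learning to refine this room.
                values[cell] = default_value
    return values


# Illustrative layout: the lower room covers half the cells and has no case.
rooms = {
    "upper_left": [(0, 0), (0, 1)],
    "upper_right": [(1, 0), (1, 1)],
    "lower": [(0, 2), (1, 2), (0, 3), (1, 3)],
}
case_base = {
    "upper_left": lambda cell: 1.0 - 0.1 * (cell[0] + cell[1]),
    "upper_right": lambda cell: 0.8 - 0.1 * cell[1],
}
initial_values = compose_value_function(rooms, case_base, default_value=0.5)
```

In this sketch every cell of the lower room starts at the same value, so the initial function carries no gradient towards the goal in that region. This is consistent with the observation that much additional learning is required when the unmatched room is a large part of the state space.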
In the robot navigation domain, when learning a new task, the speed up varied with the size of the inner room. This was primarily due to the number of actions needed before the features emerged with sufficient clarity for the snake to locate them. Function composition is most successful when the inner room is small. If a wall is long, the feature takes more time to develop, and more refinement by Q-learning is needed to make it apparent. Very short walls are also hard to identify: the robot is unlikely to collide with them, so many exploratory actions are needed before the features emerge clearly.
The features may be sufficiently clear for the snake to form a partition, yet not well enough defined to locate the doorways precisely. A doorway may appear to be a bit wider than it actually is. More importantly, it may appear to be displaced from its true position. Typically, the error in the composed function is small and normal reinforcement learning quickly eliminates it. In one of the experimental runs, configuration 2 in Figure 30, the speed up was reduced by a factor of 2 because the doorway was incorrectly positioned. The feature representing the lower wall had not completely emerged when the partition was generated, which made the doorway appear to be almost exactly at the corner. The algorithm in fact positioned the doorway just on the wrong side of the corner, and this caused the significantly reduced speed up. It is unclear, however, why reinforcement learning took so long to correct what seems, on the surface at least, to be a local error. This will be investigated in future work.