If the assumptions of the previous section are met, it is expected that the remaining limitations are due to the present implementation. These limitations are likely to become apparent when the system is applied to other domains, which may differ from those presented in this paper in a number of ways.
A domain may differ in that the dimensionality of the space is higher than the two dimensions of the tasks investigated in this paper. The implementation of the snake has been updated to work in higher dimensions. The bold lines at the top of Figure 36 show one of the simpler tasks from the robot navigation domain, extended in the Z-dimension. The snake starts out as a sphere and then expands outwards until it fills the room. In this example, the polygonal constraint has not been used, but everything else remains the same. Figure 37 shows the complete partition of the task.
The mathematics behind the snake is not limited to three dimensions. There also seems to be nothing in principle that would prevent other processes, such as graph matching, planning or transformation, from working in higher dimensions. The main problem is speed. This problem is not unique to this approach, and there is a large body of research addressing it. For instance, although graph matching is in general NP-complete, there is much active research in speeding up matching on average or in special cases (Galil, 1986; Gold & Rangarajan, 1996). At present, the snake represents the principal restriction on speed. This is an issue of great importance to the vision processing community, and current research is investigating the problem, at least in two or three dimensions. One example is hierarchical methods (Leroy, Herlin, & Cohen, 1996; Schnabel, 1997), which find solutions for the snake at progressively finer resolution scales. The results of such research will undoubtedly be of importance here.
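The coarse-to-fine idea behind such hierarchical methods can be illustrated outside the snake setting. The sketch below is hypothetical code, not part of the implementation described here: it locates the steepest jump in a one-dimensional value function by searching first at a coarse sampling stride and then refining only within the neighbourhood of the coarse estimate. The function names and the stride schedule are illustrative assumptions.

```python
def steepest_step(values, lo, hi, stride):
    """Index in [lo, hi) whose jump to the sample `stride` ahead is largest."""
    best, best_i = -1.0, lo
    i = lo
    while i + stride < hi:
        jump = abs(values[i + stride] - values[i])
        if jump > best:
            best, best_i = jump, i
        i += stride
    return best_i

def locate_edge(values, strides=(16, 4, 1)):
    """Coarse-to-fine sketch: estimate the edge at a coarse stride, then
    refine only inside a shrinking window around the previous estimate.
    Returns the index of the sample just below the jump."""
    lo, hi = 0, len(values)
    pos = lo
    for s in strides:
        pos = steepest_step(values, lo, hi, s)
        lo, hi = max(0, pos - s), min(len(values), pos + 2 * s)
    return pos

# A step function with its jump between indices 59 and 60:
values = [0.0] * 60 + [1.0] * 40
edge = locate_edge(values)
```

The coarse pass examines only a handful of samples, and each finer pass searches a window whose width is proportional to the previous stride, which is what gives hierarchical schemes their speed advantage over a single fine-scale search.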
A domain may differ in that the learned value function might not produce features locatable by the snake with the present parameter settings. The values of the parameters were determined empirically, using hand-crafted examples from the robot navigation and robot arm domains. The obvious danger is that the parameters might be over-tuned to these examples. To show that this is not the case, configurations for the experiments in the robot navigation domain were generated randomly, and the experiments showed that the parameters worked successfully for these random examples. As configurations in the robot arm domain are more tightly constrained, the hand-crafted examples were used in those experiments, where the same parameters also worked successfully. The following discussion demonstrates that they are also reasonably effective in a quite different domain, the ``car on the hill''. It is anticipated that using the results of current research into snakes will automate the selection of many parameters.
In the ``car on the hill'' domain (Moore, 1992), the task, simply stated, is to get a car up a steep hill (Figure 38). If the car is stationary part way up the hill, in fact anywhere within the dotted line, then it has insufficient acceleration to make it to the top. The car must instead reverse down the hill and achieve sufficient forward velocity, by accelerating down the other side, before accelerating up the hill. The state space, for the purposes of reinforcement learning, is defined by two dimensions: the position and velocity of the car, as shown in Figure 39. The goal is to reach the top of the hill with a small positive or negative velocity. In this domain there are two possible actions: accelerate forwards or accelerate backwards. Unlike in the previous domains, there is no clear mapping of the actions onto the state space; the state reached on applying an action is determined by Newton's laws of motion. As the car has insufficient acceleration to make it up the hill from everywhere in the state space, a ``wall'' is effectively introduced, the bold line in Figure 39. To reach the top of the hill, the car must follow a trajectory around this ``wall'', the dashed line in Figure 39.
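A minimal simulation makes it concrete why the car must reverse first. The sketch below uses the discretized dynamics and constants that have become conventional for this task in the reinforcement learning literature; these constants are an assumption for illustration and are not necessarily those of Moore's original formulation, and the goal test is simplified by ignoring the small-velocity condition at the top.

```python
import math

POS_MIN, POS_MAX = -1.2, 0.6   # position bounds; POS_MAX is the hilltop
VEL_MAX = 0.07                 # speed limit (magnitude)

def step(pos, vel, action):
    """One time step; action is -1 (accelerate backwards) or +1 (forwards).
    Thrust is weaker than gravity along the slope, so the car cannot climb
    directly from a standstill part way up the hill."""
    vel += 0.001 * action - 0.0025 * math.cos(3 * pos)
    vel = max(-VEL_MAX, min(VEL_MAX, vel))
    pos = max(POS_MIN, min(POS_MAX, pos + vel))
    if pos == POS_MIN:          # inelastic stop at the far wall
        vel = max(vel, 0.0)
    return pos, vel

def run(policy, pos=-0.5, vel=0.0, max_steps=1000):
    """Steps taken to reach the hilltop under `policy`, or None on failure."""
    for t in range(max_steps):
        if pos >= POS_MAX:
            return t
        pos, vel = step(pos, vel, policy(pos, vel))
    return None

# Always accelerating forwards merely oscillates below the hilltop, while
# "pumping" energy by accelerating in the direction of motion succeeds:
forward_only = run(lambda p, v: +1)
pumping = run(lambda p, v: +1 if v >= 0 else -1)
```

The trajectory produced by the pumping policy is exactly the reverse-then-accelerate manoeuvre described above, tracing the dashed path around the ``wall'' in the position-velocity state space of Figure 39.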
Figure 40 shows the reinforcement learning function. It exhibits the same steep gradient as the other domains. The important point to note is that, unlike in the other domains, no physical object causes this gradient: it is implicit in the problem itself, yet the features still exist. Figure 41 shows the partition produced when applying the snake to the ``car on the hill'' domain. The main difference from the previous examples is that the polygonal constraint has not been used. When the snake initially comes to rest, the mercury force is turned off and the snake is then allowed to find its minimum energy state. It was also necessary to reduce the scaling of the edges, to about three quarters of its original value, to achieve an accurate fit. The fit around the top left corner of the second snake, the dashed line, also has some problems: the snake is growing very slowly downwards and is, at present, stopped only because it has reached the maximum number of iterations allowed. One difficulty in this example is that the upper and lower regions are not as clearly delimited at the end of the feature as in the other examples. Future work will investigate altering the stopping condition to eliminate this problem.
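The stopping condition at issue can be expressed schematically. The sketch below is hypothetical and omits the snake update itself; it only illustrates the two ways relaxation can end: convergence, when the snake's motion falls below a tolerance, or the iteration cap, which is what currently halts the slowly growing second snake. The interface of `step_fn` and the default constants are assumptions.

```python
def relax(step_fn, max_iters=500, tol=1e-4):
    """Iterate a snake update until motion falls below `tol`, or until the
    iteration cap is hit. `step_fn` performs one update and returns the
    largest displacement of any control point (an assumed interface)."""
    for i in range(max_iters):
        if step_fn() < tol:
            return i + 1, True      # converged to a minimum-energy state
    return max_iters, False         # halted only by the iteration cap

# A snake whose motion halves each iteration converges quickly, while one
# that keeps creeping downwards is stopped only by the cap:
motion = [1.0]
def shrinking():
    motion[0] *= 0.5
    return motion[0]

iters, converged = relax(shrinking)
stuck = relax(lambda: 1.0)
```

A better stopping condition would distinguish the second case, slow but persistent growth, from genuine convergence, for example by also monitoring whether the snake's energy is still decreasing.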
A domain may differ in that the shapes of the regions in the partition are more complex than the present snake can deal with. Fitting the snake to the task discussed in the previous paragraphs goes some way towards mitigating this concern. Nevertheless, the randomly generated examples of Section 4.1 were subject to certain constraints: configurations with narrower rooms were tried informally, but the snake did not reliably locate the features. The configurations in Section 4 represent the limit of the complexity of partition the snake can produce at present. It is expected that ideas from the large body of published research into snakes will go a long way towards addressing this limitation. For complex regions, locating all the subtleties of the underlying shape may be unnecessary, or even undesirable. The aim is to speed up low level learning, and as long as the solution is reasonably accurate, speed up should be obtained. Being too sensitive to minor variations in shape may severely limit the opportunities for transfer and thus reduce the overall speed up.
A domain may differ in that changes in the environment are more complex than those investigated in this paper. At present, the system detects that the goal has moved by counting how often a reward is received at the old goal position. Not only is this a rather ad hoc approach, but it also does not account for other possible changes, such as paths becoming blocked or short-cuts becoming available. At present, when learning a new task the system is restarted and is not required to determine that its present solution is no longer applicable. In future work, the system should decide when its model of the world is no longer correct. It should also decide what, if any, relationship there is to the existing task and how it might be best exploited. This will allow a more complex interaction of the function composition system with reinforcement learning. For instance, the learning of a new task for the robot navigation domain used the relatively simple situation of two rooms. The function composition system initialized the low level algorithm once on detecting suitable features. In the future, to address more complex tasks, with many more rooms, an incremental approach will be used. When a new task is being learned, the system will progressively build up a solution by function composition as different features become apparent.
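The reward-counting check could look something like the following sketch. The class, its interface, and the threshold are hypothetical illustrations of the idea, not the system's actual code.

```python
class GoalMoveDetector:
    """Flag that the goal has moved once visits to the old goal position
    repeatedly fail to yield the expected reward. The miss threshold is
    an assumed parameter."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.misses = 0

    def observe(self, at_old_goal, reward):
        """Call once per step; returns True once the goal appears moved."""
        if at_old_goal:
            if reward > 0:
                self.misses = 0      # goal confirmed still there; reset
            else:
                self.misses += 1
        return self.misses >= self.threshold

# Five consecutive unrewarded visits to the old goal raise the flag:
detector = GoalMoveDetector(threshold=5)
flags = [detector.observe(True, 0.0) for _ in range(5)]
```

As the text notes, a counter of this kind is ad hoc: it says nothing about paths becoming blocked or short-cuts appearing, which is why a more general test of whether the world model is still correct is left to future work.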
This approach should also handle any errors the system might make in feature extraction. In the experiments with these simple room configurations, the filtering discussed in Section 2.3 proved sufficient to prevent problems. In more complex tasks, however, it is likely that false ``doorways'' will be detected, simply because the system has not explored that region of the state space. A composed function that includes such an extra doorway will drive the system into that region; it should then quickly become apparent that the doorway does not exist, and a new function can be composed.
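One simple way to realise this is to track traversal failures per detected feature and recompose without any feature that repeatedly fails. The function below is a hypothetical sketch of that idea; the names and the failure threshold are assumptions.

```python
def prune_false_features(features, failure_counts, max_failures=3):
    """Drop detected 'doorways' whose attempted traversals have failed at
    least `max_failures` times; a new function would then be composed
    from the surviving features."""
    return [f for f in features
            if failure_counts.get(f, 0) < max_failures]

doors = ["door_a", "door_b", "door_c"]
fails = {"door_b": 3}          # door_b turned out not to exist
remaining = prune_false_features(doors, fails)
```

Because driving towards a non-existent doorway fails quickly, the failure count reaches the threshold after only a few attempts, so the cost of an extraction error is bounded.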