This section discusses the matching process - how a subgraph is used to locate and transform a function from the case base. The matching process first finds all subgraphs in the case base isomorphic to the extracted subgraph and all possible isomorphic mappings between their nodes, using a labeling algorithm (MacDonald 1992). The number of isomorphic mappings is potentially exponential in the number of nodes. Here, the graphs typically have only a few nodes and a few symmetries, so only a few isomorphic mappings. Associated with each node of a subgraph is an (x,y) coordinate. An affine transform, Equation 2, is found that minimizes the distances between the coordinates of the mapped nodes for each of the isomorphic subgraphs. The advantage of this transform is its relative flexibility while having a simple form.
Ideally the transformed nodes would be positioned exactly over the mapped nodes, but this is not usually possible. Even with simple rectangular shapes, the case base may not contain a graph with exactly the same doorway positions. Using a graph that is not an exact match will introduce some error in the composed function for the new task. By weighting some nodes more than others where the error occurs can be controlled. One aim is to minimize the introduction of errors that affect the overall path length. However, of equal importance is that the errors introduced be easily correctable by normal reinforcement learning.
The left hand side of Figure 20 shows the composite graph for the new task. The right hand side shows the result of overlaying it with a graph from the case base. If the fit at the doorway of the outer L-shaped room is in error, the robot will tend to miss the doorway and collide with the wall on one side. The farther the doorway is out of position, the longer normal reinforcement learning will take to correct the error. To encourage a good fit at the doorway, a weight of 4 is used. Nodes adjacent to the doorway are given a weight of 2, all other nodes have a weight of one. This is based on the intuition that more trajectories, from different parts of the state space, will be pass through the region close to the doorway. Any error here is likely to have a broader effect, and take longer for normal reinforcement learning to correct, than in regions far from the doorway. So the fit around the inner room is improved by sacrificing fit far from the doorway.
The exact position of the doorway in the inner room is not critical and its weight is set to 0.5. Whatever the position of the doorway, the shape of the function will be correct inside the room as the goal is also in this room. However, the further the doorway is from its correct position, the greater the error in the edge length. This will produce some error in the composed function, but again the expectation is that this error will be small and reinforcement learning will quickly correct it.
Not only should the fit be good, but we would also prefer that the amount of transformation be small. All transforms produce some error and this is particularly true of asymmetric scaling, as discussed later in this section. Generally the transform produces translation, reflection, rotation, shearing and independent scaling in each dimension. In the robot navigation domain, the distance between points in the state space is just the normal Euclidean distance. The reinforcement learning function is an exponential decay in the distance to goal. If the transformation does not change the Euclidean distance, the transformed function should be directly applicable.
The affine transformation is just one family in a hierarchy of transformations. At the bottom of this hierarchy, shown in Equation 3, are the symmetric transformations. These solid body transformations do not change the Euclidean distance. The next step up in the hierarchy introduces scaling, equal in each dimension. This will affect the Euclidean distance but only by a multiplicative factor. Thus the only change needed to the transformed function is to scale the height. The affine transformations allow the addition of asymmetric scaling and shear, which will distort the Euclidean distance. To determine the amount of distortion, the transformation is applied to the unit circle. The symmetric, rigid body, transformations will not alter the circle, but the other transformations will. The symmetric scaling transform just changes the diameter of the circle. The asymmetric scaling and shear transformations change the circle into the ellipse. The amount of distortion of the Euclidean distance introduced by the transform can be determined by the ratio of lengths of the major and minor axes of the ellipse.
The error of fit of the transformed subgraph can be combined with the transformation error using the lengths of the major and minor axes, and respectively, of the ellipse. There is a penalty for Euclidean Distortion from asymmetric scaling and shear. The log factor is added directly to the error of fit as shown in Equation 4. Log factors are used, so that the penalty functions are symmetric. There is a small penalty for symmetric scaling. Once the best matching subgraph has been found, the same transformation can be applied to the associated function. If no isomorphic graph is found with a total error less than 1.5, a constant function will be used as a default. Where the new graph overlays the old graph, values are assigned by using bilinear interpolation on the discrete values of the function. Where it does not, bilinear extrapolation is used, based on the closest values. In both cases once the four values are selected, the value for the new point is calculated as shown in Equation 5. As the action-value function is indexed by action as well as state, this process is carried out for each action in turn. Any rotation or reflection in the transform is also applied to a predefined matrix of actions. This produces the necessary mapping of actions from the original to the new action-value function.
Finally, the height of the new action-value function must be adjusted to account for the change in overall distance to goal. The height of the value function at an ``out'' doorway is where dg is the distance to goal and the discount factor. The value at some random point within the room is where dd is the distance to the doorway. The action-value function is first normalized by dividing by , the height of the function at the doorway in the original problem. It is then multiplied by , where dng is the distance to the new goal; the value of the point becomes . Scaling will also affect the height of the function. Assuming the scaling is symmetric then the new value function for anywhere in the room will be where c is the scale factor. Thus raising the function to the power of c i.e. will account for scaling. When scaling is symmetric the result is exact, assuming distance is based on the linear combination of the two dimensions. With asymmetric scaling, the result is not exact. But if the difference between the two scale factors is relatively small, it is a useful approximation to use their maximum.
The following is an algorithmic summary of the whole matching process: