Next: 3.2.1 Three Extensions to Up: 3 Details of the Previous: 3.1 Reinforcement Learning

## 3.2 Feature Extraction

Feature extraction uses a vision processing technique that fits a deformable model called a snake (Kass, Witkin, & Terzopoulus 1987) to edges in an image. After initializing the snake, the process iterates until external forces, due to the edges, balance internal forces in the snake that promote a smooth shape. Here, the external forces are due to steep gradients in the value function. As a piecewise constant function approximator is used, a smoothed cubic b-spline is fitted to the value-function and used to generate the necessary derivatives. The left hand side of Figure 14 is the gradient of the value function shown in Figure 9 when extracting features early in the learning process. The system has added a gradient around the border to represent the state space boundary.

Figure 14: The Gradient and Resultant Polygon (Left) Extracted by the Snake (Right)

To locate the features, a curve is found that lies along the ridge of the hills, a local maximum in the differential. On the right hand side of Figure 14, the dashed lines are contour lines for the small inner room as indicated. The bold lines, on the right hand side of Figure 14, are the snake at different stages of the process. The snake is first positioned approximately in the center of the room, the innermost circle. It is then expanded until it abuts on the base of the hills. Now to simplify the exposition, we can imagine that the snake consists of a number of individual hill climbers spread out along the line representing the snake, indicated by the small white circles. But instead of being allowed to climb independently, their movement relative to each other is constrained to maintain a smooth shape. When the snake reaches the top of the ridge, it is further constrained to be polygon - in this instance a quadrilateral - the outside dark line in Figure 14. At this point, it will tend to oscillate around an equilibrium position. By limiting the step size the process can be brought into a stationary state. A more detailed mathematical treatment of this approach is given in Appendix A.

The polygon forms the ``skeleton'' for the graph, as shown at the top left of Figure 14. Nodes in a graph correspond to vertices of the polygon and to the doorways and the goal. Looking at the gradient plot, the doorways are regions with a small differential between the ridges. Their locations can be determined from the magnitude of the gradient along the boundary of the polygon. In this example, a node is added for the goal (labeled G) and this is connected to the ``in'' doorway (labeled I). The polygon delimits a region of the state space, and therefore a region of the action-value function. This becomes a case in the case base, and the corresponding graph its index. Constraining the snake to be a polygon is done for two reasons. Firstly, the vertices are needed to produce nodes in the plane graphs, which are important part of the matching process. Secondly, the additional constraint results in a more accurate fit to the boundaries of the subtask. This, in turn, results in a more accurate solution after function composition.

Next: 3.2.1 Three Extensions to Up: 3 Details of the Previous: 3.1 Reinforcement Learning

Chris Drummond
Thursday January 31 01:30:31 EST 2002