next up previous
Next: References Up: Accelerating Reinforcement Learning by Previous: 8 Conclusions

Appendix A. Spline Representations


This appendix presents some of the underlying mathematics associated with spline representations and the snake. It is not meant to be an introduction to the subject. Rather it is added for completeness to discuss certain important aspects of the system not addressed elsewhere in this paper. Knowledge of these aspects is not necessary to understand the basic principles of the approach discussed in this paper, but would be necessary if one wanted to duplicate the system. More detailed explanation is given in Drummond (1999). Some specific papers that address these ideas in much greater detail are: for splines (Terzopoulos 1986) and for snakes (Leymarie & Levine, 1993; Cohen & Cohen, 1993).

Splines are piecewise polynomials where the degree of the polynomial determines the continuity and smoothness of the function approximation. Additional smoothing constraints can be introduced by penalty terms which reduce the size of various differentials. One way then to view spline fitting is in the form of an energy functional such as Equation 6.


Here, there is an energy associated with the goodness of fit, some measure of how close the approximating function is to the input function. This is typically the least squares distance between the functions. There is an energy associated with the smoothness of the function. Two very commonly used smoothness controls produce the membrane and thin plate splines by restricting the first and second differentials of the function respectively. To fit the spline to the function, the total energy must be minimized. A necessary condition for this is an Euler-Lagrange differential equation such as Equation 7. Here tex2html_wrap_inline1860 controls the tension in the spline (the resistance to stretching) and tex2html_wrap_inline1862 the stiffness (the resistance to bending). Often the error function will be based on individual data points and the left hand side of Equation 7 would include delta functions.


In this work, such splines have been used for a number of purposes. When fitting the snake, measures of the first and second differential are needed. A two dimensional quadratic spline is fitted to the discrete representation of the maximum Q-values. An tex2html_wrap_inline1860 of 0.2 is used ( tex2html_wrap_inline1862 is zero) to limit overshoot (Drummond 1996) to prevent false edges. Values from an identical spline except using an tex2html_wrap_inline1860 of 2.0 are squared and then divided into the differential values. This normalizes the differentials, so that the size of edges is not dependent on where they occur in the function. The same type of spline is used to produce the bowls associated with the rooms as discussed in Section 3.2.1. Here tex2html_wrap_inline1860 is 1.0 and tex2html_wrap_inline1862 is 0.5 giving roughly Gaussian smoothing. The values used to produce this function are weighted. Values close to one are given weights of 200, lower values a weight of 1. This prevents the sides of the bowls from collapsing under smoothing.

A one dimensional cubic spline is used in locating the doorways. These are found by steepest descent on the value of the differential along the body of the snake. This differential contains many local minima not associated with doorways. These arise either from the inherent noise in the process or from errors of fit in the snake. The aim is to remove the ones not associated with doorways by smoothing and thresholding. This is achieved by first sampling the gradient at points along the snake. The values are then normalized to lie between zero and one. The spline has an tex2html_wrap_inline1860 of 0.15 ( tex2html_wrap_inline1862 of 0.0). Here a weighted least mean squares fit is used. The weighting function is the inverse square of the values, preventing the spline from being overwhelmed by large values. Starting points for steepest descent are changes in the sign of the coefficients of the gradient of the spline. The initial step size is set to slightly larger than a knot spacing and then decreased over time. When a local minimum is found if the value exceeds a threshold (of 0.5), it is rejected.

To represent the snake, the model of the spline must be changed somewhat. The snake itself is a one dimensional cubic spline. But the energy minimum that is being sought is in the differential of the tex2html_wrap_inline1878 function, subject to other constraints. The dynamics of the snake are defined by the Euler-Langrange equation shown in Equation 8.


An tex2html_wrap_inline1880 of 512 minimizes changes to the snake's shape as it grows, by penalizing the difference in the second differential to the previous time step scaled by the ratio of their lengths. An tex2html_wrap_inline1862 of 8.0 is the initial stiffness of the snake. This is reduced proportionately to the snake's length to give the spline more degrees of freedom. A tex2html_wrap_inline1884 of 96 and a tex2html_wrap_inline1762 of 96 control the momentum and the drag on the snake respectively. As in Cohen and Cohen (1993), a factor is added to the energy associated with the differential that is in the direction normal to the body of the snake, as shown in Equation 9. But instead of it being a constant, a variable is used to produce the mercury model discussed in Section 3.2.1.


The energy minimization process is carried out iteratively interleaving steps for the x and y directions. The differential of tex2html_wrap_inline1892 for the x direction is given by Equation 10, a similar equation is used for the y direction.


The snake grows under the forces of the mercury model until it reaches an approximately stable position, subject only to small oscillations. It is then converted into a polygon by finding the corners (where the normal passes through tex2html_wrap_inline1896 where tex2html_wrap_inline1898 ). The coefficient tex2html_wrap_inline1900 is set to zero everywhere. The coefficient tex2html_wrap_inline1902 is set to zero at the corners and 15 between them. This produces a polygon which is flexible at its vertices.

To detect the features as early as possible in the learning process, as discussed in Section 2.4, the height of the gradient is scaled according to the signal to noise ratio. The noise arises from variations in the low level learning process and the stochastic nature of the task. Both the size of the features and the noise grow with time and are somewhat normalized by this scaling process. The idea is to collect uniformly sampled values of the function shown in Equation 10 for both the x and y directions and find the median of their absolute values. The median is not strongly affected by extreme values and thus largely ignores the size of the features, measuring only the noise of the regions in between.

next up previous
Next: References Up: Accelerating Reinforcement Learning by Previous: 8 Conclusions

Chris Drummond
Thursday January 31 01:30:31 EST 2002