Conclusions

In this paper, we have introduced the *categorizability assumption*, which states that a robot can be
driven to achieve a given task using only simple rules, i.e., rules involving a
small set of feature detectors and elementary actions. This assumption is supported by our
experience with the behavior-based approach, where controllers are formed by sets of rules
with relatively simple conditions and actions. We have shown that a learning algorithm based on
the categorizability assumption achieves a large speed-up of the learning process in
many realistic robotic applications with respect to existing algorithms.

To exploit the categorizability assumption in both the observation and action spaces,
we have introduced a new representation formalism based on
the concept of *partial rules*, and not on the concepts of independent states and
independent actions that are the kernel of many existing reinforcement-learning
approaches.
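To illustrate the concept, a partial rule can be thought of as a condition/action pair that mentions only a subset of the available feature detectors and actuators, and that applies whenever its partial condition holds. The following Python sketch is purely illustrative; the class and field names are our own, not the paper's implementation:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PartialRule:
    """A rule over a *subset* of feature detectors and elementary actions."""
    condition: dict = field(default_factory=dict)  # feature detector -> required value
    action: dict = field(default_factory=dict)     # actuator -> commanded value

    def matches(self, observation: dict) -> bool:
        # The rule is active whenever its partial condition holds,
        # regardless of the values of the features it does not mention.
        return all(observation.get(f) == v for f, v in self.condition.items())


# A partial rule mentioning one of three features and one actuator:
rule = PartialRule(condition={"obstacle_ahead": True},
                   action={"turn": "left"})

obs = {"obstacle_ahead": True, "light_left": False, "light_right": True}
print(rule.matches(obs))  # True: unmentioned features are ignored
```

Under this reading, a complete rule is simply the limiting case in which `condition` and `action` mention every available input and output.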

The introduction of the *partial-rule* concept provides great flexibility in how
problems are formalized. With the same structure and algorithms, we can confront problems requiring
generalization on the perception side (usually considered in
reinforcement learning), on the action side (usually not considered), or on both.

When no
generalization at all is possible via partial rules, we have to use *complete rules*: rules
involving all the available inputs and outputs.
In this case, the partial-rule approach is equivalent to
non-generalizing reinforcement learning. The algorithm we have presented can, if
necessary, generate complete rules and, consequently, it can, in principle, solve any
problem
that can be solved
using a traditional reinforcement-learning algorithm. However, we take
the categorizability assumption as valid, so the generation of complete rules is an
extreme case, only likely to occur in very limited situations. Therefore, in our approach,
we forego generality
in order to increase the efficiency of the learning process in the class of problems we
want to address.

Another advantage of the partial-rule framework is that it allows the easy and robust introduction of initial knowledge into the learning process in the form of rules that are easily understood by the programmer. This is in contrast with usual reinforcement-learning algorithms, where the introduction of initial knowledge is, in general, rather difficult.

In the partial-rule approach, there is a subtle change of emphasis as to the main goal of learning: while in most work in reinforcement learning the emphasis is on learning the value of each action in each state, our main purpose is to learn the relevance of (subsets of) elementary actions and feature detectors. Once the relevant subsets of elementary actions and feature detectors are identified, learning becomes straightforward.
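This change of emphasis can be illustrated with a toy sketch: for a fixed elementary action, a subset of feature detectors categorizes the environment well if the reward is nearly constant within each category it induces. The variance-based score and all names below are hypothetical simplifications for illustration, not the algorithm presented in the paper:

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pvariance


def most_relevant_subset(experiences, features, max_order=2):
    """Score feature subsets for one fixed action.

    experiences: list of (observation_dict, reward) pairs.
    A subset scores well (low) when rewards are nearly constant
    within each category the subset induces.
    """
    scores = {}
    for k in range(1, max_order + 1):
        for subset in combinations(features, k):
            groups = defaultdict(list)
            for obs, reward in experiences:
                key = tuple(obs[f] for f in subset)
                groups[key].append(reward)
            # Average within-category reward variance.
            scores[subset] = mean(pvariance(r) for r in groups.values())
    return min(scores, key=scores.get)


experiences = [
    ({"a": 0, "b": 0}, 1.0), ({"a": 0, "b": 1}, 1.0),
    ({"a": 1, "b": 0}, 0.0), ({"a": 1, "b": 1}, 0.0),
]
print(most_relevant_subset(experiences, ["a", "b"]))  # ('a',)
```

Here feature `a` alone predicts the reward perfectly, so the smallest subset containing it is preferred and feature `b` can be ignored for this action.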

The main limitation of our work is that
it is not possible to know a priori (except in trivial cases) whether or not an
environment is categorizable by a given robot. Non-generalizing reinforcement learning implicitly
assumes that the environment is non-categorizable and that, consequently, all the
possible combinations of features and actions have to be taken into account separately. Our
approach assumes just the opposite: that the environment is categorizable and, thus, only small
combinations of features and actions need to be taken into account. The drawback of the
non-generalizing approach is that robotic tasks become intractable because of the
*curse of dimensionality*. Generalization techniques can partially
alleviate this problem, but in general not enough. We take a more
radical route in order to be much less affected by the curse of dimensionality:
we introduce a strong bias in the learning process
to drastically limit the combinations of features
and actions considered.
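A back-of-the-envelope count makes the effect of this bias concrete (the numbers are hypothetical, chosen only for illustration): with 20 binary feature detectors and 10 elementary actions, a non-generalizing table needs one entry per full state/action pair, whereas the partial rules mentioning at most 2 features are several orders of magnitude fewer:

```python
from math import comb

n_features, n_actions = 20, 10     # binary feature detectors, elementary actions

# Non-generalizing RL: one value per full state/action combination.
table_entries = 2**n_features * n_actions
print(table_entries)               # 10485760

# Partial rules of order <= 2: choose k features, a value for each,
# and one elementary action.
max_order = 2
n_rules = sum(comb(n_features, k) * 2**k * n_actions
              for k in range(1, max_order + 1))
print(n_rules)                     # 8000
```

The exact counts depend on the rule order allowed and on how actions are combined, but the gap grows exponentially with the number of features, which is the essence of the bias.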

We have tested the partial-rule learning algorithm on many robotics-inspired problems, two of which have been discussed in this paper (landmark-based navigation and six-legged robot gait generation), and the categorizability assumption proved valid in all the cases we tested. The algorithm outperforms generalizing and non-generalizing reinforcement-learning algorithms in both memory requirements and convergence time. Additionally, we have shown that our approach scales well as the number of inputs increases, while the performance of existing algorithms degrades markedly. This important result suggests that our approach could be used to control more complex robots for which existing approaches are infeasible.

From the work presented in this paper, we can extract two main proposals. First, to apply reinforcement learning to agents with many sensors and actuators, we should concentrate our efforts on determining the relevance of inputs and outputs. Second, to achieve efficient learning in complex environments, it may be necessary to introduce additional assumptions into reinforcement-learning algorithms, even at the risk of losing generality.

Josep M Porta 2005-02-17