The division between knowledge-based and behavior-based artificial intelligence has been fundamental to achieving successful applications within the field of autonomous robots [Arkin, 1998]. However, up to now, this division has had few repercussions for reinforcement learning. Within artificial intelligence, reinforcement learning has been formalized in a very general way borrowing ideas from the dynamic programming and decision-theory fields. Within this formalization, the objective of reinforcement-learning methods is to establish a correct mapping from a set of abstract observations (formalized as states) to a set of high level actions, without being worried about how these sets of states and actions are defined (see [Kaelbling et al., 1996,Sutton and Barto, 1998], among many others). Algorithms developed within this general framework can be used in different fields without any modification. For each particular application, the definition of the sets of states and actions is the responsibility of the programmer and is not supposed to be part of the reinforcement-learning problem. However, as clearly pointed by [Brooks, 1991], in autonomous robots the major hurdles are those related with perception and action representations. For this reason, in a robotic task, what traditional reinforcement-learning research assumes to be the major problem (connecting states and actions) is simpler than what it assumes as given (the definition of states and actions). The consequence is that existing reinforcement-learning methods are best suited for problems that fall into the symbolic artificial intelligence domain than for those that belong to robotics. Due to the generality of existing reinforcement-learning algorithms, a robotic problem can be analyzed and re-formulated so that it can be tackled with the available reinforcement-learning tools but, in many cases, this re-formulation is too awkward introducing unnecessary complexity in the learning process. The alternative we explore in this paper is a new reinforcement-learning algorithm that can be applied to robotic problems as they are, without any re-formulation.
As [Brooks, 1991] remarked, dealing with a real environment is not necessarily a problem since real environments have properties that can be exploited to reduce the complexity of the robot's controller. In Brooks' works, we can find simple robot controllers that achieve very good performance in particular environments. This is clearly in contrast with the generality pursued within reinforcement learning. Following an idea parallel to that of Brooks, in this paper, we present a new reinforcement-learning algorithm that takes advantage of a specific environment-related property (that we call categorizability) to efficiently learn to achieve a given task. We formalize the categorizability property and we present a representation system (partial rules) to exploit this property. A remarkable feature of this representation system is that it allows generalization in both the spaces of sensors and actions, using a uniform mechanism. This ability to generalize in both the state and action spaces is fundamental to successfully apply reinforcement learning to autonomous robots.
This paper is organized as follows. First, in Section 2, we formalize reinforcement learning from the point of view of its use in the field of autonomous robotics and we describe the problems that make flat (and, in most cases, also generalization-based) reinforcement-learning algorithms not adequate for this case. Section 3 presents the categorizability assumption which is plausible in most robotics environments. Then, in Section 4, we describe an alternative reinforcement-learning algorithm that exploits the categorizability assumption to circumvent the problems present in existing approaches. In Section 5, we analyze the points of contact between our proposal and already existing work. Next, in Section 6, we present experiments that validate our approach. The experiments are performed in simulations that mimic realistic robotic applications where the categorizability assumption is likely to be valid. Finally, in Section 7, we conclude by analyzing the strengths and weaknesses of the proposed learning system.
Additionally, Appendix A provides a detailed description of the partial-rule learning algorithm introduced in this paper, Appendix B is devoted to an enhancement on this algorithm to make its execution more efficient, and Appendix C summarizes the notation we use throughout the paper.
Josep M Porta 2005-02-17