The categorizability assumption is closely related with complexity theory principles such as the Minimum Description Length (MDL) that has been used by authors such as [Schmidhuber, 2002] to bias learning algorithms. All these complexity results try to formalize the well-known ``Occam's Razor'' principle that enforces choosing the simplest model from a set of otherwise equivalent models.
[Boutilier et al., 1999] presents a good review on representation methods to reduce the computational complexity of planning algorithms by exploiting the particular characteristics of a given environment. The representation based on partial rules can be seen as another of these representation systems. However, the partial rule is just a representation formalism that, without the bias introduced by the categorizability assumption, would not be efficient enough to be applied to realistic applications.
The partial-rule formalism can be seen as a generalization of that of the XCS classifier systems described by [Wilson, 1995]. This XCS learning system aims at determining a set of classifiers (that are combinations of features with an associated action) with their associated value and relevance predictions. The main difference between this approach and ours is that Wilson's work pursues a generic learner and we bias the learning process using the categorizability assumption. This allows us to use an incremental rule-generation strategy that is likely to be more efficient in robotic problems. Additionally, the categorizability assumption also modifies the way in which the value for a given action is evaluated: Wilson's approach uses a weighted sum of the predictions of the classifier advocating for each action to determine the expected effect of that action, while, to fulfill with the categorizability assumption (i.e., to minimize the number of feature detectors and elementary actions involved in a given evaluation), we propose to use a winner-takes-all strategy. This is a critical point since the winner-takes-all strategy takes full advantage of the categorizability assumption and because it allows the partial-rule system to asymptotically converge to a table-based reinforcement-learning system. This is not the case when a weighted sum strategy is used. Furthermore, in the XCS formalism there is no generalization in the action space and, as already commented, this is a requirement in robotic-like applications.
In general, reinforcement learning does not pay attention to the necessity of generalizing in the space of actions, although some exceptions exists. For instance, the work of [Maes and Brooks, 1990] includes the possible execution of elementary actions in parallel. However this system does not include any mechanism detecting interactions between actions and, thus, the coordination of actions relies on sensory conditions. For instance, this system has difficulties detecting that the execution of two actions results always (i.e., independently of the active/inactive feature detectors) in positive/negative reward.
The CASCADE algorithm by [Kaelbling, 1993] learns each bit of a complex action separately. This algorithm presents a clear sequential structure where the learning of a given action bit depends on all the previously learned ones. In our approach there is not a predefined order in the learning of the outputs and the result is a more flexible learning schema.
In multiagent learning [Claus and Boutilier, 1998,Sen, 1994,Tan, 1997] the objective is to learn an optimal behavior for a group of agents trying to cooperatively solve a given task. Thus, in this field, as in our case, multiple actions issued in parallel have to be considered. However, one of the main issues in multiagent learning, the coordination between the different learners is irrelevant in our case since we only have one learner.
Finally, the way in which we define complex actions from elementary actions has some points in common with the works in reinforcement learning where macro-actions are defined as the learner confronts different tasks [Sutton et al., 1999,Drummond, 2002]. However, the useful combinations of elementary actions detected by our algorithm are only guaranteed to be relevant for the task at hand (although they are likely to be also relevant for related tasks).
Josep M Porta 2005-02-17