Classifier Systems

Classifier systems [47, 41] were explicitly developed to solve problems with delayed reward, including those requiring short-term memory. The internal mechanism typically used to pass reward back through chains of decisions, called the bucket brigade algorithm, bears a close resemblance to Q-learning. In spite of some early successes, the original design does not appear to handle partially observed environments robustly.
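
The resemblance to Q-learning can be made concrete with a minimal sketch of bucket-brigade credit assignment on a hypothetical, fixed chain of classifiers. Here classifier t always fires at step t, and matching, competition, and rule discovery are left out entirely. The names `bucket_brigade_pass`, `train`, and `bid_fraction` are illustrative assumptions, not part of any particular classifier-system implementation. Over one pass, each intermediate classifier pays a bid of b S_t and later receives its successor's bid b S_{t+1}, so its strength moves by S_t <- S_t + b (S_{t+1} - S_t). This is an undiscounted temporal-difference update with learning rate b, and it is the sense in which the bucket brigade resembles Q-learning.

```python
def bucket_brigade_pass(strengths, reward, bid_fraction=0.1):
    """One pass along a fixed chain of classifiers (hypothetical, simplified setup).

    strengths[t] is the strength of the classifier that fires at step t.
    Each firing classifier pays a bid of bid_fraction * strength to the classifier
    that fired immediately before it; the last one also collects the external reward.
    """
    s = list(strengths)
    for t in range(len(s)):
        bid = bid_fraction * s[t]
        s[t] -= bid            # the firing classifier pays its bid...
        if t > 0:
            s[t - 1] += bid    # ...to its predecessor in the chain
    s[-1] += reward            # external reward goes to the final classifier
    return s


def train(n_classifiers=3, reward=1.0, bid_fraction=0.1, passes=2000):
    """Repeat the same chain many times, starting from zero strength."""
    s = [0.0] * n_classifiers
    for _ in range(passes):
        s = bucket_brigade_pass(s, reward, bid_fraction)
    return s


strengths = train()
print([round(x, 3) for x in strengths])  # each strength approaches reward / bid_fraction
```

After enough passes the strengths equalize along the chain at R/b, so credit for a reward received only at the end of the chain has propagated back to the first classifier. Unlike Q-learning with a discount factor below one, nothing in this form discounts reward by its distance from the goal; the bid fraction plays the role of the learning rate.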

Recently, this approach has been re-examined using insights from the reinforcement-learning literature, with some success. Dorigo carried out a comparative study of Q-learning and classifier systems [36]. Cliff and Ross [26] start with Wilson's zeroth-level classifier system [135] and add one- and two-bit memory registers. They find that, although their system can learn to use short-term memory registers effectively, the approach is unlikely to scale to more complex environments.
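
The idea of a memory register can be sketched with the ternary {0, 1, #} conditions common in classifier systems: a condition is matched against the current observation with the register bits appended, and each classifier's action also specifies new register contents. This is a hypothetical encoding chosen for illustration (the names `Classifier`, `register_write`, `match_set`, and `apply_register_write` are invented here) and is not necessarily the representation Cliff and Ross used; strengths, bidding, and rule discovery are omitted.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str       # over {'0','1','#'}: observation bits, then register bits
    action: int          # external action sent to the environment
    register_write: str  # new register bits; '#' leaves that bit unchanged (hypothetical encoding)


def matches(condition: str, bits: str) -> bool:
    """A ternary condition matches if every non-'#' position equals the input bit."""
    return len(condition) == len(bits) and all(
        c == "#" or c == b for c, b in zip(condition, bits)
    )


def match_set(population, observation: str, register: str):
    """Classifiers whose condition matches the observation extended by the register."""
    inp = observation + register
    return [cl for cl in population if matches(cl.condition, inp)]


def apply_register_write(register: str, write: str) -> str:
    """Overwrite register bits, keeping the old value wherever write has '#'."""
    return "".join(old if w == "#" else w for old, w in zip(register, write))


# Hypothetical one-bit example: the same observation "01" is disambiguated by the register.
population = [
    Classifier("010", action=0, register_write="1"),  # register 0: act 0, then set it to 1
    Classifier("011", action=1, register_write="#"),  # register 1: act 1, leave it alone
]
register = "0"
for _ in range(2):
    chosen = match_set(population, "01", register)[0]
    print(chosen.action)
    register = apply_register_write(register, chosen.register_write)
```

Because the register bits are part of the input, two situations that produce the same observation can still match different classifiers, provided earlier actions have left different values in the register. This is the kind of short-term memory a reactive classifier system otherwise lacks in partially observed environments.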

Dorigo and Colombetti applied classifier systems to a moderately complex problem of learning robot behavior from immediate reinforcement [38, 37].

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996