
Finite-history-window Approach

One way to restore the Markov property is to base decisions on the history of recent observations, and perhaps actions. Lin and Mitchell [62] used a fixed-width finite history window to learn a pole-balancing task. McCallum [76] describes the "utile suffix memory," which learns a variable-width window that serves simultaneously as a model of the environment and as a finite-memory policy. This system has had excellent results in a very complex driving-simulation domain [74]. Ring [92] takes a neural-network approach that uses a variable history window, adding history when necessary to disambiguate situations.
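The fixed-width idea can be illustrated with a minimal sketch. It assumes a discrete environment with hashable observations and actions, and it treats the tuple of the last k (action, observation) pairs as the learner's state. The names `HistoryWindow`, `window_state`, and `q_update`, the width `k`, and the use of one-step tabular Q-learning are hypothetical illustrative choices. They are not Lin and Mitchell's actual setup, and the variable-width schemes of McCallum and Ring are not shown.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Optional, Sequence, Tuple

# One entry in the window: the action taken and the observation that followed.
Pair = Tuple[Optional[Hashable], Hashable]


class HistoryWindow:
    """Hypothetical fixed-width window over the most recent (action, observation) pairs."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("window width k must be at least 1")
        self.k = k
        # deque(maxlen=k) silently discards the oldest pair once k pairs are stored.
        self._pairs: Deque[Pair] = deque(maxlen=k)

    def reset(self, first_obs: Hashable) -> None:
        self._pairs.clear()
        # No action precedes the first observation of an episode.
        self._pairs.append((None, first_obs))

    def push(self, action: Hashable, obs: Hashable) -> None:
        self._pairs.append((action, obs))

    def state(self) -> Tuple[Pair, ...]:
        # A hashable snapshot of the window, used in place of the hidden Markov state.
        return tuple(self._pairs)


def window_state(k: int, first_obs: Hashable, steps: Sequence[Pair]) -> Tuple[Pair, ...]:
    """Replay a trajectory through a width-k window and return the final window state."""
    w = HistoryWindow(k)
    w.reset(first_obs)
    for action, obs in steps:
        w.push(action, obs)
    return w.state()


def q_update(Q: Dict, s, a, r: float, s_next, actions: Sequence,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One-step tabular Q-learning update, indexed by window states (an assumed learner)."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
```

For example, if two start observations `"A"` and `"B"` are both followed by the action `"right"` and the observation `"corridor"`, then `window_state(1, "A", steps)` and `window_state(1, "B", steps)` are equal, so the two situations are aliased. With `k = 2` the windows differ and a learner can tell them apart. A larger fixed `k` disambiguates more situations but enlarges the state table, which is the trade-off that variable-width methods try to manage.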



Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996