
Recurrent Q-learning

One intuitively simple approach is to use a recurrent neural network to learn Q values. The network can be trained using backpropagation through time (or some other suitable technique) and learns to retain "history features" to predict value. This approach has been used by a number of researchers [77, 62, 103]. It seems to work effectively on simple problems, but can suffer from convergence to local optima on more complex problems.
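To make the idea concrete, the following is a minimal sketch of the forward pass of a recurrent Q-network and the one-step Q-learning error it would be trained on. It is an illustration under stated assumptions, not the method of any of the cited works: the Elman-style architecture, the class name RecurrentQNet, the function td_error, the layer sizes, and the discount factor are all hypothetical choices. The weight update itself (for example, backpropagation through time on the squared error) is deliberately omitted.

```python
import math
import random


class RecurrentQNet:
    """Hypothetical Elman-style network: observation sequence -> Q values."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            # Small random initial weights (an arbitrary illustrative choice).
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_in = mat(n_hidden, n_obs)       # observation -> hidden
        self.w_rec = mat(n_hidden, n_hidden)   # previous hidden -> hidden
        self.w_out = mat(n_actions, n_hidden)  # hidden -> one Q value per action
        self.n_hidden = n_hidden

    def initial_state(self):
        # Empty history is represented by an all-zero hidden vector.
        return [0.0] * self.n_hidden

    def step(self, obs, hidden):
        """Consume one observation; return (q_values, new_hidden)."""
        new_hidden = [
            math.tanh(
                sum(w * o for w, o in zip(self.w_in[i], obs))
                + sum(w * h for w, h in zip(self.w_rec[i], hidden))
            )
            for i in range(self.n_hidden)
        ]
        q_values = [sum(w * h for w, h in zip(row, new_hidden))
                    for row in self.w_out]
        return q_values, new_hidden


def td_error(net, hidden, obs, action, reward, next_obs, gamma=0.9):
    """One-step Q-learning error for a transition, given the history so far.

    Returns (error, new_hidden). A training procedure would adjust the
    weights to reduce error**2; that step is not shown here.
    """
    q_values, new_hidden = net.step(obs, hidden)
    next_q_values, _ = net.step(next_obs, new_hidden)
    target = reward + gamma * max(next_q_values)
    return target - q_values[action], new_hidden
```

The point of the sketch is that the hidden vector plays the role of the history features: calling step with the same observation but a different hidden state generally yields different Q values, so the learned values can depend on what was observed earlier rather than on the current observation alone.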



Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996