Next: Classifier Systems
Up: Policies with Internal State
Previous: Policies with Internal State
One intuitively simple approach is to
use a recurrent neural network to learn Q values. The network can
be trained using backpropagation through time (or some other suitable
technique) and learns to retain ``history features'' to predict value.
This approach has been used by a number of
researchers [77, 62, 103]. It seems to
work effectively on simple problems, but can suffer from convergence
to local optima on more complex problems.
Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996