CREATING ADVICE-TAKING REINFORCEMENT LEARNERS
by Richard Maclin and Jude W. Shavlik
Learning from reinforcements is a promising approach for creating
intelligent agents. However, reinforcement learning usually requires
a large number of training episodes. We present and evaluate a design
that addresses this shortcoming by allowing a connectionist Q-learner
to accept advice given, at any time and in a natural manner, by an
external observer. In our approach, the advice-giver watches the
learner and occasionally makes suggestions, expressed as instructions
in a simple imperative programming language. Using techniques from
knowledge-based neural networks, we insert these programs directly
into the agent's utility function. Subsequent reinforcement learning
further integrates and refines the advice. Our experiments
investigate several aspects of our approach and show
that, given good advice, a learner can achieve statistically
significant gains in expected reward. A second experiment shows that
advice improves the expected reward regardless of the stage of
training at which it is given, while another study demonstrates that
subsequent advice can result in further gains in reward. Finally, we
present experimental results that indicate our method is more powerful
than a naive technique for making use of advice.
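
To make the mechanism concrete, the following is a minimal Python/NumPy
sketch of the core idea, not the system evaluated in our experiments: a
piece of IF-THEN advice is compiled, in the style of knowledge-based
neural networks, into a new hidden unit of a connectionist Q-function,
and subsequent Q-learning updates refine the installed weights. The
feature names, action set, network sizes, and the weight magnitude
omega are illustrative assumptions, not values from the paper.

import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 4   # hypothetical Boolean sensor inputs
N_ACTIONS = 2    # hypothetical actions, e.g., 0 = stay, 1 = flee
N_HIDDEN = 3     # hidden units present before any advice is given

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Connectionist Q-function: one sigmoid hidden layer, linear outputs.
W1 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_ACTIONS, N_HIDDEN))
b2 = np.zeros(N_ACTIONS)

def q_values(x):
    h = sigmoid(W1 @ x + b1)
    return W2 @ h + b2, h

def install_advice(antecedents, action, omega=4.0):
    """Compile 'IF all antecedent features THEN prefer action' into the
    network: a new hidden unit with large weights from the rule's
    antecedents, a bias chosen so the unit fires only when all of them
    are true, and a positive link to the advised action's Q output.
    The new weights stay trainable, so later learning can refine them."""
    global W1, b1, W2
    w_in = np.zeros(N_FEATURES)
    w_in[antecedents] = omega
    W1 = np.vstack([W1, w_in])
    b1 = np.append(b1, -omega * (len(antecedents) - 0.5))
    w_out = np.zeros((N_ACTIONS, 1))
    w_out[action, 0] = omega
    W2 = np.hstack([W2, w_out])

def q_update(x, a, r, x_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: gradient descent on the squared TD error,
    which integrates and refines any advice units along the way."""
    global W1, b1, W2, b2
    q, h = q_values(x)
    q_next, _ = q_values(x_next)
    delta = r + gamma * np.max(q_next) - q[a]   # TD error
    grad_h = delta * W2[a] * h * (1.0 - h)      # backprop to hidden layer
    W2[a] += alpha * delta * h
    b2[a] += alpha * delta
    W1 += alpha * np.outer(grad_h, x)
    b1 += alpha * grad_h

# Advice: "IF enemy_near (feature 0) AND low_energy (feature 2) THEN flee."
install_advice(antecedents=[0, 2], action=1)

x = np.array([1.0, 0.0, 1.0, 0.0])              # enemy near, low energy
print("Q with advice installed:", q_values(x)[0])
q_update(x, a=1, r=1.0, x_next=np.zeros(N_FEATURES))
print("Q after one refinement:", q_values(x)[0])

Because the advice enters the network as ordinary, trainable weights, it
biases rather than dictates the agent's behavior: if subsequent
experience contradicts the rule, gradient descent on the TD error can
weaken or override the installed unit.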