Wed 23 Nov 1994, 12:00, WeH 1327
Self-Confidence Increasing Q-learning
Ping Zhang (University of Compiegne, France)
Reinforcement learning has been shown to be a promising approach for
learning to perform complex tasks without detailed supervision.
Q-learning, proposed by Watkins (1989), is an important reinforcement
learning method that provides a simple way for agents to learn to act
optimally in controlled Markovian domains by experiencing the
consequences of their actions.
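For reference, the standard Watkins Q-learning update that SCIQ builds on can be sketched as follows; the tabular dictionary representation and the parameter values are illustrative choices, not taken from the talk:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One step of Watkins' Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Q is a dict mapping (state, action) pairs to value estimates;
    missing entries are treated as 0.0.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q
```

Repeating this update along experienced trajectories converges, under the usual conditions, to the optimal action values for the Markovian domain.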
When a human learns to make decisions, a lack of self-confidence may
lead him to follow a conservative rule in the early stages of learning.
As his self-confidence increases, he gradually adopts a more active
rule in the later stages. We propose a method that simulates this idea
to improve conventional Q-learning. We call this method
"Self-Confidence Increasing Q-learning", or SCIQ.
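The abstract does not specify SCIQ's actual mechanism, but the conservative-to-active shift can be illustrated with Boltzmann action selection in which a single confidence parameter scales how strongly the agent trusts its current value estimates; this is a hypothetical interpretation, not the method presented in the talk:

```python
import math
import random

def select_action(Q, s, actions, confidence, rng=random):
    """Softmax (Boltzmann) action selection over Q-values.

    `confidence` is a hypothetical stand-in for SCIQ's one extra
    parameter: near 0 the choice is almost uniform (conservative);
    as it grows, the choice becomes nearly greedy (active).
    """
    prefs = [confidence * Q.get((s, a), 0.0) for a in actions]
    m = max(prefs)  # subtract max for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(actions, weights=weights)[0]
```

Increasing the confidence parameter over the course of learning then moves the agent from cautious exploration toward exploiting its learned evaluations.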
The SCIQ algorithm is a generalization of conventional Q-learning that
incorporates a representation of the belief in each state evaluation.
To allow an efficient implementation, we propose some simplifications
of the basic mechanism that require only one additional parameter.
Simulation results show that SCIQ models the Markovian decision problem
more finely and thus improves on conventional Q-learning. Compared with
other improvements to Q-learning, thanks to this simplification, SCIQ
needs no extra memory and does not increase computational complexity.
The results presented here have some limitations: the state and action
spaces are small, and SCIQ uses one more parameter than Q-learning.
Although the conjecture about the convergence of SCIQ may be accepted
when SCIQ tends to Q-learning, a convergence proof has not yet been
given. In future work, we will try to prove the convergence of SCIQ and
apply it to real-scale problems, such as games and robot path finding.