Wed 23 Nov 1994, 12:00, WeH 1327

Self-Confidence Increasing Q-learning
Ping Zhang (University of Compiegne, France)

Reinforcement learning has been shown to be a promising approach for learning to perform complex tasks without detailed supervision. Q-learning, proposed by Watkins (1989), is an important reinforcement learning method that provides a simple way for agents to learn to act optimally in controlled Markovian domains by experiencing the consequences of their actions.

When a human learns to make decisions, a lack of self-confidence may lead him to follow a conservative strategy in the early stages of learning; as his self-confidence increases, he gradually adopts a more active strategy. We propose a method that simulates this idea to improve conventional Q-learning, which we call "Self-Confidence Increasing Q-learning", or SCIQ. The SCIQ algorithm generalizes conventional Q-learning by representing the confidence placed in each state evaluation. To obtain an efficient implementation, we propose some simplifications of the basic mechanism that involve only one additional parameter. Simulation results show that SCIQ models the Markovian decision problem more finely and thus improves on conventional Q-learning. Compared with other improvements to Q-learning, thanks to this simplification, SCIQ requires no additional memory and does not increase computational complexity.

The results presented here have some limitations: the state and action spaces are small, and SCIQ uses one more parameter than Q-learning. Although the conjecture that SCIQ converges may be accepted when SCIQ tends to Q-learning, a convergence proof has not yet been given. In future work, we will try to prove the convergence of SCIQ and apply it to realistically sized problems, such as games and robot path finding.
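For readers unfamiliar with the method being extended, below is a minimal sketch of the tabular Q-learning update of Watkins (1989), with an annealed exploration rate added purely to illustrate the "conservative early, active later" idea. The environment interface (reset/step/actions) and the confidence schedule are illustrative assumptions; the abstract does not give SCIQ's actual mechanism, so this is not the author's algorithm.

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, eps0=0.5, tau=100.0):
        """Tabular Q-learning (Watkins, 1989) with an annealed exploration rate.

        The env interface (reset/step/actions) and the confidence schedule
        are illustrative assumptions, not the SCIQ mechanism itself.
        """
        Q = defaultdict(float)   # Q[(state, action)], implicitly 0 at the start
        t = 0                    # experience counter
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                confidence = t / (t + tau)            # rises from 0 toward 1 with experience
                epsilon = eps0 * (1.0 - confidence)   # explore less as confidence grows
                if random.random() < epsilon:
                    a = random.choice(env.actions)                   # conservative: explore
                else:
                    a = max(env.actions, key=lambda x: Q[(s, x)])    # active: act greedily
                s2, r, done = env.step(a)
                # Watkins' update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
                best_next = max(Q[(s2, x)] for x in env.actions)
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s, t = s2, t + 1
        return Q

Note that the scheduling above is one common way to move from conservative to active behaviour with a single additional parameter (tau); SCIQ's own use of its extra parameter may differ.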