Explore or Exploit? Build a Model or Not?
Results from a Chernoff-bound-based analysis
of Reinforcement Learning Algorithms
Satinder Singh (*)
AT&T Labs-Research
Much of the analysis of reinforcement learning (RL) algorithms
has relied on tools from stochastic approximation theory to prove
asymptotic convergence results for a number of popular RL
algorithms. These results were crucial in gaining credibility for a
young field of research and has, at least in part, led to the
increasing use of RL algorithms by those interested in solving real RL
problems. Nevertheless, many basic questions about the computational
aspects of popular RL algorithms remained open.
In the recent past, methods familiar to computational learning
theorists, such as the use of sampling and Chernoff bounds, have begun
to be applied to the RL problem. In this talk I will present some
results obtained from this new kind of analysis of RL problems: How
difficult is it to solve general RL problems? How can one solve the
exploration-exploitation dilemma? Is it better to learn a model from
the data and use a model-based RL method, or is it better to use a
model-free RL method?
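To give a flavor of the style of analysis mentioned above, the sketch below shows how a Chernoff/Hoeffding bound fixes the number of i.i.d. samples needed to estimate a single unknown transition probability to a given accuracy with high confidence. This is an illustrative assumption-laden example, not the algorithm presented in the talk; the function name and parameters are hypothetical.

```python
import math
import random

def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that a Hoeffding/Chernoff bound guarantees
    P(|p_hat - p| >= epsilon) <= delta for the empirical mean of n
    i.i.d. samples of a [0, 1]-valued variable:
        n >= ln(2 / delta) / (2 * epsilon**2)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Samples needed to estimate an unknown transition probability to
# within 0.05, with failure probability at most 0.01.
n = hoeffding_sample_size(epsilon=0.05, delta=0.01)

# Simulated check: estimate a (hypothetical) true probability p = 0.3
# from n coin flips.
random.seed(0)
p = 0.3
p_hat = sum(random.random() < p for _ in range(n)) / n
print(n, round(p_hat, 3))
```

The key point, and the reason such bounds are useful for RL, is that n depends only on the accuracy parameters epsilon and delta, not on the unknown probability itself, so the same sample budget covers every state-action pair of a model.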
* This is joint work with Michael Kearns.