12:00, Wed 28 Feb 1996, WeH 7220
Scaling Issues in Reinforcement Learning
Leemon Baird
This talk will address two problems that arise when scaling up
reinforcement learning to large problems: continuous time, and
function approximators. Q-learning does not work for control problems
with continuous time, small time steps, or a discount factor near 1,
because in those regimes the Q-values of different actions become
nearly identical. The Advantage Learning algorithm works well in these
cases, and can
learn much faster than Q-learning. Traditional reinforcement-learning
algorithms are guaranteed to converge when lookup tables are used, but
not when general function approximators such as standard neural nets
are used. This problem is solved by a class of algorithms known as
"residual algorithms". This talk will cover these problems and
solutions, and briefly describe empirical results using a combination
of these algorithms.
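
To make the first point concrete, here is a rough tabular sketch (not
from the talk; all names are mine) contrasting the standard Q-learning
update with an Advantage Learning-style update, in which the value
difference between actions is rescaled by 1/k (roughly dt times a
constant K), so actions stay distinguishable as the time step shrinks:

```python
def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    """Standard Q-learning on a table of dicts Q[state][action].
    As gamma -> 1 (or the time step -> 0), the targets for all actions
    in a state converge toward the same value."""
    target = r + gamma * max(Q[s2].values())
    Q[s][a] += alpha * (target - Q[s][a])

def advantage_learning_update(A, s, a, r, s2, alpha=0.1, gamma=0.99, k=0.1):
    """Advantage Learning sketch: the temporal-difference term is scaled
    up by 1/k, amplifying the gap between actions.  With k = 1 this
    reduces exactly to the Q-learning update above."""
    v_s = max(A[s].values())     # state value under current advantages
    v_s2 = max(A[s2].values())
    target = v_s + (r + gamma * v_s2 - v_s) / k
    A[s][a] += alpha * (target - A[s][a])
```

With k = 1 the two updates coincide; with k << 1 the advantage update
drives a much larger separation between the chosen action's value and
the state value from the same experience.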
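
For the second point, a minimal sketch of a residual-style weight
update for a linear value function (again my own illustration, with
assumed names): a mixing parameter phi blends the fast "direct" TD
update (phi = 0) with the convergent pure residual-gradient update
(phi = 1), which descends on the mean squared Bellman residual:

```python
import numpy as np

def residual_update(w, feat_s, feat_s2, r, alpha=0.05, gamma=0.9, phi=0.5):
    """One residual-algorithm weight update for V(s) = w . feat_s.
    phi = 0: direct (TD) update, fast but can diverge with function
             approximation.
    phi = 1: pure residual gradient, guaranteed to descend on the
             squared Bellman residual.
    Intermediate phi trades learning speed against that guarantee."""
    delta = r + gamma * w @ feat_s2 - w @ feat_s   # Bellman residual
    direction = phi * gamma * feat_s2 - feat_s     # blended gradient direction
    return w - alpha * delta * direction
```

Repeatedly applying the phi = 1 update drives the Bellman residual
toward zero even with shared features between successive states, which
is exactly the situation where the direct method can diverge.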