Friday Feb 3, 12:30, WeH 1327
Learning Approximate But Useful Value Functions
Matthew McDonald, Dept. Computer Science, University of Western Australia
It is widely believed that Reinforcement Learning methods must be
combined with generalising function approximators in order to scale
to the extremely large state spaces common in AI. However, function
approximators can introduce approximation error, and although some
such combinations have been extremely successful, not all results
have been as encouraging. This talk will examine the effect of
approximation errors on the behaviour of RL agents performing
episodic tasks in deterministic environments, and will suggest:
(1) that significant approximation errors are generally unavoidable,
(2) that their effects are potentially severe, and
(3) the situations in which problems are likely to occur in practice.
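The severity in point (2) can be illustrated with a toy sketch (my own
example, not taken from the talk): in a hypothetical 5-state corridor
with a -1 cost per step and a terminal goal state, corrupting the
approximate value of a single state is enough to trap a greedy agent in
an infinite loop.

```python
# Hypothetical 5-state corridor: actions move left (-1) or right (+1),
# each step costs -1, and the episode ends at state 4.
# The true value function is V(s) = -(4 - s).
GOAL = 4

def step(s, a):
    """Deterministic transition, clamped to the corridor."""
    return min(GOAL, max(0, s + a))

true_v = {s: -(GOAL - s) for s in range(GOAL + 1)}

# Copy the true values, then corrupt a single state's value by 2.5.
approx_v = dict(true_v)
approx_v[2] -= 2.5

def greedy(v, s):
    """One-step lookahead: pick the action with the best backed-up value."""
    return max((-1, +1), key=lambda a: -1 + v[step(s, a)])

def reaches_goal(v, s=1, max_steps=20):
    """Follow the greedy policy from state s; report whether the goal is reached."""
    for _ in range(max_steps):
        if s == GOAL:
            return True
        s = step(s, greedy(v, s))
    return False

# With the true values the agent walks straight to the goal; with the one
# corrupted entry it oscillates between states 0 and 1 and never arrives.
print(reaches_goal(true_v))    # True
print(reaches_goal(approx_v))  # False
```

Here the approximation error is localized and modest, yet the greedy
policy's behaviour degrades completely, which is the kind of failure
mode the talk's constraints are meant to rule out.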
Constraints on value functions that guarantee useful, although
possibly sub-optimal, behaviour in these tasks will be discussed.
Results will be presented for a method based on these constraints,
demonstrating that it can exploit their error tolerance to construct
evaluation functions that can be stored in forms that scale well as
problem size increases.
I'll conclude by discussing open problems.