I'm going to try to give two 30 minute chats at this week's RL talk.
Here are the abstracts, but absolutely no guarantee that I'll manage
to get to the second talk before the hour is up.
Andrew
Preventing "Overfitting" of Cross-Validation Data
Suppose that, for a learning task, we have to select one hypothesis
out of a set of hypotheses. A common approach is to evaluate each
hypothesis in the set on some previously unseen cross-validation data,
and then to select the hypothesis that had the lowest cross-validation
error. But when the cross-validation data is partially corrupted, such as
by noise, and the set of hypotheses we are selecting from is large,
"folklore" warns about "overfitting" the cross-validation
data. In this talk, I will explain how this "overfitting" really occurs,
and show the surprising result that it can be overcome by selecting a
hypothesis with a {\em higher} cross-validation error, over others with
lower cross-validation errors. I will give reasons for not selecting the
hypothesis with the lowest cross-validation error, and propose a new
algorithm, LOOCVCV, that uses a computationally efficient form of
leave-one-out cross-validation to select such a hypothesis. Finally,
I will show experimental results from one domain in which LOOCVCV
consistently beats picking the hypothesis with the lowest
cross-validation error, even with reasonably large cross-validation
sets.
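To make the selection rule concrete, here is a minimal sketch (in Python, with hypothetical names; the abstract does not specify any implementation) of the standard "pick the lowest cross-validation error" rule that the talk argues can overfit:

```python
def cv_error(hypothesis, cv_data):
    """Fraction of cross-validation examples the hypothesis gets wrong."""
    return sum(hypothesis(x) != y for x, y in cv_data) / len(cv_data)

def select_lowest(hypotheses, cv_data):
    """The standard rule the talk argues against: return the hypothesis
    with the lowest cross-validation error."""
    return min(hypotheses, key=lambda h: cv_error(h, cv_data))
```

When the hypothesis set is large and cv_data is noisy, the minimiser of this score tends to fit the noise in the cross-validation set itself; the talk's point is that deliberately choosing a hypothesis with a somewhat higher score can do better.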
Policy Iteration for Reinforcement Learning with Function Approximation
In reinforcement learning, value iteration-based algorithms for
learning a value function have been shown to converge when the
value function is represented by a lookup table, but not with
general function approximators such as multilayer perceptrons.
This was not encouraging because grid-based discretization methods
suffer badly from the curse of dimensionality. But more recently,
algorithms based on contraction mappings, and the gradient-descent
based Residual Algorithm have also been shown to converge using
function approximators. In this talk, I will derive a new algorithm
that, similar to Residual Algorithms, minimizes the Bellman Residual
at each step (in a sense to be explained), and without having to
resort to a gradient-descent process. This leads very naturally to
a new approach to reinforcement learning with function
approximators, one that falls into the framework of policy
iteration and does not suffer from the divergence
problems of many of the earlier algorithms.
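The claim that the Bellman residual can be minimized without gradient descent can be illustrated, for a fixed policy and a linear function approximator, by a closed-form least-squares solve. This is only an illustrative sketch under those assumptions, not the talk's own derivation:

```python
import numpy as np

def bellman_residual_weights(Phi, P, r, gamma):
    """For a fixed policy with transition matrix P, rewards r, and a
    linear value function V = Phi @ w, minimize the squared Bellman
    residual ||Phi @ w - (r + gamma * P @ Phi @ w)||^2 in closed form
    via least squares, with no gradient-descent iterations."""
    A = Phi - gamma * P @ Phi      # residual is A @ w - r
    w, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w
```

With tabular features (Phi the identity) this recovers the exact value function of the policy; with fewer features than states it returns the weights whose Bellman residual is smallest in the least-squares sense.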