Monday May 17, 1993 - WeH 7220 - Geoff Gordon
Continuous Q Functions Are (Sort Of) PAC Learnable
ABSTRACT
Q-learning was originally designed for finite models, and there are
significant problems with extending it to continuously-parameterized
state spaces. (For instance, Sebastian's recent talk pointed out that
neural nets are prone to overestimating Q values.)
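(As an aside, a minimal sketch of why such overestimation happens, not taken from the talk itself: when each action's Q estimate carries zero-mean noise, the max over actions is biased upward. The function name and parameters below are illustrative, not from the abstract.)

```python
import random

random.seed(0)

def avg_max_of_noisy_estimates(true_q=0.0, n_actions=5, noise=1.0, trials=10000):
    """Average of max over unbiased-but-noisy Q estimates of the same state."""
    total = 0.0
    for _ in range(trials):
        # Each action's estimate is the true value plus zero-mean Gaussian noise.
        estimates = [true_q + random.gauss(0.0, noise) for _ in range(n_actions)]
        # Greedy value selection takes the max, which inherits an upward bias.
        total += max(estimates)
    return total / trials

bias = avg_max_of_noisy_estimates()
# bias lands well above the true value of 0, even though each
# individual estimate is unbiased
```

The same effect appears when a neural net's approximation error plays the role of the noise term.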
I will present some results (due to Haussler) on the sample complexity
of approximating real-valued functions. These results assume the
ability to determine the error of a hypothesis directly, which is
unfortunately not available to Q-learners.
I will then define an error function and motivate its use as a
substitute for direct error measurement; finally, I will show that we
can find a probably-approximately-correct Q function (with respect to
this error metric) using polynomially many samples. For certain
hypothesis spaces, polynomial-time computation also suffices.
This is definitely work in progress, and I welcome all comments,
suggestions, and criticism.