Monday May 17, 1993 - WeH 7220 - Geoff Gordon

Continuous Q Functions Are (Sort Of) PAC Learnable

ABSTRACT

Q-learning was originally designed for finite models, and there are significant problems with extending it to continuously-parameterized state spaces. (For instance, Sebastian's recent talk pointed out that neural nets are prone to overestimating Q values.) I will present some results (due to Haussler) on the sample complexity of approximating real-valued functions. These results assume the ability to determine the error of a hypothesis directly, which is unfortunately not available to Q-learners. I will then define an error function and motivate its use as a substitute for direct error measurement; finally, I will show that we can find a probably-approximately-correct Q function (with respect to this error metric) using polynomially many samples. For certain hypothesis spaces, we will in addition need only polynomial-time computation.

This is definitely work in progress, and I welcome all comments, suggestions, and criticism.
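
As background only (this is not the error metric or construction from the talk), here is a minimal sketch contrasting the standard tabular Q-learning update for finite models with the usual gradient-style update for a parameterized Q function. The feature map phi, the finite action list actions, and the step sizes are assumptions introduced purely for illustration; the max over next-state actions inside the target is the term connected to the overestimation issue mentioned above.

  import numpy as np

  def tabular_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
      """Standard tabular Q-learning update (finite states and actions).
      Q is a 2-D array indexed by discrete state and action."""
      target = r + gamma * np.max(Q[s_next])
      Q[s, a] += alpha * (target - Q[s, a])
      return Q

  def linear_q_update(w, phi, s, a, r, s_next, actions, alpha=0.01, gamma=0.9):
      """Gradient-style update for a linearly parameterized Q function,
      Q_w(s, a) = w . phi(s, a), with phi an assumed feature map over a
      continuously-parameterized state space."""
      target = r + gamma * max(np.dot(w, phi(s_next, b)) for b in actions)
      td_error = target - np.dot(w, phi(s, a))
      w += alpha * td_error * phi(s, a)  # gradient of Q_w w.r.t. w is phi(s, a)
      return w

In the tabular case each (s, a) entry is adjusted independently, while in the parameterized case every update moves the whole function; that generalization across states is both the appeal of continuous parameterizations and the source of the difficulties the abstract refers to.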