Monday May 17, 1993 - WeH 7220 - Geoff Gordon

Continuous Q Functions Are (Sort Of) PAC Learnable

ABSTRACT

Q-learning was originally designed for finite models, and there are significant problems with extending it to continuously-parameterized state spaces. (For instance, Sebastian's recent talk pointed out that neural nets are prone to overestimating Q values.) I will present some results (due to Haussler) on the sample complexity of approximating real-valued functions. These results assume the ability to determine the error of a hypothesis directly, which is unfortunately not available to Q-learners. I will then define an error function and motivate its use as a substitute for direct error measurement; finally, I will show that we can find a probably-approximately-correct Q function (with respect to this error metric) using polynomially many samples. For certain hypothesis spaces, we will in addition need only polynomial-time computation.

This is definitely work in progress, and I welcome all comments, suggestions, and criticism.
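
As background only (this is not the error metric or construction from the talk), here is a minimal sketch contrasting the standard tabular Q-learning update for finite models with the usual gradient-style update for a parameterized Q function. The feature map phi, the finite action list actions, and the step sizes are assumptions introduced purely for illustration; the max over next-state actions inside the target is the term connected to the overestimation issue mentioned above.

  import numpy as np

  def tabular_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
      """Standard tabular Q-learning update (finite states and actions).
      Q is a 2-D array indexed by discrete state and action."""
      target = r + gamma * np.max(Q[s_next])
      Q[s, a] += alpha * (target - Q[s, a])
      return Q

  def linear_q_update(w, phi, s, a, r, s_next, actions, alpha=0.01, gamma=0.9):
      """Gradient-style update for a linearly parameterized Q function,
      Q_w(s, a) = w . phi(s, a), with phi an assumed feature map over a
      continuously-parameterized state space."""
      target = r + gamma * max(np.dot(w, phi(s_next, b)) for b in actions)
      td_error = target - np.dot(w, phi(s, a))
      w += alpha * td_error * phi(s, a)  # gradient of Q_w w.r.t. w is phi(s, a)
      return w

In the tabular case each (s, a) entry is adjusted independently, while in the parameterized case every update moves the whole function; that generalization across states is both the appeal of continuous parameterizations and the source of the difficulties the abstract refers to.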