------- Included File -------

I'm going to try to give two 30-minute chats at this week's RL talk. Here are the abstracts, but absolutely no guarantee that I'll manage to get to the second talk before the hour is up.

Andrew

Preventing "Overfitting" of Cross-Validation Data

Suppose that, for a learning task, we have to select one hypothesis out of a set of hypotheses. A common approach is to evaluate each hypothesis in the set on some previously unseen cross-validation data, and then to select the hypothesis with the lowest cross-validation error. But when the cross-validation data is partially corrupted, for instance by noise, and the set of hypotheses we are selecting from is large, folklore warns about "overfitting" the cross-validation data. In this talk, I will explain how this "overfitting" really occurs, and show the surprising result that it can be overcome by selecting a hypothesis with a *higher* cross-validation error over others with lower cross-validation errors. I will give reasons for not selecting the hypothesis with the lowest cross-validation error, and propose a new algorithm, LOOCVCV, that uses a computationally efficient form of leave-one-out cross-validation to select such a hypothesis. Finally, I will show experimental results for one domain in which LOOCVCV consistently beats picking the hypothesis with the lowest cross-validation error, even when reasonably large cross-validation sets are used.

Policy Iteration for Reinforcement Learning with Function Approximation

In reinforcement learning, value-iteration-based algorithms for learning a value function have been shown to converge when the value function is represented by a lookup table, but not with general function approximators such as multilayer perceptrons. This was not encouraging, because grid-based discretization methods suffer badly from the curse of dimensionality. More recently, algorithms based on contraction mappings, and the gradient-descent-based Residual Algorithms, have also been shown to converge with function approximators. In this talk, I will derive a new algorithm that, like the Residual Algorithms, minimizes the Bellman residual at each step (in a sense to be explained), but without resorting to a gradient-descent process. This leads naturally to a new approach to reinforcement learning with function approximators that fits nicely into the framework of policy iteration and does not suffer from the divergence problems of many of the earlier algorithms.
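
For concreteness, here is a minimal sketch of the baseline selection rule the first abstract argues against: score every candidate hypothesis on a held-out cross-validation set and keep the one with the lowest error. This is not LOOCVCV itself (those details are for the talk); the function names and the 0-1 error measure are illustrative assumptions.

    def cv_error(hypothesis, cv_inputs, cv_labels):
        """Fraction of cross-validation examples the hypothesis gets wrong (assumed 0-1 error)."""
        mistakes = sum(1 for x, y in zip(cv_inputs, cv_labels) if hypothesis(x) != y)
        return mistakes / len(cv_inputs)

    def select_lowest_cv_error(hypotheses, cv_inputs, cv_labels):
        """Naive selection: return the hypothesis with minimum cross-validation error.

        The abstract's point is that, with a large hypothesis set and noisy
        cross-validation data, this rule can 'overfit' the cross-validation set.
        """
        return min(hypotheses, key=lambda h: cv_error(h, cv_inputs, cv_labels))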
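For reference on the second abstract, the Bellman residual of an approximate value function V under a fixed policy pi can be written as below; the notation (discount factor gamma, transition probabilities P, reward r) is an assumption here rather than something taken from the abstract.

    \[
    E(V) \;=\; \sum_{s} \Bigl( r(s,\pi(s)) + \gamma \sum_{s'} P(s' \mid s,\pi(s))\, V(s') - V(s) \Bigr)^{2}
    \]

Driving E(V) to zero makes V satisfy the Bellman equation for pi; the Residual Algorithms do this by gradient descent, whereas the talk will describe minimizing the residual at each step without a gradient-descent process.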