12:00, Wed 10 December 1997, WeH 7220

Title: KnightCap: a chess program that learns by combining TD($\lambda$) with minimax search

Jonathan Baxter
Department of Systems Engineering, Research School of Information Science and Engineering, Australian National University

Abstract: One of the most famous successes of Samuel and Sutton's temporal-difference approach to reinforcement learning is Tesauro's TDGammon, a neural network trained to near-world-champion standard using only TD($\lambda$) and self-play. In this talk I will present TDLeaf($\lambda$), a variation on the TD($\lambda$) algorithm that enables it to be used in conjunction with minimax search. I will present experiments in which ``KnightCap,'' a chess program of our own devising, used TDLeaf($\lambda$) to learn its evaluation function while playing on the Free Internet Chess Server (FICS, {\tt fics.onenet.net}). The main result we report is that KnightCap improved from a 1650 rating to a 2100 rating in just 308 games and 3 days of play. I will discuss some of the reasons for this success, as well as the relationship between our results in chess and Tesauro's results in backgammon with TDGammon.
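
As a rough sketch of the idea behind TDLeaf($\lambda$) (the notation below is my own gloss and is not taken from the abstract): where ordinary TD($\lambda$) forms its temporal differences from the evaluations of the positions actually encountered in a game, TDLeaf($\lambda$) forms them from the evaluations of the leaf nodes of the principal variations returned by minimax search from those positions. Writing $x_1,\dots,x_N$ for the positions of a game, $x_t^{l}$ for the principal-variation leaf reached by search from $x_t$, and $\tilde{J}(\cdot,w)$ for the evaluation function with parameters $w$, the update is roughly

\[
  d_t := \tilde{J}(x_{t+1}^{l}, w) - \tilde{J}(x_t^{l}, w), \qquad
  w \;\leftarrow\; w + \alpha \sum_{t=1}^{N-1} \nabla_w \tilde{J}(x_t^{l}, w)
  \sum_{j=t}^{N-1} \lambda^{\,j-t}\, d_j ,
\]

where $\alpha$ is a learning rate and, at the end of the game, the evaluation of the terminal position is typically replaced by the actual game outcome.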