


I present here results with ROUT on three domains: a prediction task, a two-player dice game, and a k-armed bandit problem. For all problems, I compare ROUT's performance with that of TD(λ) using the same function approximator. I measure two things: the time to reach best performance, counted as the total number of state evaluations performed, and the quality of the learned value function, assessed by its Bellman residual, its closeness to the true value function, and the performance of the greedy control policy it induces.
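As an illustration of the Bellman residual measure mentioned above, here is a minimal sketch for a small deterministic, undiscounted, acyclic problem. The toy model, the function names, and the choice of the max norm are assumptions made for this example; they are not taken from the experiments reported here.

```python
# Hypothetical toy model: state -> list of (reward, next_state) pairs.
# State 3 is terminal (no actions). Deterministic, undiscounted, acyclic.
MODEL = {
    0: [(-1.0, 1), (-1.0, 2)],
    1: [(-1.0, 3)],
    2: [(-1.0, 1)],
    3: [],
}


def bellman_backup(state, values, model):
    """One-step lookahead value of `state` under the estimate `values`."""
    actions = model[state]
    if not actions:  # terminal state: value is defined as zero
        return 0.0
    return max(reward + values[nxt] for reward, nxt in actions)


def bellman_residual(values, model):
    """Largest absolute gap between V(s) and its one-step backup (max norm)."""
    return max(abs(values[s] - bellman_backup(s, values, model)) for s in model)


# The exact values for this toy model have zero residual.
true_values = {0: -2.0, 1: -1.0, 2: -2.0, 3: 0.0}
# An estimate that is wrong at state 2 has a nonzero residual.
approx_values = {0: -2.0, 1: -1.0, 2: -1.0, 3: 0.0}

residual_true = bellman_residual(true_values, MODEL)
residual_approx = bellman_residual(approx_values, MODEL)
```

A residual of zero means the estimate is consistent with its own one-step backups at every state; a mean or squared-error norm could be substituted for the max norm used in this sketch.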

Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996