ROUT

I present here results with ROUT on three domains: a prediction task, a two-player dice game, and a k-armed bandit problem. For all problems, I compare ROUT's performance with that of TD(λ) given an equivalent function approximator. I measure the time to reach best performance (in terms of the total number of state evaluations performed) and the quality of the learned value function (in terms of Bellman residual, closeness to the true value function, and performance of the greedy control policy).
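As an illustration of two of the quality measures named above, the following minimal Python sketch computes a max-norm Bellman residual and a max-norm error against known true values. Everything in it is an assumption made for illustration and is not taken from the experiments: the three-state deterministic acyclic task `toy`, the helper names `backup`, `bellman_residual`, and `max_value_error`, and the choice of the max norm. A stochastic domain such as the dice game would need an expectation over successor states inside the backup.

```python
from typing import Dict, List, Tuple

# Hypothetical toy task: each state maps to a list of (reward, next_state)
# pairs, one per action. A state with no actions is terminal.
MDP = Dict[str, List[Tuple[float, str]]]


def backup(mdp: MDP, values: Dict[str, float], state: str,
           gamma: float = 1.0) -> float:
    """One-step lookahead value of `state` under the current estimates."""
    actions = mdp[state]
    if not actions:
        return 0.0  # assumption: terminal states have value zero
    return max(r + gamma * values[s2] for r, s2 in actions)


def bellman_residual(mdp: MDP, values: Dict[str, float],
                     gamma: float = 1.0) -> float:
    """Largest gap between a state's value and its one-step backup."""
    return max(abs(values[s] - backup(mdp, values, s, gamma)) for s in mdp)


def max_value_error(values: Dict[str, float],
                    true_values: Dict[str, float]) -> float:
    """Largest absolute difference from the known true values."""
    return max(abs(values[s] - true_values[s]) for s in true_values)


# Illustrative data only; not results from any experiment.
toy: MDP = {"a": [(1.0, "b"), (0.0, "end")], "b": [(2.0, "end")], "end": []}
true_v = {"a": 3.0, "b": 2.0, "end": 0.0}
approx_v = {"a": 2.5, "b": 2.0, "end": 0.0}

print(bellman_residual(toy, true_v))      # 0.0
print(bellman_residual(toy, approx_v))    # 0.5
print(max_value_error(approx_v, true_v))  # 0.5
```

The true values give a residual of zero because they satisfy the one-step backup at every state, while the perturbed estimate at state `a` shows up in both measures.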

Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996