
TD($\lambda$) for Control

 

This implementation of TD($\lambda$) is trajectory-based: it updates the fitter only after a complete trajectory has been generated. For a version of TD($\lambda$) that performs updates after each move, refer to [Sutton1987].

TD($\lambda$, start states $\hat{X}$, fitter $F$):
/* Assumes known world model MDP; F is parametrized by weight vector w. */
repeat steps 1 and 2 forever:
1. Using the model and the current evaluation function F, generate a mostly-greedy
   trajectory from a start state to a terminal state: $x_0 \to x_1 \to \cdots \to x_T$.
   Also record the rewards $r_0, r_1, \ldots, r_T$ received at each step.
2. Update the fitter from the trajectory as follows:
   for i := T downto 0, do:
     $\mathrm{targ}_i := \begin{cases} r_T & \text{if } i = T \\ r_i + \lambda\,\mathrm{targ}_{i+1} + (1-\lambda)\,F(x_{i+1}) & \text{otherwise} \end{cases}$
     update F's weights by delta rule: $w := w + \alpha\,(\mathrm{targ}_i - F(x_i))\,\nabla_w F(x_i)$;
   end
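
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes a linear fitter (so the gradient of F with respect to w is the feature vector), a deterministic model exposed through a hypothetical `successors(x)` function returning `(reward, next_state)` pairs, epsilon-greedy exploration as the meaning of "mostly-greedy", no discounting, and the standard backward recursion for the $\lambda$-return as the per-step target. The names `generate_trajectory`, `td_lambda_update`, `features`, `epsilon`, and `alpha` are introduced here for illustration only.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def generate_trajectory(start, successors, is_terminal, terminal_reward, F,
                        epsilon=0.1, rng=random):
    """Step 1: follow F mostly greedily from `start` to a terminal state.

    `successors(x)` is an assumed model interface returning a list of
    (reward, next_state) pairs. Returns states x_0..x_T and rewards r_0..r_T.
    """
    states, rewards = [start], []
    x = start
    while not is_terminal(x):
        options = successors(x)
        if rng.random() < epsilon:
            r, nxt = rng.choice(options)          # occasional exploratory move
        else:
            r, nxt = max(options, key=lambda o: o[0] + F(o[1]))  # greedy move
        rewards.append(r)
        states.append(nxt)
        x = nxt
    rewards.append(terminal_reward(x))            # r_T, received at x_T
    return states, rewards


def td_lambda_update(states, rewards, w, features, lam=0.7, alpha=0.1):
    """Step 2: one backward pass (i = T downto 0) over a finished trajectory.

    Assumes a linear fitter F(x) = w . features(x), so grad_w F(x) = features(x).
    """
    T = len(states) - 1
    targ = 0.0
    for i in range(T, -1, -1):
        if i == T:
            targ = rewards[T]
        else:
            f_next = dot(w, features(states[i + 1]))
            targ = rewards[i] + lam * targ + (1.0 - lam) * f_next
        phi = features(states[i])
        err = targ - dot(w, phi)
        # Delta rule: w := w + alpha * (targ_i - F(x_i)) * grad_w F(x_i)
        w = [wj + alpha * err * pj for wj, pj in zip(w, phi)]
    return w


if __name__ == "__main__":
    # Toy chain: states 0..5, moves of +1 or +2, cost of 1 per move.
    GOAL = 5
    one_hot = lambda s: [1.0 if j == s else 0.0 for j in range(GOAL + 1)]
    successors = lambda s: [(-1.0, min(s + 1, GOAL)), (-1.0, min(s + 2, GOAL))]
    w = [0.0] * (GOAL + 1)
    rng = random.Random(0)
    for _ in range(200):
        F = lambda s: dot(w, one_hot(s))
        xs, rs = generate_trajectory(0, successors, lambda s: s == GOAL,
                                     lambda s: 0.0, F, rng=rng)
        w = td_lambda_update(xs, rs, w, one_hot)
    print([round(v, 2) for v in w])
```

Because the weights change inside the backward loop, F(x_{i+1}) is evaluated with weights that already reflect the update at step i+1, matching the order of operations in the pseudocode. Setting `lam=1` makes each target the observed return from x_i onward, while `lam=0` makes it the one-step backup r_i + F(x_{i+1}).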



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996