STAGE

 

Here is the main loop of the STAGE optimization algorithm, along with two alternative subroutines for training the fitter V, as described in Section 3.3.

STAGE(Algorithm A, Objective-function f(x)):
    /* Assumption: A(v) is a graph-search algorithm which, given any evaluation
       function v(x), acts as a Markov chain over the graph. */
    repeat:
        run A(f), producing a trajectory tex2html_wrap_inline1966;
        Update_Fitter_From_Traj(tex2html_wrap_inline1968);
        run the two-stage optimization procedure: tex2html_wrap_inline1970, and print result.
    until results stop improving.

Update_Fitter_From_Traj_by_TD(λ) (Fitter V, Trajectory T, result-value z):
    /* Assumes that function approximator V is parametrized by weight vector w. */
    for i := tex2html_wrap_inline1986 downto 0, do:
        tex2html_wrap_inline1988
        update V's weights by delta rule: tex2html_wrap_inline1882 := tex2html_wrap_inline1994;
    end.
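
The backward sweep of the TD subroutine can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a transcription of the routine's formulas: it assumes V is linear in a caller-supplied feature function, that the trajectory has no intermediate rewards so its only outcome is z, and that each state's target follows the standard λ-return recursion. The names `update_fitter_td_lambda`, `phi`, `lam`, and `alpha` are hypothetical.

```python
from typing import Callable, List, Sequence


def update_fitter_td_lambda(
    w: Sequence[float],
    phi: Callable[[object], Sequence[float]],
    traj: Sequence[object],
    z: float,
    lam: float = 0.7,
    alpha: float = 0.1,
) -> List[float]:
    """One backward TD(lambda) sweep over a trajectory; returns new weights.

    Assumptions (not from the source): V(x) = w . phi(x) is linear, there are
    no intermediate rewards, and the final outcome of the trajectory is z.
    """
    w = list(w)  # work on a copy so the caller's weights are untouched

    def value(x: object) -> float:
        return sum(wi * fi for wi, fi in zip(w, phi(x)))

    target = z  # the lambda-return of the last state is the outcome itself
    for i in range(len(traj) - 1, -1, -1):
        if i < len(traj) - 1:
            # Assumed recursion: blend the successor's current prediction
            # with the successor's own target.
            target = (1.0 - lam) * value(traj[i + 1]) + lam * target
        features = phi(traj[i])
        error = target - value(traj[i])
        # Delta rule: for a linear V, the gradient w.r.t. w is phi(x_i).
        for j, fj in enumerate(features):
            w[j] += alpha * error * fj
    return w
```

With `lam=1.0` every state on the trajectory is trained toward z; with `lam=0.0` each state is trained toward its successor's current prediction.
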

Update_Fitter_From_Traj_by_Batch_Fit(Fitter V, Trajectory T, result-value z):
    /* Assumes that V stores all data it has ever been trained on. */
    for i := 0 to tex2html_wrap_inline1986, do:
        Add training pair to V's memory: tex2html_wrap_inline2010;
    Re-train V from the updated training set.
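
To make the control flow concrete, here is a small Python sketch of the main loop combined with the batch-fit subroutine. Everything beyond the loop structure is an assumption made for illustration: A is taken to be greedy descent (minimizing) on an integer line graph, the result-value z is taken to be the objective at the end of the trajectory, the two-stage procedure is read as A(V) started where the trajectory ended and then A(f) started where A(V) stops, start states are supplied by the caller, the loop stops at the first round that fails to improve, and the fitter is a straight line in x. The names `hill_climb`, `BatchLineFitter`, and `stage` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Evaluation = Callable[[int], float]
Neighbors = Callable[[int], Sequence[int]]


def hill_climb(v: Evaluation, start: int, neighbors: Neighbors) -> List[int]:
    """Stand-in for A(v): greedy descent on v, returning the trajectory."""
    traj = [start]
    while True:
        x = traj[-1]
        best = min(neighbors(x), key=v, default=x)
        if v(best) >= v(x):  # no strictly better neighbor: local optimum
            return traj
        traj.append(best)


class BatchLineFitter:
    """Hypothetical fitter V(x) = a + b*x that keeps every training pair."""

    def __init__(self) -> None:
        self.memory: List[Tuple[int, float]] = []
        self.a, self.b = 0.0, 0.0

    def update_from_traj(self, traj: Sequence[int], z: float) -> None:
        for x in traj:  # add one training pair per trajectory state
            self.memory.append((x, z))
        n = len(self.memory)  # re-train from the whole stored set
        mx = sum(x for x, _ in self.memory) / n
        mz = sum(t for _, t in self.memory) / n
        sxx = sum((x - mx) ** 2 for x, _ in self.memory)
        sxz = sum((x - mx) * (t - mz) for x, t in self.memory)
        self.b = sxz / sxx if sxx > 0 else 0.0
        self.a = mz - self.b * mx

    def __call__(self, x: int) -> float:
        return self.a + self.b * x


def stage(f: Evaluation, neighbors: Neighbors,
          starts: Iterable[int]) -> Tuple[Optional[int], float]:
    fitter = BatchLineFitter()
    best_x, best_f = None, float("inf")
    for x0 in starts:
        traj = hill_climb(f, x0, neighbors)               # run A(f)
        fitter.update_from_traj(traj, f(traj[-1]))        # assumed z
        x1 = hill_climb(fitter, traj[-1], neighbors)[-1]  # stage 1: A(V)
        result = hill_climb(f, x1, neighbors)[-1]         # stage 2: A(f)
        print("two-stage result:", result, f(result))
        if f(result) >= best_f:                           # stopped improving
            break
        best_x, best_f = result, f(result)
    return best_x, best_f


# Toy problem (hypothetical): every multiple of 5 is a local minimum,
# and the global minimum is at x = 40.
N = 40

def f(x: int) -> float:
    return (N - x) + 3 * (x % 5 != 0)

def neighbors(x: int) -> List[int]:
    return [y for y in (x - 1, x + 1) if 0 <= y <= N]

best = stage(f, neighbors, starts=[2, 12, 22])
```

On this toy problem, plain greedy descent from x = 2 stops at the local minimum x = 5. After the second trajectory the fitted line slopes downward in x, so descending on V carries the search to x = 40, where the second stage finds the global minimum.
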



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996