Here is the main loop of the STAGE optimization algorithm, along with two
alternative subroutines for training the function approximator V, as described in
Section 3.3.
| STAGE(Algorithm A, Objective-function f(x)): |
| /* Assumption: A(v) is a graph-search algorithm which, given any evaluation function v(x), |
| acts as a Markov chain over the graph. */ |
| repeat: |
|     run A(f), producing a trajectory T |
|     Update_Fitter_From_Traj(V, T, z)   /* z: result value of T */ |
|     run the two-stage optimization procedure: |
| until results stop improving. |
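The main loop can be illustrated with a minimal Python sketch. Several pieces are assumptions made only for illustration, not taken from the pseudocode: the objective is minimized, A is a greedy hill climber (`hill_climb`), the result value z is the best objective value on the trajectory, the second stage runs A on the learned V from the end of the trajectory to choose the next start state, and "results stop improving" is implemented as a hypothetical `patience` counter. `TableFitter` and `update_fitter` are likewise hypothetical stand-ins for V and its training subroutine.

```python
class TableFitter:
    """Hypothetical stand-in for V: remembers the targets seen for each state."""

    def __init__(self):
        self.targets = {}

    def predict(self, x):
        seen = self.targets.get(x)
        if seen:
            return sum(seen) / len(seen)
        # Unseen state: fall back to the mean of all stored targets.
        all_z = [z for zs in self.targets.values() for z in zs]
        return sum(all_z) / len(all_z) if all_z else 0.0


def update_fitter(fitter, trajectory, z):
    """Hypothetical training subroutine: pair every state on the trajectory with z."""
    for x in trajectory:
        fitter.targets.setdefault(x, []).append(z)


def hill_climb(evaluate, start, neighbors, max_steps=100):
    """Assumed form of A(v): greedy descent on `evaluate`; returns the trajectory."""
    x = start
    trajectory = [x]
    for _ in range(max_steps):
        best = min(neighbors(x), key=evaluate)
        if evaluate(best) >= evaluate(x):
            break  # no strictly better neighbor: local optimum of `evaluate`
        x = best
        trajectory.append(x)
    return trajectory


def stage(f, start, neighbors, fitter, update_fitter, patience=3, max_iters=50):
    """Sketch of the STAGE loop under the assumptions stated above."""
    best_x, best_f = start, f(start)
    x0, stale = start, 0
    for _ in range(max_iters):
        traj = hill_climb(f, x0, neighbors)      # run A(f)
        z = min(f(x) for x in traj)              # assumed result value
        update_fitter(fitter, traj, z)           # train V on the trajectory
        candidate = min(traj, key=f)
        if f(candidate) < best_f:
            best_x, best_f, stale = candidate, f(candidate), 0
        else:
            stale += 1
        if stale >= patience:                    # "until results stop improving"
            break
        # Assumed second stage: run A on V to pick the next start state.
        x0 = hill_climb(fitter.predict, traj[-1], neighbors)[-1]
    return best_x, best_f
```

A call such as `stage(f, x0, neighbors, TableFitter(), update_fitter)` returns the best state found and its objective value; any fitter exposing `predict` and any matching update routine could be substituted.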
| Update_Fitter_From_Traj_by_TD(Fitter V, Trajectory T, result-value z): |
| /* Assumes that function approximator V is parametrized by weight vector w. */ |
|     for i := |
|         update V's weights by delta rule: |
| end. |
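As an illustration of a TD-style update by the delta rule, the following Python sketch may help. It is not a reconstruction of the subroutine above: it assumes a linear approximator V(x) = w · φ(x), a step size `alpha`, a parameter `lam` in [0, 1], and λ-return targets computed backward along the trajectory (the last state is trained toward z, and each earlier state toward a blend of the next state's target and V's current prediction for the next state). The names `LinearFitter`, `features`, `alpha`, and `lam` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


class LinearFitter:
    """Hypothetical linear approximator V(x) = w . features(x)."""

    def __init__(self, features, n_weights, alpha=0.1):
        self.features = features        # maps a state to a list of numbers
        self.w = [0.0] * n_weights      # weight vector w
        self.alpha = alpha              # step size (assumed)

    def predict(self, x):
        return dot(self.w, self.features(x))

    def delta_rule_update(self, x, target):
        # Delta rule for a linear model: w := w + alpha * (target - V(x)) * phi(x)
        phi = self.features(x)
        error = target - dot(self.w, phi)
        self.w = [w_j + self.alpha * error * p for w_j, p in zip(self.w, phi)]


def update_fitter_from_traj_by_td(fitter, trajectory, z, lam=1.0):
    """Train on lambda-return targets, sweeping the trajectory from last state to first."""
    target = z
    last = len(trajectory) - 1
    for i in range(last, -1, -1):
        if i < last:
            # Blend the later target with V's current prediction for the next state.
            target = lam * target + (1.0 - lam) * fitter.predict(trajectory[i + 1])
        fitter.delta_rule_update(trajectory[i], target)
```

With `lam=1.0` every state is trained toward z itself, which matches the supervised targets of the batch variant; smaller values rely more heavily on V's own predictions.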
| Update_Fitter_From_Traj_by_Batch_Fit(Fitter V, Trajectory T, result-value z): |
| /* Assumes that V stores all data it has ever been trained on. */ |
|     for i := 0 to the last index of T: |
|         add training pair (x_i, z) to V's memory, where x_i is the i-th state of T |
|     re-train V from the updated training set. |
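The batch variant can be sketched in Python as follows. The sketch assumes that each state on the trajectory is paired with the same result value z, and it uses a deliberately simple, hypothetical fitter (`BatchFitter`) that models V(x) = a + b · feature(x) and is refit by ordinary least squares over everything in its memory. Any regression model that can be retrained from stored data could take its place.

```python
class BatchFitter:
    """Hypothetical fitter: V(x) = a + b * feature(x), refit from all stored data."""

    def __init__(self, feature):
        self.feature = feature   # maps a state to a single number (assumed)
        self.memory = []         # every (feature value, target) pair ever added
        self.a = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.a + self.b * self.feature(x)

    def retrain(self):
        """Closed-form simple linear regression over the whole memory."""
        n = len(self.memory)
        if n == 0:
            return
        mean_p = sum(p for p, _ in self.memory) / n
        mean_z = sum(z for _, z in self.memory) / n
        var = sum((p - mean_p) ** 2 for p, _ in self.memory)
        cov = sum((p - mean_p) * (z - mean_z) for p, z in self.memory)
        self.b = cov / var if var > 0 else 0.0
        self.a = mean_z - self.b * mean_p


def update_fitter_from_traj_by_batch_fit(fitter, trajectory, z):
    """Add one training pair per trajectory state, then refit on the full set."""
    for x in trajectory:
        fitter.memory.append((fitter.feature(x), z))
    fitter.retrain()
```

Because the memory is never cleared, each call refits V on the data from all trajectories seen so far, which is the property the assumption in the pseudocode comment relies on.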