Here are the ROUT algorithm and HUNTFRONTIERSTATE subroutine, as described in Section 2.3.
ROUT(start states X):
/* Assumes that the world model MDP is known and acyclic. */
  initialize training set S := {} and fit F := an arbitrary fit;
  repeat:
    for each start state x in X not yet marked ``done'':
      s := HUNTFRONTIERSTATE(x, F);
      add (s, one-step backup of s) to training set S, and re-train F on S;
      if (s = x), then mark start state x as ``done''.
  until all start states in X are marked ``done''.

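HUNTFRONTIERSTATE(state x, fit F):
/* Returns x if F appears self-consistent on all sampled trajectories from x;
   otherwise returns a state further along such a trajectory whose Bellman
   residual is large but whose successors all appear self-consistent. */
  for each legal action a in x:
    repeat up to H times:
      generate a trajectory T from x to termination, starting with action a;
      let y be the last state on T with Bellman residual > ε;
      if (y exists) and (y != x), then break out of the loops and
        restart procedure with HUNTFRONTIERSTATE(y, F).
  /* reaching this point, x's subtree is deemed all self-consistent and correct! */
  return x.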
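
The two routines above can be sketched in Python on a toy problem. This is a minimal illustration under stated assumptions, not the original implementation: the four-state acyclic `MODEL`, the memorizing `TableFit` fitter, the threshold `eps`, and the choice to continue each trajectory greedily with respect to the current fit after the first action are all hypothetical stand-ins. The training target is taken to be the one-step backup of the frontier state.

```python
import random

# Hypothetical acyclic MDP: MODEL[state][action] = [(prob, reward, next_state), ...]
MODEL = {
    "A": {"left": [(1.0, 1.0, "B")],
          "right": [(0.5, 0.0, "B"), (0.5, 4.0, "C")]},
    "B": {"go": [(1.0, 2.0, "END")]},
    "C": {"go": [(1.0, 1.0, "END")]},
    "END": {},  # terminal state: no legal actions, value 0
}

class TableFit:
    """Stand-in fitter (assumption): memorizes targets, predicts 0.0 elsewhere."""
    def __init__(self):
        self.table = {}
    def train(self, training_set):
        self.table = dict(training_set)
    def __call__(self, s):
        return self.table.get(s, 0.0)

def q_value(s, a, fit):
    # Expected reward plus fitted value of the successor.
    return sum(p * (r + fit(s2)) for p, r, s2 in MODEL[s][a])

def one_step_backup(s, fit):
    if not MODEL[s]:
        return 0.0
    return max(q_value(s, a, fit) for a in MODEL[s])

def residual(s, fit):
    # Bellman residual: disagreement between the fit and its own backup.
    return abs(fit(s) - one_step_backup(s, fit))

def sample_next(s, a, rng):
    u, acc = rng.random(), 0.0
    for p, _, s2 in MODEL[s][a]:
        acc += p
        if u < acc:
            return s2
    return MODEL[s][a][-1][2]

def trajectory(x, a, fit, rng):
    # Start with action a, then act greedily w.r.t. the fit (assumption).
    states, s = [x], x
    while MODEL[s]:
        s = sample_next(s, a, rng)
        states.append(s)
        if MODEL[s]:
            a = max(MODEL[s], key=lambda b: q_value(s, b, fit))
    return states

def hunt_frontier_state(x, fit, rng, H=20, eps=1e-6):
    for a in MODEL[x]:
        for _ in range(H):
            bad = [s for s in trajectory(x, a, fit, rng)
                   if residual(s, fit) > eps]
            if bad and bad[-1] != x:
                # Restart the search from the deepest inconsistent state.
                return hunt_frontier_state(bad[-1], fit, rng, H, eps)
    return x  # x's subtree looks self-consistent

def rout(start_states, H=20, eps=1e-6, seed=0):
    rng = random.Random(seed)
    training_set, fit, done = {}, TableFit(), set()
    while len(done) < len(start_states):
        for x in start_states:
            if x in done:
                continue
            s = hunt_frontier_state(x, fit, rng, H, eps)
            training_set[s] = one_step_backup(s, fit)
            fit.train(training_set)
            if s == x:
                done.add(x)
    return fit
```

Calling `rout(["A"])` on this toy model trains on `B`, then `C`, then `A`, working backward from the terminal state. A real function approximator would replace `TableFit`, in which case re-training can disturb earlier predictions and the hunt may revisit states.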