Exploiting Tree Search in Model Learning Reinforcement Learning Algorithms
Scott Davies, Andrew Ng, Andrew Moore

We do model-based RL. This can be useful when data is expensive, but the computational cost of re-solving as the model improves can be disastrous. What can we do about that? Tricks like prioritized sweeping may help, but they aren't the full answer.

Here, we investigate the use of tree search in continuous spaces. We look at:
* Microsearch
* Uninformed Macrosearch (= BFS)
* Informed Macrosearch (= A* + approximate value function)

The possible benefits are:
* Microsearch is much more robust in the face of an approximate value function.
* Macrosearch can be better than DP-on-a-grid for finding continuous trajectories we can trust, and for not hallucinating solutions where none exist.
* Macrosearch with A* may allow us to exploit a very approximate value function to get a good strategy (see the sketch at the end of this abstract).

An important issue when learning a model is exploration. We combine Macrosearch with "optimism-in-the-face-of-uncertainty" style exploration. We also discuss how to get generalization and extremely fast function-approximator lookup at the same time. Our experimental results will attempt to find answers to the following questions...
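To make the informed-macrosearch idea concrete, here is a minimal, hypothetical sketch: A* searching forward through states predicted by a (learned) forward model, with a rough approximate value function supplying the heuristic. Everything specific here is an illustrative assumption rather than the authors' implementation: the toy 2-D point dynamics in step_model, the distance-based approx_value heuristic, the goal test, and the expansion budget are all stand-ins for whatever model and value function the learner actually has.

    import heapq, itertools, math

    GOAL = (1.0, 1.0)
    ACTIONS = [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)]  # small moves in x/y

    def step_model(state, action):
        """Stand-in for the learned forward model: deterministic step, rounded
        so continuous states can serve as dictionary keys."""
        return (round(state[0] + action[0], 2), round(state[1] + action[1], 2))

    def approx_value(state):
        """Very approximate cost-to-go used as the A* heuristic: straight-line
        distance to the goal divided by the per-step move length."""
        return math.hypot(GOAL[0] - state[0], GOAL[1] - state[1]) / 0.1

    def is_goal(state):
        return math.hypot(GOAL[0] - state[0], GOAL[1] - state[1]) < 0.05

    def a_star_macrosearch(start, max_expansions=20000):
        """Search forward through model-predicted states; return an action
        sequence, or None, so we never hallucinate a plan the model cannot
        actually execute."""
        counter = itertools.count()                      # heap tie-breaker
        frontier = [(approx_value(start), next(counter), start, [])]
        best_cost = {start: 0}
        expansions = 0
        while frontier and expansions < max_expansions:
            _, _, state, plan = heapq.heappop(frontier)
            expansions += 1
            if is_goal(state):
                return plan
            g = best_cost[state]
            for a in ACTIONS:
                nxt = step_model(state, a)
                if best_cost.get(nxt, float("inf")) <= g + 1:
                    continue
                best_cost[nxt] = g + 1
                f = g + 1 + approx_value(nxt)            # path cost + heuristic
                heapq.heappush(frontier, (f, next(counter), nxt, plan + [a]))
        return None                                      # no trusted trajectory found

    if __name__ == "__main__":
        plan = a_star_macrosearch((0.0, 0.0))
        print("plan length:", None if plan is None else len(plan))

Even with a crude heuristic like this one, A* only returns trajectories that the model itself can follow step by step, which is the property the bullet list above contrasts with DP-on-a-grid; sharpening approx_value simply reduces how much of the tree gets expanded.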