
Learning Evaluation Functions

Justin A. Boyan

THESIS PROPOSAL

June 17, 1996

Thesis Committee
Andrew W. Moore, co-chair
Scott E. Fahlman, co-chair
Tom Mitchell
Thomas G. Dietterich, Oregon Graduate Institute

Abstract:

Evaluation functions are an essential component of practical search algorithms for optimization, planning and control. Examples of such algorithms include hillclimbing, simulated annealing, best-first search, A*, and alpha-beta. In all of these, the evaluation functions are typically built manually by domain experts, and may require considerable tweaking to work well.
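To make the role of the evaluation function concrete, here is a minimal Python sketch (not from the proposal itself) of greedy hillclimbing driven by one. The neighbors() and evaluate() functions are hypothetical placeholders for a particular domain, with evaluate() standing in for the hand-built (or, below, learned) function under discussion.

    def hillclimb(start, neighbors, evaluate, max_steps=1000):
        """Greedy descent: repeatedly move to the best-scoring neighbor."""
        state = start
        for _ in range(max_steps):
            # Pick the lowest-cost neighbor (lower evaluate() is better).
            best = min(neighbors(state), key=evaluate, default=None)
            if best is None or evaluate(best) >= evaluate(state):
                return state  # local optimum: no neighbor improves
            state = best
        return state

The same evaluate() could equally drive simulated annealing or best-first search; only the rule for accepting moves changes.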

I will investigate the thesis that statistical machine learning can be used to automatically generate high-quality evaluation functions for practical combinatorial problems. The data for such learning is gathered by running trajectories through the search space. The learned evaluation function may be applied either to guide further exploration of the same space, or to improve performance in new problem spaces which share similar features.

Two general families of learning algorithms apply here: reinforcement learning and meta-optimization. The reinforcement learning approach, dating back to Samuel's checkers player [Samuel, 1959] but with more recent successes on backgammon [Tesauro, 1992] and job-shop scheduling [Zhang and Dietterich, 1995], is based on asynchronous dynamic programming with value function approximation. The currently popular algorithms, Q-learning and TD(λ) with neural networks, run slowly and may even be unstable. I will evaluate several original value-function-approximation algorithms tailored for combinatorial optimization domains.
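As a rough illustration of the family of algorithms named above, the following sketch runs one TD(λ) pass over a completed trajectory using a linear value-function approximator in place of a neural network. This is a simplified, assumed setup, not one of the proposed algorithms; the cost and feature sequences are hypothetical inputs produced by the domain.

    import numpy as np

    def td_lambda_episode(costs, phis, w, alpha=0.01, lam=0.7, gamma=1.0):
        """One TD(lambda) pass over a finished trajectory.

        costs[t] -- immediate cost incurred at step t
        phis[t]  -- feature vector of the t-th state visited
                    (len(phis) == len(costs) + 1; the last entry is the
                    terminal state, conventionally all zeros)
        w        -- weights of the linear value estimate V(s) = w . phi(s)
        """
        e = np.zeros_like(w)                          # eligibility trace
        for t in range(len(costs)):
            # TD error: observed cost plus discounted next estimate,
            # minus the current estimate.
            delta = costs[t] + gamma * (w @ phis[t + 1]) - w @ phis[t]
            e = gamma * lam * e + phis[t]             # accumulate trace
            w = w + alpha * delta * e                 # TD(lambda) update
        return w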

The meta-optimization approach, which has also been applied to game-playing [Pollack et al., 1996] and combinatorial optimization [Ochotta, 1994], is conceptually simpler: we assume a fixed parametric form for the evaluation function and optimize it directly. These methods, lacking the theoretical advantages of dynamic programming, have been ignored by the reinforcement-learning community; however, recent advances in local optimization [Moore and Schneider, 1996] may help make them superior to reinforcement learning in practice. My research will develop both methods and evaluate them empirically.
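A minimal sketch of the direct approach, assuming the evaluation function is a fixed weighted combination of features: stochastic hillclimbing in weight space, where each candidate weight vector is scored by the quality of the search it induces. Here run_search() is a hypothetical stand-in for running the underlying search with the given weights and returning, say, the average cost of the solutions found.

    import random

    def meta_optimize(run_search, dim, trials=200, sigma=0.1):
        """Stochastic hillclimbing over evaluation-function weights."""
        w = [0.0] * dim
        best = run_search(w)                   # lower score is better
        for _ in range(trials):
            # Perturb the current weights and keep the change only if
            # the induced search performs better.
            candidate = [wi + random.gauss(0, sigma) for wi in w]
            score = run_search(candidate)
            if score < best:
                w, best = candidate, score
        return w

A more sophisticated local optimizer, such as the methods cited above, would replace the simple Gaussian perturbation step.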

I will apply these techniques to all or most of the following large-scale combinatorial domains: VLSI channel routing; treatment planning for radiation therapy; scheduling a factory production line; configuring the subsystems of the NASA Outer Planet Orbiter; and move selection in the game of backgammon. I will also validate my new algorithms on standard problems from the optimization and control literatures. Finally, a byproduct of this research will be a software package for public use.



