next up previous contents
Next: Meta-Optimization Algorithms Up: Direct Meta-Optimization of Evaluation Previous: Direct Meta-Optimization of Evaluation

Literature Review

 

As in the VFA literature, game-playing programs have provided the most common testbed for this approach. Indeed, Samuel's checkers program should be seen as a hybrid VFA/meta-optimization method. In Samuel's algorithm, after the scoring function was tweaked to more closely predict the lookahead values, the new function (``Alpha'') was tested in a head-to-head match against the old one (``Beta''). Only if Alpha won were the TD-based tweaks allowed to be kept, replacing Beta. Furthermore, if several consecutive Alpha functions lost, then the biggest coefficient was simply reset to zero--in effect, diversifying the meta-optimization search at the expense of accurate value function approximation.

Genetic-algorithm approaches to game learning generally fall into this category (e.g. [Tunstall-Pedoe1991]). Most recently, Pollack et. al. attacked backgammon with a simple hillclimbing approach [Pollack et al. 1996]. Like Samuels, they used Alpha (challenger) and Beta (champion) evaluation functions. Their optimization scheme was purposely simplistic: generate a new Alpha by randomly tweaking Beta's coefficients (the 3980 weights of a neural network), and if Alpha beats Beta in a match, move Beta's coefficients uniformly 5% toward Alpha's. Surprisingly, this procedure developed an excellent backgammon player!

Meta-optimization has also been applied successfully to aid combinatorial optimization. Ochotta's simulated annealing system for synthesizing analog circuit cells made use of a sophisticated cost function parametrized by 46 real numbers [Ochotta1994]. These and 10 other parameters of the annealer were optimized using Powell's method as described in [Press et al. 1992]. Each parameter setting was evaluated by summing the mean, median and minimum final results of 200 annealing runs on a small representative problem instance. After several months of real time and four years of CPU time (!), Powell's method produced an evaluation function which performed well and generalized robustly to larger instances.



Justin A. Boyan
Sat Jun 22 20:49:48 EDT 1996