next up previous
Next: More General Problems Up: Ad-Hoc Techniques Previous: Randomized Strategies

Interval-based Techniques

Exploration is often more efficient when it is based on second-order information about the certainty or variance of the estimated values of actions. Kaelbling's interval estimation algorithm [52] stores statistics for each action tex2html_wrap_inline1892 : tex2html_wrap_inline1834 is the number of successes and tex2html_wrap_inline1832 the number of trials. An action is chosen by computing the upper bound of a tex2html_wrap_inline1924 confidence interval on the success probability of each action and choosing the action with the highest upper bound. Smaller values of the tex2html_wrap_inline1902 parameter encourage greater exploration. When payoffs are boolean, the normal approximation to the binomial distribution can be used to construct the confidence interval (though the binomial should be used for small n). Other payoff distributions can be handled using their associated statistics or with nonparametric methods. The method works very well in empirical trials. It is also related to a certain class of statistical techniques known as experiment design methods [17], which are used for comparing multiple treatments (for example, fertilizers or drugs) to determine which treatment (if any) is best in as small a set of experiments as possible.

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996