12:00, Wed 17 Jan 1996, WeH 7220
----------------------------------------------------------------
Playing the Matching-Shoulders Lob-Pass Game
with Logarithmic Regret
Barak A. Pearlmutter
Siemens Corporate Research
(joint work with Kevin J. Lang and Joe Kilian at NECI)
Abstract
In Herrnstein's lob-pass game a player is confronted by an adaptive
two-armed bandit and must converge to the optimal mixed strategy as
rapidly as possible. The best previous algorithm for
matching-shoulders lob-pass, Abe and Takeuchi's ARTHUR, suffers a
regret of O(t^1/2) with probability one, where t is the number of
trials and regret is the number of points lost due to lack of
information.
We present a greedy algorithm that incurs O(log t) regret with
probability one. This matches the information theoretic lower bound
on asymptotic regret, and does so with good constant factors. Against
one sample opponent our algorithm suffers an average regret of 3.2
during 36,000,000 plays, as compared to the 1,440 points unnecessarily
lost by ARTHUR. The key new idea is that convergence to an optimal
strategy is possible in the matching shoulders case without accurately
estimating the opponent's internal parameters. We also derive and
meet an O(t^1/2) lower bound on the expected regret in the general
linear lob-pass game.
These results are in part counterintuitive. The noise-tolerant binary
search procedure that we develop for use in the proof of logarithmic
regret is of independent interest. The proof techniques used to
establish the lower bound in the non-matching shoulder case formalizes
and lends insight into the exploration vs exploitation tradeoffs
involved in problems of this nature, allowing us to make some
speculations concerning animal learning and behavior.
References
N. Abe and J.-I. Takeuchi (1993) The `Lob-Pass' Problem and an On-Line
Learning Model of Rational Choice, In Sixth Annual ACM Workshop on
Computational Learning Theory, 422-428.
R. Herrnstein (1990) Rational Choice Theory, American Psychologist,
45(3), 356--367.
J. Kilian, K. J. Lang and B. A. Pearlmutter (1994) Playing the
Matching-Shoulders Lob-Pass Game with Logarithmic Regret, In Seventh
Annual ACM Workshop on Computational Learning Theory, 159-164