12:00, Wed 17 Jan 1996, WeH 7220 ---------------------------------------------------------------- Playing the Matching-Shoulders Lob-Pass Game with Logarithmic Regret Barak A. Pearlmutter Siemens Corporate Research (joint work with Kevin J. Lang and Joe Kilian at NECI) Abstract In Herrnstein's lob-pass game a player is confronted by an adaptive two-armed bandit and must converge to the optimal mixed strategy as rapidly as possible. The best previous algorithm for matching-shoulders lob-pass, Abe and Takeuchi's ARTHUR, suffers a regret of O(t^1/2) with probability one, where t is the number of trials and regret is the number of points lost due to lack of information. We present a greedy algorithm that incurs O(log t) regret with probability one. This matches the information theoretic lower bound on asymptotic regret, and does so with good constant factors. Against one sample opponent our algorithm suffers an average regret of 3.2 during 36,000,000 plays, as compared to the 1,440 points unnecessarily lost by ARTHUR. The key new idea is that convergence to an optimal strategy is possible in the matching shoulders case without accurately estimating the opponent's internal parameters. We also derive and meet an O(t^1/2) lower bound on the expected regret in the general linear lob-pass game. These results are in part counterintuitive. The noise-tolerant binary search procedure that we develop for use in the proof of logarithmic regret is of independent interest. The proof techniques used to establish the lower bound in the non-matching shoulder case formalizes and lends insight into the exploration vs exploitation tradeoffs involved in problems of this nature, allowing us to make some speculations concerning animal learning and behavior. References N. Abe and J.-I. Takeuchi (1993) The `Lob-Pass' Problem and an On-Line Learning Model of Rational Choice, In Sixth Annual ACM Workshop on Computational Learning Theory, 422-428. R. Herrnstein (1990) Rational Choice Theory, American Psychologist, 45(3), 356--367. J. Kilian, K. J. Lang and B. A. Pearlmutter (1994) Playing the Matching-Shoulders Lob-Pass Game with Logarithmic Regret, In Seventh Annual ACM Workshop on Computational Learning Theory, 159-164