Multiagent Learning in the Presence of Agents with Limitations
CMU Technical Report Number: CMU-CS-03-118
Abstract
Optimal behavior for one agent depends upon the behavior of the other
agents, which are learning as well. Multiagent environments are
therefore non-stationary, violating the stationarity assumption that
underlies traditional single-agent learning. In addition, agents in complex
tasks may have limitations, such as physical constraints or
designer-imposed approximations of the task that make learning
tractable. Limitations prevent agents from acting optimally, which
further complicates an already challenging learning problem. A learning agent must
effectively compensate for its own limitations while exploiting the
limitations of the other agents. My thesis research focuses on these
two challenges, namely multiagent learning and limitations, and
includes four main contributions.
First, the thesis introduces the novel concepts of a variable learning
rate and the WoLF (Win or Learn Fast) principle to account for other
learning agents. The WoLF principle can make rational learning
algorithms converge to optimal policies, thereby attaining two
properties, rationality and convergence, that previous techniques had
not achieved simultaneously. The converging effect of WoLF is proven
for a class of matrix games and demonstrated empirically for a wide
range of stochastic games.
Second, the thesis contributes an analysis of the effect of
limitations on the game-theoretic concept of Nash equilibria. The
existence of equilibria is important if multiagent learning
techniques, which often depend on the concept, are to be applied to
realistic problems where limitations are unavoidable. The thesis
introduces a general model for the effect of limitations on agent
behavior, which is used to analyze the resulting impact on equilibria.
The thesis shows that equilibria do exist for a few restricted classes
of games and limitations, but that, in general, even well-behaved
limitations do not preserve the existence of equilibria.
Third, the thesis introduces GraWoLF, a general-purpose, scalable,
multiagent learning algorithm. GraWoLF combines policy gradient
learning techniques with the WoLF variable learning rate. The
effectiveness of the learning algorithm is demonstrated in both a card
game with an intractably large state space, and an adversarial robot
task. Both tasks are complex, and agent limitations are prevalent in
each.
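The combination GraWoLF makes is a gradient-based policy update whose step size is modulated by the WoLF win/lose test. The helper below is a hypothetical sketch of that one idea only: GraWoLF's actual machinery (parameterized policies, gradient estimation, value comparison) is omitted, and the function names and rates are illustrative assumptions.

```python
def wolf_gradient_step(theta, grad, current_value, average_value,
                       eta_win=0.01, eta_lose=0.04):
    """Take one policy-gradient step with a WoLF variable learning rate.

    Step cautiously (eta_win) when the current policy is doing at least
    as well as the average policy, and quickly (eta_lose) otherwise.
    """
    eta = eta_win if current_value >= average_value else eta_lose
    return [t + eta * g for t, g in zip(theta, grad)]
```

The point of the switch is stability: a small step while winning keeps the policy from overshooting against adapting opponents, while a large step while losing escapes being exploited quickly.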
Fourth, the thesis describes the CMDragons robot soccer team strategy
for adapting to an unknown opponent. The strategy uses a notion of
plays as coordinated team plans. The selection of team plans is the
decision point for adapting the team to its current opponent, based on
the outcome of previously executed plays. The CMDragons were the
first RoboCup robot team to employ online learning to autonomously
alter their behavior during the course of a game.
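One simple way to realize outcome-driven play selection is to keep a weight per play, sample plays proportionally to weight, and adjust a play's weight multiplicatively after it executes. This is a hypothetical sketch of that scheme; the CMDragons' actual play-evaluation rules are richer, and the play names and update factors here are invented for illustration.

```python
import random

def select_play(weights, rng):
    # Sample a play with probability proportional to its weight.
    total = sum(weights.values())
    r = rng.random() * total
    for play, w in weights.items():
        r -= w
        if r <= 0.0:
            return play
    return play  # guard against floating-point round-off

def update_weight(weights, play, succeeded, reward=2.0, penalty=0.5):
    # Multiplicative update: successful plays get selected more often.
    weights[play] *= reward if succeeded else penalty

# Example: three plays, adapt after each executed play's outcome.
weights = {"offense": 1.0, "defense": 1.0, "special": 1.0}
rng = random.Random(1)
chosen = select_play(weights, rng)
update_weight(weights, chosen, succeeded=True)
```

Because selection is stochastic rather than greedy, the team keeps occasionally trying down-weighted plays, so it can recover if the opponent changes tactics mid-game.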
These four contributions demonstrate that it is possible to
effectively learn to act in the presence of other learning agents in
complex domains, even when agents have limitations. The introduced
learning techniques are proven effective in a class of small games,
and demonstrated empirically across a wide range of settings that
increase in complexity.