Adversarial Learning

 

While teammates cooperate by passing the ball amongst themselves, they must also consider how best to defeat their opponents. The shooting template developed in Section 4 can be incorporated into an adversarial situation by adding a defender.

In Section 4, the shooter was trained to aim at different goal locations. The small goal was used to train the shooter's accuracy. However, in competitive situations, the goal must be larger than a player so that a single player cannot entirely block it. To make this adversarial situation fair, the goal is widened to three times its original size (see Figure 10).

Figure 10: An adversarial scenario: the defender learns to block the shot, while the shooter simultaneously tries to learn to score.

As in the collaborative case, our approach to this problem involves holding the learned shooting template fixed and allowing the clients to learn behaviors at a higher level. Based on the defender's position, the shooter must choose to shoot at the upper, middle, or lower portion of the goal. Meanwhile, based on the shooter's and the ball's positions as well as its own current velocity, the defender must choose in which direction to accelerate. For simplicity, the defender may only set its throttle to full forward, full backward, or none. Since the outputs of both the shooter's and the defender's learning functions are discrete, Reinforcement Learning techniques, such as Q-learning [Kaelbling, Littman, and Moore, 1996], are possible alternatives to neural networks.
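
To make the discrete formulation concrete, the following sketch shows how tabular Q-learning could be set up over these choices. This code is illustrative rather than part of the original system: the state encoding is left abstract, and the action names and learning parameters (ALPHA, GAMMA, EPSILON) are assumptions.

    import random
    from collections import defaultdict

    # Discrete action sets described above (names are illustrative).
    SHOOTER_ACTIONS = ["upper", "middle", "lower"]      # aim point on the goal
    DEFENDER_ACTIONS = ["forward", "none", "backward"]  # throttle setting

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning parameters

    def make_q_table():
        """Q-values for (state, action) pairs, defaulting to 0."""
        return defaultdict(float)

    def choose_action(q, state, actions):
        """Epsilon-greedy selection over a discrete action set."""
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: q[(state, a)])

    def q_update(q, state, action, reward, next_state, actions):
        """Standard one-step Q-learning backup."""
        best_next = max(q[(next_state, a)] for a in actions)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - q[(state, action)])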

The defender may be able to learn to judge where the shooter is aiming before the shooter strikes the ball by observing the shooter's approach. Similarly, if the defender starts moving, the shooter may be able to adjust and aim at a different part of the goal. Thus, as time goes on, the opponents will need to co-evolve in order to adjust to each other's changing strategies. Note that a sophisticated agent may even be able to influence its opponent's behavior by acting consistently for a period of time and then drastically changing behaviors so as to fool the opponent.
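
A minimal sketch of this co-evolutionary dynamic appears below. It is a deliberate simplification, not the paper's method: both learners are reduced to single-state (bandit-style) epsilon-greedy learners, and play_episode is a hypothetical stand-in for a simulated shot with an assumed payoff structure.

    import random
    from collections import defaultdict

    ACTIONS = {"shooter": ["upper", "middle", "lower"],
               "defender": ["forward", "none", "backward"]}
    ALPHA, EPSILON = 0.1, 0.1  # assumed learning and exploration rates

    def play_episode(aim, throttle):
        """Hypothetical stand-in for one simulated shot. A real
        implementation would run the simulator; here the shot scores
        unless the defender's throttle happens to cover the aim point
        (a pure assumption). Returns (shooter reward, defender reward)."""
        covered = {"forward": "upper", "none": "middle", "backward": "lower"}
        return (1.0, -1.0) if covered[throttle] != aim else (-1.0, 1.0)

    q = {role: defaultdict(float) for role in ACTIONS}

    for episode in range(10000):
        # Each side chooses epsilon-greedily against the other's current
        # strategy, so the two policies continually adapt to each other.
        acts = {}
        for role, options in ACTIONS.items():
            if random.random() < EPSILON:
                acts[role] = random.choice(options)
            else:
                acts[role] = max(options, key=lambda a: q[role][a])
        r_s, r_d = play_episode(acts["shooter"], acts["defender"])
        q["shooter"][acts["shooter"]] += ALPHA * (r_s - q["shooter"][acts["shooter"]])
        q["defender"][acts["defender"]] += ALPHA * (r_d - q["defender"][acts["defender"]])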

At the next higher level of behavior, this adversarial scenario can be extended by combining it with the collaborative passing behavior. When receiving the ball, the passer (as in Figure 8) could decide whether to pass the ball or shoot it immediately, based on the defender's motion. This extra option would further complicate the defender's task. Some related results pertaining to the decision of whether to pass or shoot appear in [Stone and Veloso, 1996a].
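
As a hypothetical illustration of the receiver's extra option (not a rule from the paper, where this choice would itself be learned), such a decision might take a form like the following; the feature names and thresholds are placeholders.

    def pass_or_shoot(defender_speed, defender_distance, teammate_open,
                      speed_threshold=0.5, distance_threshold=10.0):
        """Hypothetical hand-coded decision for the receiving player:
        shoot immediately if the defender is slow to react or far away;
        otherwise pass to an open teammate. All thresholds are
        illustrative placeholders."""
        if defender_speed < speed_threshold or defender_distance > distance_threshold:
            return "shoot"
        return "pass" if teammate_open else "shoot"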


