CMUnited RoboCup-98 Simulator Team

This page presents the CMUnited team entered in the RoboCup-98 simulator league. For more information about our entire project (including small and legged robots), please see our RoboSoccer project page.


People

Peter Stone, Manuela Veloso, Patrick Riley

Results

    CMUnited-98 is the RoboCup-98 Simulator League World Champion Team!

  • The tournament included 34 teams from around the world.
  • Over the course of 8 games, CMUnited-98 outscored its opponents by a combined score of 66-0!
  • Results of RoboCup-98 are available from Itsuki Noda and from Humboldt (including logs of all the games).
Summary | Layered Learning | Locker-Room Agreement
Communication | SPAR | PLOS | TPOT-RL | Software
Summary

The CMUnited-98 simulator team uses the following novel multi-agent techniques to achieve adaptive coordination:

    (1) Hierarchical machine learning (Layered learning)
    (2) Flexible, adaptive formations (Locker-room agreement)
    (3) Single-channel, low-bandwidth communication
    (4) Predictive, locally optimal skills (PLOS)
    (5) Strategic positioning using attraction and repulsion (SPAR)
In addition, using the CMUnited-97 simulator team, we developed a new multi-agent reinforcement learning technique called
    (6) Team-Partitioned, Opaque Transition Reinforcement Learning (TPOT-RL)
Articles and papers describing (1)-(6) are available from our main simulator team homepage and below.
Layered Learning

Layered learning is a hierarchical machine learning technique in which lower-level machine learning modules create the action or input spaces of higher-level learning modules. Developed by Peter Stone as a part of his thesis research, layered learning enables successful generalization in high-dimensional, complex spaces.

Layered learning does not automate the choice of the hierarchical learning layers or methods. However, with appropriate task decompositions and learning methods, powerful generalization is possible. For example, in the robotic soccer domain, we have linked the following three learned layers:

  • Neural networks were used by individual players to learn how to intercept a moving ball.
  • With the receivers and opponents using this first learned behavior to try to receive or intercept passes, a decision tree (C4.5) was used to learn the likelihood that a given pass would succeed.
  • This learned decision tree was used to abstract a very high-dimensional state-space into one manageable for a new multi-agent reinforcement learning technique (see TPOT-RL below).
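The three layers above can be sketched in code. This is an illustrative skeleton of the layered-learning idea, not the CMUnited-98 implementation: the trained neural network and C4.5 decision tree are replaced by trivial hand-written stand-ins, and all function names are assumptions.

```python
# Layered learning sketch: each layer's learned output becomes an input
# feature for the layer above. The real layers were a trained neural
# network and a C4.5 decision tree; trivial heuristics stand in here.

def intercept_cost(ball_pos, player_pos):
    # Layer 1 stand-in (neural network in CMUnited-98): estimated
    # cost/time for a player to reach the ball.
    dx = ball_pos[0] - player_pos[0]
    dy = ball_pos[1] - player_pos[1]
    return (dx ** 2 + dy ** 2) ** 0.5

def pass_success_estimate(receiver_cost, nearest_opponent_cost):
    # Layer 2 stand-in (C4.5 decision tree in CMUnited-98), built on
    # top of layer 1's interception behavior.
    return 0.9 if receiver_cost < nearest_opponent_cost else 0.2

def choose_pass(ball_pos, teammates, opponents):
    # Layer 3 consumer: pick the teammate with the highest layer-2
    # estimate. (In CMUnited-98, these estimates also abstracted the
    # state space for TPOT-RL.)
    def score(mate):
        mine = intercept_cost(ball_pos, mate)
        theirs = min(intercept_cost(ball_pos, o) for o in opponents)
        return pass_success_estimate(mine, theirs)
    return max(teammates, key=score)
```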
Layered learning is described in detail in:

Applied Artificial Intelligence (AAI), Volume 12, 1998.
A Layered Approach to Learning Client Behaviors in the RoboCup Soccer Server.
Peter Stone and Manuela Veloso.
HTML version
A Japanese translation by Takayoshi Ishii
Some related slides: ( PostScript / html)

Locker-Room Agreement

We characterize robotic soccer as an instance of a class of domains called Periodic Team Synchronization (PTS) domains. In this class of domains, a team of agents has periodic opportunities to communicate fully in a safe, off-line situation (i.e. in the "locker-room"). However, in general the agents must act autonomously in real-time with little or no communication possible.

To deal with the challenges of PTS domains, we introduce the concept of a Locker-Room Agreement by which agents determine ahead of time their communication language, their sensory triggers for changes in team strategy, and some multi-agent plans for predictable situations.

In CMUnited-98, the locker-room agreement includes a flexible team structure that allows homogeneous agents to switch roles (positions such as defender or attacker) within a single formation. It also allows the entire team to switch formations (for instance from a defensive to an offensive formation) based on agreed-upon sensory triggers. For example, CMUnited-98 began all of its games in a 4-3-3 formation (4 defenders, 3 midfielders, 3 forwards). However, if they had ever found themselves losing near the end of the game, they would have smoothly switched to a formation with fewer defenders and more forwards. In the actual competition, they often switched to a defensive formation with additional defenders and fewer forwards once they were safely in the lead.
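The formation-switching part of the locker-room agreement can be pictured as a shared trigger table that every agent evaluates on its own. The formations, thresholds, and function names below are assumptions for illustration, not the actual CMUnited-98 triggers:

```python
# Locker-room-agreement sketch: every agent evaluates the same agreed
# rule on its own view of the score and clock, so the whole team
# switches formation in unison without communicating at switch time.

FORMATIONS = {
    "4-3-3": (4, 3, 3),   # defenders, midfielders, forwards (default)
    "3-3-4": (3, 3, 4),   # more attacking
    "5-3-2": (5, 3, 2),   # more defensive
}

def pick_formation(our_score, their_score, time_left, game_length=6000):
    late = time_left < 0.2 * game_length   # "near the end" trigger
    if late and our_score < their_score:
        return "3-3-4"    # losing late: fewer defenders, more forwards
    if late and our_score - their_score >= 2:
        return "5-3-2"    # safely in the lead: protect it
    return "4-3-3"        # agreed starting formation
```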

The locker-room agreement also facilitates set-plays, or precompiled multi-agent plans for frequent situations such as kick-offs, goal-kicks, and corner-kicks. While many teams had trouble clearing the ball from the defensive zone after the goalie caught it, CMUnited-98 successfully used a sequence of passes to clear the ball first to the side of the field and then up the sideline.

Finally, the locker-room agreement defines the agent communication protocol as described below.

The locker-room agreement was also a key factor in the success of the world champion CMUnited-97 small robot team. It is described most completely in the following publication:

In Artificial Intelligence (AIJ), 1999.
Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork.
Peter Stone and Manuela Veloso.
HTML version

Single-channel, low-bandwidth communication

The Soccer Server's communication model is well suited to studying single-channel, low-bandwidth communication environments: all agents broadcast their messages on a single channel that nearby agents on both teams can hear; communication range is limited; and hearing capacity is limited, so message transmission is unreliable.

Challenges to overcome include robustness to lost messages, active interference by opponents, messages requiring simultaneous responses from several teammates, and message targeting. Our approach was successfully implemented and used by CMUnited-98 agents to share state information and to coordinate formation changes via the locker-room agreement.
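A minimal sketch of team messaging on a lossy, shared channel is shown below. The actual CMUnited-98 protocol is described in the AIJ paper; the message layout and field names here are assumptions for illustration:

```python
# Single-channel messaging sketch: tag each message with an agreed team
# marker (to filter opponents sharing the channel) and the simulator
# cycle (so stale or replayed messages can be discarded after losses).

def encode(team_marker, sender, cycle, payload):
    return f"{team_marker}:{sender}:{cycle}:{payload}"

def decode(team_marker, last_cycle_heard, message):
    # Returns (sender, cycle, payload), or None if the message should
    # be ignored (wrong team, malformed, or older than what we have).
    parts = message.split(":", 3)
    if len(parts) != 4 or parts[0] != team_marker:
        return None               # opponent interference or garbage
    sender, cycle, payload = parts[1], int(parts[2]), parts[3]
    if cycle <= last_cycle_heard:
        return None               # stale: newer information already heard
    return sender, cycle, payload
```

Because every agent applies the same filter, a lost message simply leaves an agent with slightly older shared state until the next broadcast, rather than desynchronizing the team.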

For details, see:

In Artificial Intelligence (AIJ), 1999.
Task Decomposition, Dynamic Role Assignment, and Low-Bandwidth Communication for Real-Time Strategic Teamwork.
Peter Stone and Manuela Veloso.
HTML version

SPAR

In both the CMUnited-97 simulator and small robot teams, agents were organized in a team structure based on flexible roles and formations, a novel and effective core architecture for multi-agent teams. In the 1997 teams, an agent's positioning within its role depended only on the ball's location.

One of the most significant improvements in the CMUnited-98 small robot and simulator teams was an algorithm for reasoning about where an agent should position itself when it does not have the ball, in anticipation of a pass from the teammate in control of the ball. The positioning is based on teammate and adversary locations, as well as the locations of the ball and the attacking goal. Jointly with the small robot team, we developed a new algorithm, SPAR (Strategic Positioning using Attraction and Repulsion), which determines the optimal position as the solution to a linear-programming-based optimization problem with a multiple-objective function subject to several constraints.

The objectives include maximizing the distance to all opponents and teammates (repulsion) and minimizing the distance to the opponent's goal and to the current position of the ball (attraction). The constraints reflect the particular setup of the games. In the CMUnited-98 simulator team, SPAR uses the following constraints:

  • The agent stays within a maximum distance from the ball.
  • The agent stays on the field.
  • The agent stays in an on-side position.
  • The agent only considers positions from which it predicts that it can successfully receive a pass from the teammate with the ball. It uses the trained decision tree (described under layered learning) to assess this constraint.
When the agents use SPAR, they respond effectively to the dynamic world, advancing the ball towards the opponent's goal with a series of successful passes.
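The attraction/repulsion idea can be illustrated with a discretized version that scores candidate positions directly. CMUnited-98 solved this as a multiple-objective linear program; the grid of candidates, the unit weights, and the reduced constraint set here are assumptions for illustration:

```python
# Discretized SPAR sketch: among feasible candidate positions, maximize
# repulsion from all players plus attraction to the ball and goal.

import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def spar_position(candidates, teammates, opponents, ball, goal,
                  max_ball_dist=30.0):
    def feasible(p):
        # One constraint from the text: stay within a maximum distance
        # of the ball. (Field bounds, off-side, and the learned
        # pass-receivability test are omitted in this sketch.)
        return dist(p, ball) <= max_ball_dist

    def score(p):
        repulsion = sum(dist(p, q) for q in teammates + opponents)
        attraction = -dist(p, goal) - dist(p, ball)
        return repulsion + attraction

    feasible_pts = [p for p in candidates if feasible(p)]
    return max(feasible_pts, key=score) if feasible_pts else None
```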

For further details, see:

Submitted to Third International Conference on Autonomous Agents (Agents'99)
Anticipation: A Key for Collaboration in a Team of Agents.
Manuela Veloso, Peter Stone, and Michael Bowling
HTML version

PLOS

Another significant improvement of CMUnited-98 over the CMUnited-97 simulator team is the addition of PLOS: Predictive, Locally Optimal Skills. Locally optimal both in time and in space, PLOS was used to create sophisticated low-level behaviors, including:

  • Dribbling the ball while keeping it away from opponents
  • Fast ball interception
  • Flexible kicking that trades off between power and speed of release based on opponent positions and desired eventual ball speed
  • A goaltender that decides when to hold its position and when to advance towards the ball based on opponent positions.
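The "predictive" half of PLOS can be sketched for the ball-interception skill: simulate the ball's decaying velocity forward cycle by cycle and pick the earliest cycle at which the player can reach the predicted ball position. The decay, speed, and horizon values below are stand-ins, not the soccer server's actual parameters:

```python
# PLOS-style interception sketch: predict future ball positions under
# velocity decay, and return the earliest reachable interception point.

import math

def intercept_point(ball_pos, ball_vel, player_pos,
                    player_speed=1.0, ball_decay=0.94, horizon=100):
    bx, by = ball_pos
    vx, vy = ball_vel
    for t in range(1, horizon + 1):
        bx += vx
        by += vy
        vx *= ball_decay          # the server decays ball velocity each cycle
        vy *= ball_decay
        reach = player_speed * t  # distance the player can cover in t cycles
        if math.hypot(bx - player_pos[0], by - player_pos[1]) <= reach:
            return (bx, by), t    # earliest predicted interception
    return None                   # ball not reachable within the horizon
```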
For details, see:

In "RoboCup-98: Robot Soccer World Cup II", M. Asada and H. Kitano (eds.), 1999. Springer Verlag, Berlin.
The CMUnited-98 Champion Simulator Team. (extended version)
Peter Stone and Manuela Veloso and Patrick Riley
HTML version

TPOT-RL

We used our CMUnited-97 software to devise and study a new multi-agent reinforcement learning algorithm called Team-Partitioned, Opaque-Transition Reinforcement Learning, or TPOT-RL. TPOT-RL introduces the use of action-dependent features to generalize the state space. In our work, we use a learned action-dependent feature space to aid higher-level reinforcement learning. TPOT-RL is an effective technique to allow a team of agents to learn to cooperate towards the achievement of a specific goal. It is an adaptation of traditional RL methods that is applicable in complex, non-Markovian, multi-agent domains with large state spaces and limited training opportunities.

In our experiments, we used TPOT-RL to train the passing and shooting patterns of a team of agents in fixed positions with no dribbling capabilities. We were able to achieve better results through learning than when using a fixed heuristic policy.
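The core of a TPOT-RL-style update can be sketched as follows. This is a simplified illustration of the idea, not the published algorithm: each agent keeps its own value table (team-partitioned) indexed by an action-dependent feature, and because transitions are opaque (the agent never observes the state its action leads to), the table is updated directly from a delayed, observed reward rather than by bootstrapping. All constants and names are assumptions:

```python
# TPOT-RL-style sketch: per-agent values over action-dependent features,
# updated from delayed observed rewards with no successor-state lookup.

class TPOTRLAgent:
    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.q = {}               # (feature, action) -> learned value

    def select(self, feature, actions):
        # Greedy over action-dependent features (exploration omitted).
        return max(actions, key=lambda a: self.q.get((feature, a), 0.0))

    def update(self, feature, action, reward):
        # Monte-Carlo-style step toward the delayed reward: no
        # bootstrapping, because the next state is unobserved.
        key = (feature, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward - old)
```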

However, since CMUnited-98 used more flexible positioning and introduced a dribbling behavior, the results did not apply directly. We are currently looking into applying TPOT-RL in other domains, such as network routing.

For details, see:

In Conference on Automated Learning and Discovery (CONALD)
and in "RoboCup-98: Robot Soccer World Cup II", M. Asada and H. Kitano (eds.), 1999. Springer Verlag, Berlin.
Team-Partitioned, Opaque-Transition Reinforcement Learning.
Peter Stone and Manuela Veloso
HTML version

The Software

You can now download a portion of the CMUnited-98 source code. It is written in C++ and has been tested under Linux and under SunOS.

We made some effort to package the code in such a way that people will be able to learn from and incorporate it. The README file included in the directory describes how to compile and run the code. You should be able to easily produce dribbling, kicking, and ball interception behaviors.

Included with the source code release is a paper describing our control algorithms in detail:
In "RoboCup-98: Robot Soccer World Cup II", M. Asada and H. Kitano (eds.), 1999. Springer Verlag, Berlin.
The CMUnited-98 Champion Simulator Team. (extended version)
Peter Stone, Manuela Veloso, and Patrick Riley
HTML version

The purpose of the code release is to allow people to get quickly past the low-level implementation details involved in working with the soccer server. Our high-level behaviors, including machine learning modules, a teamwork construct, and a communication paradigm, are best described in our various papers (see above).

Please note that the code is released as is, with no support provided. Also, please keep track of what code and ideas of ours you use.

You may also run CMUnited-98 as it ran in Paris.
Download the tarred, gzipped executables for Linux or SunOS (should also run under Solaris).
A README file with instructions on how to start the clients is included.

Links

Sony dream
Windmill Wanderers from University of Amsterdam, Netherlands ( AIACS )
AT-Humboldt from Humboldt University, Berlin, Germany
UFSC-Team from Federal University of Santa Catarina - BRAZIL
Linköping Lizards (aka Headless Chickens) from Linköping University, Sweden
CAT_Finland from Oulu University, Finland
CosmOz from Universität des Saarlandes & DFKI GmbH, Germany
Ulm-Sparrows from University of Ulm, Germany
Mainz Rolling Brains from University of Mainz, Germany
Essex Wizards from University of Essex, UK
