In the pursuit domain, communication creates new possibilities for predator behavior. Here, agents can still be fully heterogeneous, but now cooperating agents can also communicate with one another. Since the prey acts on its own in the pursuit domain, it has no other agents with which to communicate. However, the predators can freely exchange information in order to capture the prey more effectively. The current situation is illustrated in Figure 10.
Figure 10: The pursuit domain with communicating agents. Agents can still be fully heterogeneous, but now the predators can communicate with one another.
Tan uses communicating agents in the pursuit domain to conduct some interesting multiagent Q-learning experiments. In his instantiation of the domain, there are several prey agents, and the predators have limited vision, so they may not always know where the prey are. Thus, the predators can help each other by sharing their sensory input. Tan shows that they can also help each other by exchanging reinforcement episodes and/or control policies.
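To make "exchanging reinforcement episodes" concrete, the following is a minimal sketch assuming plain tabular Q-learning. It is not Tan's exact setup: the state encoding (grid positions), the action set, and the constants `ALPHA` and `GAMMA` are illustrative assumptions, and `QAgent` and `learn_from_episode` are hypothetical names. The point is that a received episode is simply a list of experience tuples that the receiving agent replays through its own update rule.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)
ACTIONS = ["N", "S", "E", "W", "stay"]  # assumed action set

class QAgent:
    """Hypothetical tabular Q-learner for one predator."""

    def __init__(self):
        # Unseen (state, action) pairs default to a Q-value of 0.0
        self.q = defaultdict(float)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        target = reward + GAMMA * best_next
        self.q[(state, action)] += ALPHA * (target - self.q[(state, action)])

    def learn_from_episode(self, episode):
        # An episode is a list of (state, action, reward, next_state) tuples.
        # It may be the agent's own experience or one received from a partner.
        for s, a, r, s2 in episode:
            self.update(s, a, r, s2)

# Hypothetical example: predator A records an episode and sends it to predator B
episode = [((0, 1), "E", 0.0, (1, 1)), ((1, 1), "N", 1.0, (1, 0))]
a, b = QAgent(), QAgent()
a.learn_from_episode(episode)
b.learn_from_episode(episode)  # B updates from A's experience without living it
```

After the exchange, both predators hold the same Q-values even though only one of them actually experienced the episode; sharing control policies would correspond to copying or merging the `q` tables directly.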
Recall the ``local'' strategy defined by Stephens and Merx, in which each predator simply moves to its closest ``capture position.'' In their instantiation of the domain, the predators can see the prey but not each other. With communication possible, they define two more strategies for the predators. When using a ``distributed'' strategy, the agents are still homogeneous, but they communicate to ensure that each moves toward a different capture position. In particular, the predator farthest from the prey chooses the capture position closest to it and announces that it will approach that position. Then the next-farthest predator chooses the closest capture position from the remaining three, and so on. This simple protocol encourages the predators to close in on the prey from different sides. The distributed strategy is much more effective than the local policy and does not require much communication. However, there are situations in which it does not succeed.
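The announcement protocol of the distributed strategy can be sketched as a greedy assignment, assuming a grid world with Manhattan distance and ignoring any toroidal wrap-around. The function names and predator labels below are hypothetical; each iteration of the loop stands for one predator's announcement.

```python
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def capture_positions(prey: Pos) -> List[Pos]:
    # The four cells orthogonally adjacent to the prey (no wrap-around assumed)
    x, y = prey
    return [(x, y - 1), (x + 1, y), (x, y + 1), (x - 1, y)]

def distributed_assignment(predators: Dict[str, Pos], prey: Pos) -> Dict[str, Pos]:
    remaining = capture_positions(prey)
    claims: Dict[str, Pos] = {}
    # Predators announce in order of decreasing distance from the prey
    order = sorted(predators, key=lambda n: manhattan(predators[n], prey), reverse=True)
    for name in order:
        # Pick the closest capture position that no one has claimed yet ...
        choice = min(remaining, key=lambda cell: manhattan(predators[name], cell))
        # ... and "announce" it, removing it from everyone else's options
        remaining.remove(choice)
        claims[name] = choice
    return claims

predators = {"p1": (0, 5), "p2": (5, 9), "p3": (8, 5), "p4": (5, 3)}
targets = distributed_assignment(predators, prey=(5, 5))
```

Here `p1` (farthest) claims `(4, 5)`, then `p2` claims `(5, 6)`, `p3` claims `(6, 5)`, and `p4` is left with `(5, 4)`. Because each choice is greedy and made once, the protocol needs only one short message per predator, but it has no way to revisit an earlier claim, which is one reason it can fail in some configurations.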
Stephens and Merx then present one more strategy that always succeeds but requires much more communication: the ``central'' strategy. The central strategy is effectively a single-agent system. Three predators transmit all of their sensory inputs to one central agent, which then decides where all the predators should move and transmits its decision back to them. In this case, there is really only one intelligent controlling agent and three puppets. Observe that by taking MAS to the extreme of full communication, we may arrive at a single-agent system.
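The text does not specify how the central agent makes its decision, so the sketch below assumes one plausible rule: with every predator's position in hand, it searches all assignments of predators to capture positions and picks the one with the smallest total Manhattan distance. The grid model and all function names are hypothetical. What the sketch does illustrate is the architecture: all inputs flow to one decision-maker, and the other predators simply receive their targets.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Pos = Tuple[int, int]

def manhattan(a: Pos, b: Pos) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def capture_positions(prey: Pos) -> List[Pos]:
    # The four cells orthogonally adjacent to the prey (no wrap-around assumed)
    x, y = prey
    return [(x, y - 1), (x + 1, y), (x, y + 1), (x - 1, y)]

def central_assignment(predators: Dict[str, Pos], prey: Pos) -> Dict[str, Pos]:
    # The central agent has received every predator's position (its sensory input)
    names = list(predators)
    best, best_cost = None, None
    # Exhaustive search: only 4! = 24 assignments for four predators
    for perm in permutations(capture_positions(prey), len(names)):
        cost = sum(manhattan(predators[n], c) for n, c in zip(names, perm))
        if best_cost is None or cost < best_cost:
            best, best_cost = perm, cost
    # The decision "transmitted back": each predator is told its target cell
    return dict(zip(names, best))

predators = {"p1": (0, 5), "p2": (5, 9), "p3": (8, 5), "p4": (5, 3)}
orders = central_assignment(predators, prey=(5, 5))
```

The cost of this design shows up in communication rather than computation: every predator must report its full input every step and wait for a reply, whereas in the distributed strategy each predator sends a single announcement.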
Benda et al., in the original presentation of the pursuit domain, also consider the full range of communication possibilities, all the way up to the central strategy. They consider the possible organizations of the four predators when any pair can either exchange data, exchange data and goals, or have one control the other. They also describe the tradeoff between lower communication costs and better decisions. Communication costs might come in the form of limited bandwidth or consumption of reasoning time.
Another way to frame this tradeoff is as one between cost and freedom: as communication cost (time) increases, freedom decreases. Osawa suggests that the predators should move through four phases. In increasing order of cost (decreasing freedom), they are: autonomy, communication, negotiation, and control. When the predators stop making sufficient progress toward the prey using one strategy, they should move to the next, more expensive strategy. Thus, they can close in on the prey efficiently and effectively.
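Osawa's escalation rule can be sketched as a small state machine. The notion of "sufficient progress" used here (a new best value of some distance-to-prey measure within `patience` steps) and the class name `PhaseController` are assumptions made for illustration; the text only says that the predators move to the next phase when progress stalls.

```python
PHASES = ["autonomy", "communication", "negotiation", "control"]  # increasing cost

class PhaseController:
    """Hypothetical controller that escalates phases when progress stalls."""

    def __init__(self, patience: int = 3):
        self.index = 0               # start in the cheapest phase
        self.patience = patience     # stalled steps tolerated (assumed criterion)
        self.best_distance = None
        self.stalled = 0

    @property
    def phase(self) -> str:
        return PHASES[self.index]

    def observe(self, distance_to_prey: int) -> str:
        # Progress means reaching a new best distance to the prey
        if self.best_distance is None or distance_to_prey < self.best_distance:
            self.best_distance = distance_to_prey
            self.stalled = 0
        else:
            self.stalled += 1
            # Too long without progress: move to the next, more expensive phase
            if self.stalled >= self.patience and self.index < len(PHASES) - 1:
                self.index += 1
                self.stalled = 0
        return self.phase

controller = PhaseController(patience=2)
trace = [controller.observe(d) for d in [6, 5, 5, 5, 5, 5]]
```

With `patience=2`, the trace stays in `autonomy` while the distance drops, escalates to `communication` after two steps without progress, and then to `negotiation` after two more. The phase never steps back down in this sketch; whether predators should return to cheaper phases once progress resumes is left open by the description.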
We identify an important lesson to learn from the above examples: