Several studies involving competitive agents were described in the heterogeneous non-communicating scenario (see Section 5). In the current scenario, there are many more examples of competitive agents.
Zeng and Sycara study a competitive negotiation scenario in which agents use Bayesian learning techniques to update models of each other based on bids and counterbids in the negotiation process .
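The core of such an opponent model is a Bayesian update over hypotheses about the other agent's private parameters. The following is a minimal sketch, not Zeng and Sycara's actual system: the hypothesis set (possible reservation prices) and the likelihood values for an observed counteroffer are invented for illustration.

```python
# Sketch of Bayesian opponent modeling in negotiation. All numbers are
# hypothetical; a real system would derive likelihoods from a model of
# the opponent's bidding strategy.

def bayes_update(priors, likelihoods):
    """Return the posterior P(hypothesis | observation), given prior
    probabilities and the likelihood of the observed counteroffer
    under each hypothesis."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypotheses: the opponent's reservation price is 80, 90, or 100.
priors = [1/3, 1/3, 1/3]
# Assumed likelihood of observing a counteroffer of 95 under each
# hypothesis (higher reservation prices make a high counteroffer
# more plausible).
likelihoods = [0.1, 0.3, 0.6]
posterior = bayes_update(priors, likelihoods)
```

After each round of bids and counterbids, the posterior becomes the prior for the next update, so the agent's model of its opponent sharpens as the negotiation proceeds.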
Similar to Tan's work on multiagent RL in the pursuit domain  is Weiß's work with competing Q-learners. The agents compete with each other to earn the right to control a single system . The highest bidder pays a certain amount to be allowed to act, and then receives any reward that results from the action.
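The bidding mechanism can be sketched as follows. This is a toy illustration of the idea, not Weiß's exact algorithm: each agent bids the Q-value of its preferred action, the highest bidder pays its bid and acts, and its net payoff (reward minus payment) drives the Q-update. The class names and the learning rate are assumptions.

```python
# Toy sketch of Q-learners competing in an auction for control of a
# shared system (illustrative only).

class Bidder:
    def __init__(self, actions, alpha=0.5):
        self.q = {a: 0.0 for a in actions}  # Q-value per action
        self.alpha = alpha                  # learning rate

    def bid(self):
        """Bid the Q-value of the currently preferred action."""
        action = max(self.q, key=self.q.get)
        return self.q[action], action

    def update(self, action, payoff):
        """Move the Q-value toward the observed net payoff."""
        self.q[action] += self.alpha * (payoff - self.q[action])

def auction_step(agents, reward_fn):
    """The highest bidder pays its bid, executes its action, and
    receives the resulting reward."""
    bids = [(agent.bid(), agent) for agent in agents]
    (amount, action), winner = max(bids, key=lambda b: b[0][0])
    reward = reward_fn(action)
    winner.update(action, reward - amount)  # reward minus payment
    return winner, action
```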
Another Q-learning approach, this time with benevolent agents, explores the idea of having one agent teach another through communication. Starting with a trainer that has moderate expertise in a task, a learner can be rewarded for mimicking the trainer. Furthermore, the trainer can recommend an action to the learner in a given situation so as to direct the learner toward a reward state. Eventually, the learner is able to perform the task without any guidance. Clouse studies the effect of different levels of advice in a road-following domain . He concludes that moderate advice improves performance and speeds up learning, while too much advice leads to worse performance because the learner does not experience enough negative examples during training.
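The level of advice can be modeled as a single probability, as in this minimal sketch (the function and parameter names are invented; Clouse's actual experiments use a richer training protocol): with probability `advice_prob` the learner executes the trainer's recommended action, and otherwise it acts greedily on its own Q-values.

```python
import random

# Sketch of advice-guided action selection: a knob between pure
# self-directed learning (advice_prob = 0) and pure mimicry
# (advice_prob = 1). Too high a setting deprives the learner of the
# negative examples it needs, per Clouse's finding.

def choose_action(q_values, trainer_action, advice_prob, rng=random):
    if rng.random() < advice_prob:
        return trainer_action                   # follow the advice
    return max(q_values, key=q_values.get)      # act on own estimates
```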
While training is a useful concept, some research is driven by the goal of reducing the role of the human trainer. As opposed to the process of shaping, in which the system designer develops simple behaviors and slowly builds them into more complex ones, populations appropriately seeded for competitive co-evolution can reduce the amount of designer effort. Potter and Grefenstette illustrate this effect in the domain described above in which two robots compete for a stationary pellet of food . Subpopulations of rules are seeded to be more effective in different situations. Thus specialized subpopulations of rules corresponding to shaped behaviors tend to emerge.
Rather than competitive co-evolution, Bull et al. build a system that uses cooperative co-evolution . They use genetic algorithms (GAs) to evolve separate communicating agents to control the different legs of a quadrupedal robot.
Drawing inspiration from competition in human societies, several researchers have designed systems based on the law of supply and demand. In the contract nets framework, agents all have their own goals, are self-interested, and have limited reasoning resources . They bid to accept tasks from other agents and then can either perform the tasks (if they have the proper resources) or subcontract them to still other agents. Agents must pay to contract their tasks out and thus shop around for the lowest bidder. Sandholm and Lesser discuss some of the issues that arise in contract nets .
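The announce-bid-award cycle at the heart of this scheme can be sketched simply. This is a simplified illustration in the spirit of contract nets, not the full protocol: the class names and the flat-rate cost model are invented, and a real system would also handle subcontracting and richer announcement and award messages.

```python
# Toy sketch of task bidding in the spirit of the contract net
# framework: eligible contractors quote a cost, and the manager awards
# the task to the lowest bidder.

class Contractor:
    def __init__(self, name, skills, rate):
        self.name = name
        self.skills = skills  # set of tasks this agent can perform
        self.rate = rate      # hypothetical flat-rate cost per task

    def can_do(self, task):
        return task in self.skills

    def quote(self, task):
        return self.rate

def award_contract(task, contractors):
    """Collect bids from eligible contractors and award the task to
    the lowest bidder; return None if no one can perform it."""
    bids = [(c.quote(task), c) for c in contractors if c.can_do(task)]
    if not bids:
        return None
    cost, winner = min(bids, key=lambda b: b[0])
    return winner, cost
```

Because agents must pay to contract tasks out, shopping for the lowest bidder is exactly the `min` over quoted costs above; a winning contractor may in turn run the same procedure to subcontract the task onward.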
In a similar spirit is an implemented multiagent system that controls air temperature in different rooms of a building . A person can set the thermostat in a room to any desired temperature. Then, depending on the actual air temperature, the agent for that room tries to ``buy'' either hot or cold air from another room that has an excess. At the same time, the agent can sell its own excess air at the current temperature to other rooms. Modeling the loss of heat in the transfer from one room to another, the agents try to buy and sell at the best possible prices. The market regulates itself to provide equitable use of a shared resource.
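The role of the transfer-loss model can be seen in a toy sketch (all names, prices, and loss fractions here are invented, not taken from the implemented system): a buying agent ranks sellers not by quoted price but by the effective price per unit of air actually delivered.

```python
# Toy sketch of seller selection in an air-temperature market. Each
# seller is (name, price_per_unit, transfer_loss); if a fraction
# `loss` of the heat dissipates in transit, the buyer effectively
# pays price / (1 - loss) per unit delivered.

def best_seller(sellers):
    return min(sellers, key=lambda s: s[1] / (1.0 - s[2]))

# A nearby room with a high loss can be worse than a pricier distant
# one with an efficient transfer path:
sellers = [("room_A", 1.0, 0.5),   # cheap, but half the heat is lost
           ("room_B", 1.2, 0.1)]   # pricier, but little loss
choice = best_seller(sellers)
```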