Our hypothesis is that the complexification process outlined above allows discovering more sophisticated strategies, i.e. strategies that are more effective, flexible, and general, and include more components and variations than do strategies obtained through search in a fixed space. To demonstrate this effect, we need a domain where it is possible to develop a wide range increasingly sophisticated strategies, and where sophistication can be readily measured. A coevolution domain is particularly appropriate because a sustained arms race should lead to increasing sophistication.
In choosing the domain, it is difficult to strike a balance between being able to evolve complex strategies and being able to analyze and understand them. Pursuit and evasion tasks have been utilized for this purpose in the past , and can serve as a benchmark domain for complexifying coevolution as well. While past experiments evolved either a predator or a prey, an interesting coevolution task can be established if the agents are instead equal and engaged in a duel. To win, an agent must develop a strategy that outwits that of its opponent, utilizing structure in the environment.
In this paper we introduce such a duel domain, in which two simulated robots try to overpower each other (Figure 4). The two robots begin on opposite sides of a rectangular room facing away from each other. As the robots move, they lose energy in proportion to the amount of force they apply to their wheels. Although the robots never run out of energy (they are given enough to survive the entire competition), the robot with higher energy wins when it collides with its competitor. In addition, each robot has a sensor indicating the difference in energy between itself and the other robot. To keep their energies high, the robots can consume food items, arranged in a symmetrical pattern in the room.
The robot duel task supports a broad range of sophisticated strategies that are easy to observe and interpret. The competitors must become proficient at foraging, prey capture, and escaping predators. In addition, they must be able to quickly switch from one behavior to another. The task is well-suited to competitive coevolution because naive strategies such as forage-then-attack can be complexified into more sophisticated strategies such as luring the opponent to waste its energy before attacking.
The simulated robots are similar to Kheperas (Mondada et al. 1993; Figure 6). Each has two wheels controlled by separate motors. Five rangefinder sensors can sense food and another five can sense the other robot. Finally, each robot has an energy-difference sensor, and a single wall sensor.
The robots are controlled with neural networks evolved with NEAT. The networks receive all of the robot sensors as inputs, as well as a constant bias that NEAT can use to change the activation thresholds of neurons. They produce three motor outputs: Two to encode rotation either right or left, and a third to indicate forward motion power. These three values are then translated into forces to be applied to the left and right wheels of the robot.
The state of the world at time is defined by the positions of the robots and food, the energy levels of the robots, and the internal states (i.e. neural activation) of the robots' neural networks, including sensors, outputs, and hidden nodes. The subsequent state is determined by the outputs of the robots' neural network controllers, computed from the inputs (i.e. sensor values) in in one step of propagation through the network. The robots change their position in according to their neural network outputs as follows. The change in direction of motion is proportional to the difference between the left and right motor outputs. The robot drives forward a distance proportional to the forward output on a continuous board of size 600 by 600. The robot first makes half its turn, then moves forward, then completes the second half of its turn, so that the turning and forward motions are effectively combined. If the robot encounters food, it receives an energy boost, and the food disappears from the world. The loss of energy due to movement is computed as the sum of the turn angle and the forward motion, so that even turning in place takes energy. If the robots are within a distance of 20, a collision occurs and the robot with a higher energy wins (see Appendix B for the exact parameter values used).
Since the observed state taken by the sensors does not include the internal state of the opponent robot, the robot duel is a partially-observable Markov decision process (POMDP). Since the next observed state depends on the decision of the opponent, it is necessary for robots to learn to predict what the opponent is likely to do, based on their past behavior, and also based on what is reasonable behavior in general. For example, it is reasonable to assume that if the opponent robot is quickly approaching and has higher energy, it is probably trying to collide. Because an important and complex portion of is not observable, memory, and hence recurrent connections, are crucial for success.
This complex robot-control domain allows competitive coevolution to evolve increasingly sophisticated and complex strategies, and can be used to understand complexification, as will be described next.