We initiated our research with the task of having an agent learn to shoot a moving ball passed to it by a passing agent, whose simple behavior affects the speed and trajectory of the ball. In this task there is thus a single learning agent. We call the learned behavior a shooting template. This initial task provided the basis for our interest in, and on-going research on, more elaborate multi-agent learning scenarios. We describe here only enough of the experiments to illustrate the task; for more details, please see [Stone & Veloso 1995].
In all of our initial experiments, a passer accelerates as fast
as possible towards a stationary ball in order to propel it between a
shooter and the goal. The resulting speed of the ball is
determined by the distance that the passer started from the ball. The
shooter's task is to time its acceleration so that it intercepts the
ball's path and redirects it into the goal. We constrain the shooter
to accelerate at a fixed constant rate (while steering along a fixed
line) once it has decided to begin its approach. Thus the
behavior to be learned consists of the decision of when to begin
moving: at each action opportunity the shooter either starts or
waits. Once the shooter has started, the decision cannot be retracted. The
shooter must make its decision based solely on the ball's and its own
coordinates reported at a (simulated) rate of 60 Hz.
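The start-or-wait behavior described above can be sketched as a simple control loop. This is our own illustration, not the paper's implementation: the function names and the representation of observations are assumptions.

```python
def run_shooter(observations, should_start):
    """Sketch of the shooter's 60 Hz decision loop (hypothetical names).

    observations: iterable of (ball_xy, shooter_xy) pairs, one per tick.
    should_start: the learned start/wait predicate.
    Returns the action taken at each tick.
    """
    actions = []
    started = False
    for ball_xy, shooter_xy in observations:
        if not started and should_start(ball_xy, shooter_xy):
            started = True  # the decision is irrevocable
        # once started, accelerate at a fixed rate along a fixed line
        actions.append('accelerate' if started else 'wait')
    return actions
```

With a toy predicate that triggers once the ball's x-coordinate reaches a threshold, the loop waits and then commits permanently, matching the "start or wait, no retraction" structure of the task.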
Throughout our initial experiments, the shooter's initial position
varies randomly within a continuous range: its initial heading varies
over 70 degrees and its initial x and y coordinates vary independently
over 40 units as shown in Figure 2(a).
The two shooters pictured show the extreme possible starting
positions, both in terms of heading and location.
Since the ball's momentum is initially directed across the front of the goal, the shooter must compensate by aiming wide of the goal (by 170 units) when making contact with the ball (see Figure 2(b)). Before beginning its approach, the shooter chooses a point wide of the goal at which to aim. Once it decides to start, it steers along an imaginary line between its initial position and this aim point, continually adjusting its heading until it is moving in the right direction along this line. In this way the shooter intercepts the ball's path and redirects the ball into the goal.
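The aiming geometry can be made concrete with a short sketch. The function below is illustrative only: the offset direction and coordinate conventions are our assumptions, not the paper's code; only the 170-unit wide offset comes from the text.

```python
import math

def aim_heading(start_xy, goal_xy, wide_offset=170.0, lateral=(0.0, 1.0)):
    """Heading (radians) from the shooter's start toward an aim point
    offset `wide_offset` units from the goal along the unit direction
    `lateral`. Aiming wide compensates for the ball's momentum across
    the front of the goal. Names and geometry are hypothetical.
    """
    aim = (goal_xy[0] + wide_offset * lateral[0],
           goal_xy[1] + wide_offset * lateral[1])
    return math.atan2(aim[1] - start_xy[1], aim[0] - start_xy[0])
```

With a zero offset the shooter would head straight at the goal; the nonzero offset rotates its approach line so that the deflected ball, rather than the shooter, ends up on target.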
Figure: (a) The initial position for the experiments. The
car in the lower part of the picture, the passer, accelerates
full speed ahead until it hits the ball. Another car, the
shooter, then attempts to redirect the ball into the goal on the left.
The two cars in the top of the figure illustrate the extremes of the
range of angles of the shooter's initial position. The square behind
these two cars indicates the range of the initial position of the
center of the shooter. (b) A diagram illustrating the paths
of the ball and the cars during a typical trial.
The task of learning a shooting template has several parameters that
control its difficulty. First, the ball can be moving at the same
speed for all training examples or at different speeds. Second, the
ball can be coming with the same trajectory or with different
trajectories. Third, the goal can always be in the same place during
testing as during training, or it can change locations (think of this
parameter as the possibility of aiming for different parts of the
goal). Fourth, the training and testing can occur all in the same
location, or the testing can be moved to a different symmetrical
location on the field. Figure 3 illustrates some of
these variations. Using a neural network (NN), we have learned a
shooting template that can be successful despite variations along all
of these dimensions. The table below summarizes our initial
results.
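The four difficulty parameters can be summarized as a small configuration object. This encoding is purely our illustration of the parameter space; the field names and the difficulty proxy are assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class ShootingTaskConfig:
    """Hypothetical encoding of the four task-variation parameters."""
    vary_ball_speed: bool = False       # passer's distance varies ball speed
    vary_ball_trajectory: bool = False  # ball arrives along different paths
    vary_goal_position: bool = False    # aim at different parts of the goal
    vary_field_quadrant: bool = False   # test in a symmetric field location

    def difficulty(self) -> int:
        # a crude proxy: the number of dimensions allowed to vary
        return sum([self.vary_ball_speed, self.vary_ball_trajectory,
                    self.vary_goal_position, self.vary_field_quadrant])
```

The baseline task fixes all four dimensions; the hardest setting in this sketch varies all of them at once.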
Figure: Variations to the initial setup: (a) The initial position
in the opposite corner of the field (a different action quadrant);
(b) Varied ball trajectory: The line indicates the ball's
possible initial positions, while the passer always starts directly
behind the ball facing towards a fixed point. The passer's initial
distance from the ball controls the speed at which the ball is passed;
(c) Varied goal position: the higher and lower goal
placements are both pictured at once.
Table: A summary of our learning results for shooting a moving ball
into a goal. The learned decision policy is trained on the examples
generated with the random decision criterion.