
Varying the Ball's Speed

 

Encouraged by the success and flexibility of our initial solution, we next varied a parameter of the initial setup to test whether the solution would extend further. Throughout Section 4.1, the passer started 35 units away from the ball and accelerated full-speed ahead until striking it. This process consistently propelled the ball at about 135 units/sec. To make the task more challenging, we varied the ball's speed by starting the passer at a random distance of 32-38 units from the ball. As a result, the ball traveled toward the shooter at 110-180 units/sec.

Before making any changes to the shooter's shooting policy, we tested the policy trained in Section 4.1 on the new task. The 3-input neural network could not handle the varying speed: its success rate fell to only 49.1% because the shooter mistimed its acceleration. To correct this mistiming, we added a fourth input to the neural network representing the speed of the ball: Ball Speed.

The shooter computed the Ball Speed from the ball's change in position over a given amount of time. Because of sensor noise, the change in position over a single time slice did not give an accurate reading. Nor could the shooter use the ball's total change in position since it was first observed, because the ball slowed down over time. As a compromise, the shooter computed the Ball Speed from the ball's change in position over the last 10 time slices (or fewer if 10 positions had not yet been observed).
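The windowed estimate described above can be sketched as a small helper. This is a minimal illustration under stated assumptions, not the shooter's actual code: `BallSpeedEstimator` and `slice_duration` (seconds per time slice) are hypothetical names, positions are assumed to arrive as (x, y) pairs once per slice, and the window is taken to hold the last 10 observed positions, measuring displacement between the oldest and newest of them.

```python
from collections import deque
import math

WINDOW = 10  # number of recent positions to keep (from the text)


class BallSpeedEstimator:
    """Hypothetical helper: estimates ball speed from recent displacement.

    Assumes one (x, y) observation per time slice; slice_duration is an
    assumed parameter giving the length of a time slice in seconds.
    """

    def __init__(self, slice_duration, window=WINDOW):
        self.slice_duration = slice_duration
        # Older positions fall off automatically once the window is full.
        self.positions = deque(maxlen=window)

    def observe(self, x, y):
        self.positions.append((x, y))

    def speed(self):
        # At least two observations are needed to measure any displacement.
        if len(self.positions) < 2:
            return 0.0
        (x0, y0), (x1, y1) = self.positions[0], self.positions[-1]
        # Fewer slices are used until the window has filled up.
        slices = len(self.positions) - 1
        distance = math.hypot(x1 - x0, y1 - y0)
        return distance / (slices * self.slice_duration)
```

Averaging over several slices smooths out per-slice sensor noise, while discarding older positions keeps the estimate tracking the ball's current, decelerating speed rather than its launch speed.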

To account for the additional quantity used to describe a world state, we gathered new training data. As before, the shooter used the random shooting policy during the training trials. This time, however, each training instance consisted of four inputs describing the state of the world plus an output indicating whether or not the trial was successful. Of the 5737 samples gathered for training, 963 (16.8%) were positive examples.
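One way to picture a training instance is as a four-value state paired with a success label. This sketch is illustrative only: `TrainingInstance`, its field names, and `positive_fraction` are hypothetical, and the first three state values simply stand in for the inputs carried over from Section 4.1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingInstance:
    """Hypothetical record of one trial under the random shooting policy.

    state: the three inputs from Section 4.1 followed by Ball Speed.
    success: whether the trial scored (the training output).
    """
    state: Tuple[float, float, float, float]
    success: bool


def positive_fraction(samples: List[TrainingInstance]) -> float:
    """Fraction of trials that were successful (positive examples)."""
    if not samples:
        return 0.0
    return sum(s.success for s in samples) / len(samples)
```

With 963 successful trials among 5737, `positive_fraction` returns about 0.168, matching the 16.8% reported above.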

For the purposes of training the new neural network, we scaled the Ball Speed to fall between 0.0 and 1.0. Except for the fourth input, the new neural network looked like the one pictured in Figure 5(b): it had two hidden units and a bias unit at each level, and it was trained with a learning rate of 0.001. Training this neural network for 4000 epochs resulted in a mean squared error of 0.0512, with 651 of the 5737 instances misclassified.
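The scaling step and the network shape described above can be sketched as follows. This is a minimal illustration, not the network actually used: min-max scaling over the 110-180 units/sec launch range, sigmoid activations, the weight initialization, and per-sample backpropagation on half the squared error are all assumptions, and `scale_speed` and `TinyNet` are hypothetical names. Only the 4-2-1 layout, the bias units, and the 0.001 learning rate come from the text.

```python
import math
import random

# Assumed scaling bounds: the launch range given in the text.
SPEED_MIN, SPEED_MAX = 110.0, 180.0


def scale_speed(speed):
    """Min-max scale a ball speed into [0.0, 1.0], clipping out-of-range values."""
    scaled = (speed - SPEED_MIN) / (SPEED_MAX - SPEED_MIN)
    return min(1.0, max(0.0, scaled))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """4 inputs -> 2 hidden units -> 1 output, with a bias unit feeding each layer."""

    def __init__(self, n_in=4, n_hidden=2, lr=0.001, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for the bias unit (input fixed at 1.0).
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.lr = lr

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hidden]
        hb = hidden + [1.0]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hb)))
        return xb, hb, out

    def train_step(self, x, target):
        """One backpropagation update; returns the squared error before the update."""
        xb, hb, out = self.forward(x)
        err = out - target
        delta_out = err * out * (1.0 - out)
        # Hidden deltas use the output weights as they were before this update.
        delta_hidden = [delta_out * self.w_out[j] * hb[j] * (1.0 - hb[j])
                        for j in range(len(self.w_hidden))]
        for j in range(len(self.w_out)):
            self.w_out[j] -= self.lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= self.lr * delta_hidden[j] * xb[i]
        return err * err

    def train(self, data, epochs):
        """Train for `epochs` passes; returns the mean squared error of the last pass."""
        mse = 0.0
        for _ in range(epochs):
            mse = sum(self.train_step(x, t) for x, t in data) / len(data)
        return mse
```

Clipping keeps speeds that drop below the launch range (the ball decelerates after it is struck) at 0.0, so the fourth input stays within [0.0, 1.0].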

Using this new neural network with the same decision function over its output as before (the 4-input neural network shooting policy), our shooter scored 91.5% of the time with the ball moving at different speeds. Again, this success rate was observed in all four action quadrants. The results from this section are summarized in Table 3.

Table 3: When the Ball Speed varies, an additional input is needed.






Peter Stone
Thu Aug 22 12:51:13 EDT 1996