
Training

Using this data, we were able to train an NN that the shooter could use as part of a learned shooting policy that enabled it to score consistently. We tried several configurations of the NN, settling on one with a single hidden layer of 2 units and a learning rate of 0.01. Each layer had a bias unit with a constant input of 1.0. We normalized all other inputs to fall roughly between 0.0 and 1.0 (see Figure 4(b)). The target outputs were 0.9 for positive examples (successful trials) and 0.1 for negative examples. Weights were all initialized randomly between -0.5 and 0.5. The resulting NN is pictured in Figure 4(b). This NN was not the result of an exhaustive search for the optimal configuration for this task, but rather the quickest and most successful of about 10 alternatives with different numbers of hidden units and different learning rates. Once we settled on this configuration, we never changed it, concentrating instead on other research issues.
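To make the configuration concrete, the sketch below shows one way such a network and its weight update could be written, assuming sigmoid units and standard incremental backpropagation; N_INPUTS is a hypothetical placeholder for the number of input predicates of Figure 4(a), and none of the names are taken from the actual implementation.

    import numpy as np

    N_INPUTS = 4          # hypothetical placeholder for the number of normalized input predicates
    N_HIDDEN = 2          # single hidden layer of 2 units, as described above
    LEARNING_RATE = 0.01

    rng = np.random.default_rng(0)
    # Weights initialized uniformly in [-0.5, 0.5]; the extra entry in each row holds
    # the bias weight, fed by a constant input of 1.0 at the input and hidden layers.
    W_hidden = rng.uniform(-0.5, 0.5, size=(N_HIDDEN, N_INPUTS + 1))
    W_out = rng.uniform(-0.5, 0.5, size=(N_HIDDEN + 1,))

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        """x: input predicates normalized to roughly [0.0, 1.0]."""
        x_b = np.append(x, 1.0)              # bias unit at the input layer
        h = sigmoid(W_hidden @ x_b)
        h_b = np.append(h, 1.0)              # bias unit at the hidden layer
        y = sigmoid(W_out @ h_b)
        return x_b, h, h_b, y

    def train_example(x, target):
        """One backpropagation update; target is 0.9 (success) or 0.1 (failure)."""
        global W_out, W_hidden
        x_b, h, h_b, y = forward(x)
        # Output-layer delta for squared error through a sigmoid unit.
        delta_out = (target - y) * y * (1.0 - y)
        # Hidden-layer deltas (the hidden bias unit has no incoming weights to update).
        delta_hidden = h * (1.0 - h) * (W_out[:N_HIDDEN] * delta_out)
        W_out += LEARNING_RATE * delta_out * h_b
        W_hidden += LEARNING_RATE * np.outer(delta_hidden, x_b)
        return (target - y) ** 2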

  
Figure 4: (a) An illustration of the predicates we used to describe the world for the purpose of learning to shoot a moving ball. (b) The NN used to learn the shooting policy. The NN has 2 hidden units, with a bias unit at both the input and hidden layers.

Training this NN on the entire training set for 3000 epochs resulted in a mean squared error of 0.0386, with 253 of the examples misclassified (i.e., the output was closer to the wrong target value than to the correct one). Training for more epochs did not help noticeably. Due to the sensor noise during training, the concept was not learned perfectly.
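As a rough illustration of this regimen, the sketch below reuses the forward and train_example functions from the earlier sketch to run the epoch loop and tally misclassifications in the sense just described; only the epoch count and the 0.9/0.1 target values come from the text.

    def train(examples, epochs=3000):
        """examples: list of (inputs, target) pairs with targets of 0.9 or 0.1."""
        for _ in range(epochs):
            for x, target in examples:
                train_example(x, target)
        # Evaluate mean squared error and misclassifications on the training set.
        squared_errors, misclassified = [], 0
        for x, target in examples:
            _, _, _, y = forward(x)
            squared_errors.append((target - y) ** 2)
            # Misclassified: output closer to the wrong target than to the correct one.
            wrong = 0.1 if target == 0.9 else 0.9
            if abs(y - wrong) < abs(y - target):
                misclassified += 1
        return sum(squared_errors) / len(squared_errors), misclassified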


