The fourth experiment is essentially the same as the third, except that it is carried out in the robot arm domain. Here, three hand-crafted configurations of a single obstacle, with the goal in a fixed position, were used, as shown in Figure 32. To increase the statistical variation, each configuration was run five times, each time with a different random seed. The curves in Figure 33 are therefore the average across 15 experimental runs.
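The averaging described above can be sketched as follows. This is only an illustrative sketch: it assumes each run is logged as a learning curve sampled at the same points, and the function name `average_curves` and the toy values are hypothetical, not taken from the experiment.

```python
from statistics import fmean


def average_curves(runs):
    """Return the pointwise mean of equally long learning curves."""
    if not runs:
        raise ValueError("need at least one run")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must have the same number of points")
    # zip(*runs) groups the i-th sample of every run together.
    return [fmean(points) for points in zip(*runs)]


# Toy values only (NOT data from the experiment): 3 configurations x 5 seeds.
N_CONFIGURATIONS, N_SEEDS = 3, 5
toy_runs = [
    [float(config + seed), float(config + seed) / 2.0]
    for config in range(N_CONFIGURATIONS)
    for seed in range(N_SEEDS)
]
mean_curve = average_curves(toy_runs)  # one averaged curve from 15 runs
```

Averaging pointwise assumes the 15 runs are sampled on a common step axis; runs logged at different intervals would first need to be resampled.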
The top curve of Figure 33 is the Q-learning algorithm; the bottom curve is the function composition system. The knee of the function composition system's curve occurs at about 4,400 steps. The knee of the basic Q-learning algorithm's curve occurs at about 68,000 steps, giving a speed-up of about 15.
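The speed-up figure follows directly from the two knee positions quoted above. As a minimal check of the arithmetic (the helper name `speedup` is hypothetical, and the knee positions are the approximate values read from the curves):

```python
def speedup(baseline_steps, accelerated_steps):
    """Ratio of the steps needed by the baseline to those needed by the faster learner."""
    if accelerated_steps <= 0:
        raise ValueError("accelerated_steps must be positive")
    return baseline_steps / accelerated_steps


Q_LEARNING_KNEE = 68_000   # approximate knee of the basic Q-learning curve
COMPOSITION_KNEE = 4_400   # approximate knee of the function composition curve

ratio = speedup(Q_LEARNING_KNEE, COMPOSITION_KNEE)
print(f"speed-up: {ratio:.1f}")  # prints "speed-up: 15.5"
```

Since both knee positions are approximate readings, the ratio is best reported as "about 15" rather than to more significant figures.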