One obvious concern about the criterion described here is its computational cost. In situations where obtaining new examples may take days and cost thousands of dollars, it is clearly wise to expend computation to ensure that those examples are as useful as possible. In other situations, however, new data may be relatively inexpensive, so the computational cost of finding optimal examples must be considered.
Table 1 summarizes the computation times for the two learning algorithms discussed in this paper. Note that, with the mixture of Gaussians, training time depends linearly on the number of examples, but prediction time is independent. Conversely, with locally weighted regression, there is no ``training time'' per se, but the cost of additional examples accrues when predictions are made using the training set.
While the training time incurred by the mixture of Gaussians may make it infeasible for selecting optimal action learning actions in realtime control, it is certainly fast enough to be used in many applications. Optimized, parallel implementations will also enhance its utility. Locally weighted regression is certainly fast enough for many control applications, and may be made faster still by optimized, parallel implementations. It is worth noting that, since the prediction speed of these learners depends on their training set size, optimal data selection is doubly important, as it creates a parsimonious training set that allows faster predictions on future points.
Table 1: Computation times on a Sparc 10 as a function of training set size m. Mixture model had 60 Gaussians trained for 20 iterations. Reference times are per reference point; candidate times are per candidate point per reference point.