
Input dimensionality

Neural networks are generally better suited to directly processing a large number of inputs; a typical example is learning from the pixel values of an image. A network usually has at least one node for each input variable, so when there are many inputs there are many additional parameters to determine during training. Training may therefore be slower, and it can be harder to choose good learning rates, but neural nets have been successful at processing images and other high dimensional input spaces.
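As a rough illustration of how the parameter count grows with the input dimension, consider the first fully connected layer of a network. This is a minimal sketch in Python; the layer sizes are arbitrary, chosen only to show the scaling.

    def first_layer_params(n_inputs, n_hidden):
        """Weights plus biases for a fully connected first layer."""
        return n_inputs * n_hidden + n_hidden

    for n_inputs in (10, 1000, 32 * 32):  # e.g. a small 32x32 grayscale image
        print(n_inputs, first_layer_params(n_inputs, n_hidden=50))

With 50 hidden nodes, going from 10 inputs to a 32x32 image multiplies the first-layer parameter count by roughly a factor of 100.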

Memory based learning with locally weighted regression is best suited to problems with a moderate number of input variables (40 or fewer). As the dimensionality increases, several things become problematic. The matrix inversion in the regression (see eq. 8) requires computation time that grows as the cube of the number of dimensions, which can slow down prediction significantly. Weighting the data points and finding those nearest the query also slows down as the dimensionality increases; even the efficient, tree-based data retrieval algorithms used by Vizier lose their zip when the dimensionality gets high. The amount of data required for a good fit grows with the dimensionality as well. Huge data sets are more of a problem for memory based methods because they keep all the data, while neural nets discard the data once training is complete.
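To make the cubic cost concrete, here is a minimal sketch of locally weighted linear regression in Python with NumPy. The Gaussian kernel, the bandwidth h, and the added intercept column are illustrative assumptions rather than Vizier's exact formulation; the point is the (d+1)x(d+1) linear solve, whose cost grows as the cube of the dimensionality d.

    import numpy as np

    def lwr_predict(X, y, query, h=1.0):
        """Predict at `query` with a locally weighted linear fit.

        X: (n, d) stored data points, y: (n,) stored outputs.
        """
        # Gaussian kernel weights: points near the query dominate the fit.
        dists = np.sum((X - query) ** 2, axis=1)
        w = np.exp(-dists / (2.0 * h ** 2))

        # Augment with a constant column so the local model has an intercept.
        Xa = np.hstack([X, np.ones((X.shape[0], 1))])
        W = np.diag(w)

        # Weighted least squares: solving these (d+1)x(d+1) normal
        # equations is the O(d^3) step discussed above.
        beta = np.linalg.solve(Xa.T @ W @ Xa, Xa.T @ W @ y)
        return np.append(query, 1.0) @ beta

Every prediction repeats this solve at the new query point, which is why the cubic cost hits prediction speed rather than a one-time training phase.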

Despite these problems, memory based learning is still a good choice for many problems with high dimensional input. In some cases many input variables are available, but the output is really a function of only a small subset of them; the problem is then to determine which inputs are relevant. Memory based learning is excellent for this kind of feature selection because it performs leave-one-out cross validation so cheaply (see the later sections on feature selection and cross validation).
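The sketch below shows why leave-one-out cross validation is so cheap for a memory based learner: since "training" is just storing the data, holding a point out only requires masking it during retrieval. For simplicity it scores features with 1-nearest-neighbor regression and a greedy forward search; this text's method uses locally weighted regression instead, but the same trick applies.

    import numpy as np

    def loocv_error(X, y):
        """Leave-one-out squared error of 1-nearest-neighbor regression."""
        n = X.shape[0]
        err = 0.0
        for i in range(n):
            d = np.sum((X - X[i]) ** 2, axis=1)
            d[i] = np.inf  # leave point i out of its own prediction
            err += (y[np.argmin(d)] - y[i]) ** 2
        return err / n

    def greedy_feature_selection(X, y):
        """Add features one at a time while the LOOCV error improves."""
        chosen, best = [], np.inf
        remaining = list(range(X.shape[1]))
        while remaining:
            scores = {f: loocv_error(X[:, chosen + [f]], y) for f in remaining}
            f, score = min(scores.items(), key=lambda kv: kv[1])
            if score >= best:
                break
            chosen.append(f)
            remaining.remove(f)
            best = score
        return chosen, best

Note that no model is ever refit: each leave-one-out prediction comes straight from the stored data, which is what makes this style of feature selection affordable.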





