
Efficient Data Storage and Retrieval

When data sets get large, memory-based learning can require a significant amount of computation just to accumulate the statistics needed to make each prediction. Vizier uses a data structure called a kd-tree to speed that process up tremendously. A kd-tree is a binary tree that stores continuous-valued, multi-dimensional data by recursively splitting the data along one dimension at a time. This tutorial does not cover the details of kd-trees, which can be found in a good computer science algorithms text, or the specifics of the method used by Vizier, which can be found in [4].
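For readers who have not met kd-trees before, the following is a minimal sketch of the general idea, written in Python. It is a textbook-style construction and nearest-neighbor query, not the method used by Vizier (see [4] for that). The names `Node`, `build`, `sq_dist`, and `nearest` are chosen here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point                    # the point stored at this node
    axis: int                       # dimension this node splits on
    left: Optional["Node"] = None   # points with smaller values on `axis`
    right: Optional["Node"] = None  # points with larger values on `axis`


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        ordered[mid],
        axis,
        build(ordered[:mid], depth + 1),
        build(ordered[mid + 1:], depth + 1),
    )


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to `query`."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, query) < sq_dist(best, query):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Search the far side only if the splitting plane is closer than the
    # best point found so far; otherwise that whole subtree is skipped.
    if diff ** 2 < sq_dist(best, query):
        best = nearest(far, query, best)
    return best
```

A call such as `nearest(build(points), query)` returns the same answer as a linear scan over `points`, but it can skip whole subtrees whose region lies farther from the query than the best point found so far.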

One aspect of the trees is important to the Vizier user: they make it possible to trade off prediction speed against accuracy. When editing the metacode, you may have noticed a field labeled "kd-tree." Its choices are "Slow, precise"; "Medium"; and "Fast, approximate." This choice adjusts the speed and accuracy of predictions. The default is "Slow, precise," which is often fine for moderate-sized data sets. "Medium" is a safe choice that can speed predictions up considerably. "Fast, approximate" should be used only when approximate predictions are acceptable.
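To give a feel for how a tree can trade accuracy for speed, here is a rough Python sketch of a kernel-weighted average in which each tree node caches the count and the sum of the outputs beneath it. When the kernel weight barely varies across a node's bounding box, the whole node is handled at once using the cached statistics. This is an illustration under stated assumptions and not Vizier's implementation (see [4]). The Gaussian kernel, the parameters `bandwidth` and `tol`, and all function names are hypothetical, and no correspondence between `tol` values and the three kd-tree settings is claimed.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, float]  # (input vector, output value)


@dataclass
class Node:
    lo: Point                      # lower corner of the bounding box
    hi: Point                      # upper corner of the bounding box
    count: int                     # number of samples under this node
    sum_y: float                   # cached sum of their outputs
    samples: Optional[List[Sample]] = None  # set only on leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(samples: List[Sample], leaf_size: int = 4, depth: int = 0) -> Node:
    """Build a tree over a non-empty sample list, caching statistics per node."""
    dims = len(samples[0][0])
    lo = tuple(min(x[d] for x, _ in samples) for d in range(dims))
    hi = tuple(max(x[d] for x, _ in samples) for d in range(dims))
    node = Node(lo, hi, len(samples), sum(y for _, y in samples))
    if len(samples) <= max(leaf_size, 1):
        node.samples = list(samples)
        return node
    axis = depth % dims
    ordered = sorted(samples, key=lambda s: s[0][axis])
    mid = len(ordered) // 2
    node.left = build(ordered[:mid], leaf_size, depth + 1)
    node.right = build(ordered[mid:], leaf_size, depth + 1)
    return node


def weight(sq_dist: float, bandwidth: float) -> float:
    """Gaussian kernel weight (an assumed choice of kernel)."""
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def accumulate(node: Node, q: Point, bandwidth: float,
               tol: float) -> Tuple[float, float, int]:
    """Return (sum of w*y, sum of w, nodes visited) for the samples under node."""
    if node.samples is not None:
        num = den = 0.0
        for x, y in node.samples:
            w = weight(sum((a - b) ** 2 for a, b in zip(x, q)), bandwidth)
            num += w * y
            den += w
        return num, den, 1
    # Smallest and largest squared distance from q to the bounding box.
    near = sum(max(l - c, 0.0, c - h) ** 2
               for l, h, c in zip(node.lo, node.hi, q))
    far = sum(max(abs(c - l), abs(c - h)) ** 2
              for l, h, c in zip(node.lo, node.hi, q))
    w_max, w_min = weight(near, bandwidth), weight(far, bandwidth)
    if w_max - w_min <= tol:
        # Weights are nearly constant here: use the cached statistics.
        w = 0.5 * (w_max + w_min)
        return w * node.sum_y, w * node.count, 1
    ln, ld, lv = accumulate(node.left, q, bandwidth, tol)
    rn, rd, rv = accumulate(node.right, q, bandwidth, tol)
    return ln + rn, ld + rd, lv + rv + 1


def predict(root: Node, q: Point, bandwidth: float = 0.1,
            tol: float = 0.0) -> Tuple[float, int]:
    """Kernel-weighted average at q, plus the number of tree nodes visited."""
    num, den, visited = accumulate(root, q, bandwidth, tol)
    return (num / den if den > 0.0 else float("nan")), visited
```

With `tol=0.0` the sketch agrees with a direct sum over every sample, up to floating-point rounding. Raising `tol` lets the search stop higher in the tree, so it visits no more nodes than before and returns an approximation in exchange.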

Jeff Schneider
Fri Feb 7 18:00:08 EST 1997