A distance metric



The set of input attributes for which we want to predict the resulting output attributes is called the query, or query point. The first step in making a prediction with MBL is to search the database for all the data points whose input attributes are similar to those of the query point. To do that, we have to define what is meant by "similar": we need a distance metric that tells how close two points are.

Vizier uses a scaled Euclidean distance metric ($L_2$ norm). The distance between two points (between their input attributes) is defined by:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^T M^T M \, (\mathbf{x}_i - \mathbf{x}_j)}$$

where $M$ is a diagonal matrix and $\mathbf{x}$ refers to a vector of input attributes. Other distance metrics include the $L_1$ norm (sometimes called Manhattan distance), the $L_\infty$ norm, and Mahalanobis distance (the same as scaled Euclidean except that $M$ is required to be symmetric, but not necessarily diagonal). Scaled Euclidean distance works well in most cases, and we will not discuss the other metrics any further in this tutorial.
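
To make the effect of the scaling concrete, here is a minimal Python sketch. It is not Vizier's code. It assumes the scaled Euclidean distance takes the form sqrt(sum_j (m_j (x_j - q_j))^2), where the m_j are the diagonal entries of the scaling matrix. The function name `scaled_euclidean`, the `scales` argument, and the small database are hypothetical and chosen only for illustration.

```python
import math


def scaled_euclidean(x, q, scales):
    """Scaled Euclidean distance between input vectors x and q.

    scales holds one factor per input attribute (assumed here to be the
    diagonal entries of the scaling matrix).
    """
    if not (len(x) == len(q) == len(scales)):
        raise ValueError("x, q, and scales must have the same length")
    return math.sqrt(sum((m * (a - b)) ** 2 for a, b, m in zip(x, q, scales)))


# Hypothetical database of input-attribute vectors and a query point.
database = [(1.0, 200.0), (2.0, 100.0), (4.0, 110.0)]
query = (2.0, 120.0)

# Shrink the second attribute so its larger numeric range does not dominate.
scales = (1.0, 0.01)

# Database points ordered from most to least similar to the query.
ranked = sorted(database, key=lambda p: scaled_euclidean(p, query, scales))

# The same ordering with no scaling, for comparison.
unscaled = sorted(database, key=lambda p: scaled_euclidean(p, query, (1.0, 1.0)))
```

With these scale factors the closest point to the query is (2.0, 100.0), whereas with unit scaling the closest point is (4.0, 110.0). The choice of scale factors therefore decides which data points count as similar.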

Jeff Schneider
Fri Feb 7 18:00:08 EST 1997