
A distance metric

The set of input attributes for which we want to predict the resulting output attributes is called the query, or query point. The first step in making a prediction with MBL is to look through the database to find all the data points whose input attributes are similar to the query point. To do that, we have to define what is meant by similar: we need a distance metric that tells how close two points are.

Vizier uses a scaled Euclidean distance metric ($L_2$ norm). The distance between two points (between their input attributes) is defined by:

\[ d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{(\mathbf{x}_i - \mathbf{x}_j)^T M (\mathbf{x}_i - \mathbf{x}_j)} \]

where $M$ is a diagonal matrix and $\mathbf{x}$ refers to a vector of input attributes. Other distance metrics include the $L_1$ norm (sometimes called Manhattan distance), the $L_\infty$ norm, and Mahalanobis distance (same as scaled Euclidean except that $M$ is required to be symmetric, but not necessarily diagonal). Scaled Euclidean distance works well for most cases and we will not discuss the other metrics any further in this tutorial.
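To make the effect of the scaling concrete, here is a minimal sketch of a scaled Euclidean distance in Python. It is not Vizier's code. It assumes the diagonal matrix is given as a list of non-negative diagonal entries (called `weights` here), so the distance is the square root of the sum over attributes of weight times squared difference. The function names and the small database are hypothetical and chosen only for illustration.

```python
import math


def scaled_euclidean(x, q, weights):
    """Distance between input vectors x and q under a diagonal scaling matrix.

    `weights` holds the diagonal entries (assumed non-negative), so this
    computes sqrt(sum_k weights[k] * (x[k] - q[k]) ** 2).
    """
    if not (len(x) == len(q) == len(weights)):
        raise ValueError("x, q and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, q, weights)))


def rank_by_distance(points, query, weights):
    """Return the stored input vectors sorted from nearest to farthest."""
    return sorted(points, key=lambda p: scaled_euclidean(p, query, weights))


# Hypothetical data: two input attributes on very different scales.
database = [(1.0, 200.0), (2.0, 100.0), (5.0, 110.0)]
query = (1.5, 105.0)

# Equal weights: the large-valued second attribute dominates the ordering.
unscaled = rank_by_distance(database, query, weights=(1.0, 1.0))

# Down-weighting the second attribute lets the first one matter.
rescaled = rank_by_distance(database, query, weights=(1.0, 0.0001))
```

With equal weights the point (5.0, 110.0) ranks ahead of (1.0, 200.0) because the second attribute dominates. After down-weighting that attribute the two points swap places, which is the kind of change in "similarity" that the diagonal entries control.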

Jeff Schneider
Fri Feb 7 18:00:08 EST 1997