New Algorithms for Efficient High-dimensional Nonparametric Classification

Ting Liu

Abstract

  This talk is about non-approximate acceleration of high-dimensional k-nearest-neighbor classifiers. We exploit the fact that even when we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query; we merely need to answer questions about the properties of that set of datapoints. This offers a small amount of computational leeway, and we investigate how much of that leeway can be exploited.

This approach is applicable to many algorithms in nonparametric statistics, memory-based learning, and kernel-based learning, but for clarity this talk concentrates on pure k-NN classification. We introduce new ball-tree algorithms that, on real-world datasets, give speedups of 2-fold to 100-fold over highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 10^6 dimensions and 10^5 records, and they show non-trivial speedups while giving exact answers.
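To give a feel for the kind of data structure involved, here is a minimal sketch of exact k-NN search with a ball tree. This is a generic illustration, not the new algorithms presented in the talk: it shows the standard pruning rule that the talk's methods build on and improve. All names (`Ball`, `knn`) and the farthest-pair splitting heuristic are illustrative choices, not taken from the source.

```python
import heapq
import math
import random

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Ball:
    """A ball-tree node: a center, a radius covering all its points, and children."""
    def __init__(self, points, leaf_size=8):
        self.center = tuple(sum(c) / len(points) for c in zip(*points))
        self.radius = max(dist(self.center, p) for p in points)
        if len(points) <= leaf_size:
            self.points, self.children = points, None
        else:
            # Heuristic split: pick a point far from the center, then the
            # point farthest from it, and assign each point to the nearer pole.
            a = max(points, key=lambda p: dist(p, self.center))
            b = max(points, key=lambda p: dist(p, a))
            left = [p for p in points if dist(p, a) <= dist(p, b)]
            right = [p for p in points if dist(p, a) > dist(p, b)]
            if not left or not right:  # degenerate (e.g. duplicate points)
                mid = len(points) // 2
                left, right = points[:mid], points[mid:]
            self.points = None
            self.children = [Ball(left, leaf_size), Ball(right, leaf_size)]

def knn(node, q, k, heap=None):
    """Exact k-NN: heap holds (-distance, point) pairs for the k best so far."""
    if heap is None:
        heap = []
    # Pruning rule: no point inside this ball can be closer to q than
    # dist(q, center) - radius, so skip the ball if that lower bound
    # already exceeds the current k-th best distance.
    lower = dist(q, node.center) - node.radius
    if len(heap) == k and lower > -heap[0][0]:
        return heap
    if node.points is not None:
        for p in node.points:
            d = dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p))
    else:
        # Visit the closer child first so the bound tightens sooner.
        for child in sorted(node.children, key=lambda c: dist(q, c.center)):
            knn(child, q, k, heap)
    return heap
```

The answers are exact because a ball is skipped only when it provably cannot contain a better neighbor; the speedup depends entirely on how often that bound fires, which is exactly the leeway the talk's methods exploit further by answering classification queries without materializing the neighbor set.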



Charles Rosenberg
Last modified: Tue Sep 30 16:25:14 EDT 2003