Title: 'N-Body' Problems in Statistical Learning

Authors: Alexander Gray and Andrew Moore

Abstract: We present very fast algorithms for a large class of statistical problems, which we call all-point-pairs problems, or 'N-body'-like problems. These are problems which abstractly require a comparison of each of the N points in a dataset with each other point and would naively be solved using N^2 computations; such naive algorithms cannot be applied to practical problems in which N is large. Many common problems in learning and statistics take this form. In this paper we focus on several examples of all-point-pairs problems within nonparametric statistical learning: nearest-neighbor classification, kernel density estimation, outlier detection, and an important set of spatial statistics (the two-point correlation, the multiple two-point correlation, and the n-point correlation). The methodology presented in this paper is also applicable in principle to fast learning of large-scale parametric models such as radial basis function neural networks, mixtures of Gaussians, hidden Markov models, and Gaussian processes, as well as several other common methods. We present five layers of techniques resulting in a general class of simple recursive geometric algorithms. The aforementioned examples illustrate some of the modifications needed for different statistical methods. We give an asymptotic analysis and show empirically that this algorithmic framework yields speedups of several orders of magnitude over the naive computation, even for small datasets. We are aware of no exact algorithms for these problems that are faster, either empirically or theoretically.

Keywords: nearest-neighbor classification, kernel density estimation, outlier detection, two-point correlation, n-point correlation, nonparametric statistics, machine learning, kd-trees, hyperstructure, cached sufficient statistics, N-body problem, computational geometry, all-nearest-neighbors, range-counting, range-searching.
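
To make the central idea concrete, the following is a minimal sketch, not the authors' implementation, of the kind of recursive geometric pruning the abstract refers to, applied to the two-point correlation: counting all pairs of points within distance r. Node pairs whose bounding boxes are entirely farther apart than r are excluded without comparing points, and node pairs entirely within r are counted wholesale. The tree construction, the leaf size of 16, and the NumPy-based brute force at the leaves are illustrative assumptions, not details taken from the paper.

import numpy as np


class Node:
    """A simple kd-tree node with an axis-aligned bounding box."""
    def __init__(self, points):
        self.points = points                      # (n, d) array owned by this node
        self.lo = points.min(axis=0)              # bounding-box lower corner
        self.hi = points.max(axis=0)              # bounding-box upper corner
        self.left = self.right = None
        if len(points) > 16:                      # leaf size: an arbitrary choice
            dim = int(np.argmax(self.hi - self.lo))   # split widest dimension at median
            order = np.argsort(points[:, dim])
            mid = len(points) // 2
            self.left = Node(points[order[:mid]])
            self.right = Node(points[order[mid:]])


def min_dist(a, b):
    # Smallest possible distance between the two bounding boxes.
    gap = np.maximum(0.0, np.maximum(a.lo - b.hi, b.lo - a.hi))
    return float(np.sqrt(np.sum(gap ** 2)))


def max_dist(a, b):
    # Largest possible distance between the two bounding boxes.
    span = np.maximum(a.hi - b.lo, b.hi - a.lo)
    return float(np.sqrt(np.sum(span ** 2)))


def pair_count(a, b, r):
    # Number of ordered pairs (x in a, y in b) with ||x - y|| <= r.
    if min_dist(a, b) > r:
        return 0                                  # exclusion: prune the whole node pair
    if max_dist(a, b) <= r:
        return len(a.points) * len(b.points)      # inclusion: count without comparing
    if a.left is None and b.left is None:         # both leaves: compare points directly
        d = np.linalg.norm(a.points[:, None, :] - b.points[None, :, :], axis=-1)
        return int(np.sum(d <= r))
    if a.left is None:
        return pair_count(a, b.left, r) + pair_count(a, b.right, r)
    if b.left is None:
        return pair_count(a.left, b, r) + pair_count(a.right, b, r)
    return (pair_count(a.left, b.left, r) + pair_count(a.left, b.right, r) +
            pair_count(a.right, b.left, r) + pair_count(a.right, b.right, r))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((2000, 2))
    root = Node(X)
    r = 0.05
    # The ordered count includes each point paired with itself; convert to i < j pairs.
    n_pairs = (pair_count(root, root, r) - len(X)) // 2
    print("pairs within", r, ":", n_pairs)

The same exclusion/inclusion tests on node bounding boxes carry over, with method-specific modifications, to the other all-point-pairs problems named in the abstract, such as kernel density estimation and all-nearest-neighbors.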