Generalized N-body Problems
In my thesis, I unified for the first time a large class of problems which I call generalized N-body problems. These problems have the essential structure of the Coulombic N-body problem of computational physics, consisting of decomposable operators (like summation or maximization) on evaluations of certain kernel functions on n-tuples of points in a D-dimensional metric space. All of the problems listed in this column, and many other important problems not listed (I could list at least 50), are examples of such problems.
The first (and very brief) paper discussing much of the big picture was (Gray and Moore, 'N-Body' Problems in Statistical Learning [pdf] [ps], presented orally at NIPS 2000), but is now outdated. The full story is much longer and is still unfolding at a fast pace, so it will eventually appear probably in monograph (focused book) form in the near future. Individual results will also appear in journal form very soon.   Multi-tree Methods
What makes the problem class interesting is that is possible to devise a unified solution approach for this class, as shown in my thesis. It is based on the introduced principle of higher-order divide-and-conquer, or divide-and-conquer over multiple sets (a principle which is actually more general than what is needed here) -- in this case over multiple space-partitioning data structures. I utilize the best data structures for proximity search, which are data-adaptive, unlike the typical structures of computational physics. My way of using such structures yields strict error bounds in non-exact cases. I used this general strategy to develop new algorithms, which I call dual-tree or more generally multi-tree methods. Each method, listed in this column, demonstrates how the same basic algorithmic methodology specializes to each problem. In addition to passing an empirical test versus the best existing methods, most of the algorithms here have been heavily used in significant applications.    Kernel Density Estimation
 ∀ xq ∑r Kh(|| xq - xr ||) Kernel density estimation is the canonical statistical method for the fundamental problem of estimating general probability densities (i.e. without distribution assumptions), but its intractability has limited its use since its introduction in the 1950's. Our method for this problem can be extended to solve local polynomial regression problems. Our dual-tree algorithm, utilizing a simple finite-difference scheme, turns out to be faster than all previous proposals for the same accuracy, including the FFT and the fast gauss transform, dramatically so beyond 2 or 3 dimensions. It works for all common kernel functions, can perform multiple bandwidths simultaneously for faster cross-validation, provides hard error bounds, and requires no parameter tweaking. (Gray and Moore, Nonparametric Density Estimation: Toward Computational Tractability, SIAM Data Mining 2003 [winner of Best Algorithms Paper Prize]. Gray and Moore, Rapid Evaluation of Multiple Density Models [pdf], [ps] AI & Statistics 2003. Final journal version coming very soon.) n-point Correlation Functions
 ∑q ∑r ∑s I∇(xq, xr, xs) The n-point correlation functions form the foundation of spatial statistics, and constitute the principal means by which cosmologists have validated the best existing cosmological models with respect to observations since the 1960's. Their naive computational cost is O(Nn). The n-point correlation functions are also used heavily in statistical physics and other areas of physics. The fractal dimension is isomorphic to the 2-point correlation. Our n-tree algorithm is the first algorithm to dramatically reduce the cost, by up to seven orders of magnitude in practice, while returning the exact answer. This has made higher-order correlations on large modern datasets, which are necessary for cosmological studies, possible for the first time. (First described in the NIPS 2000 paper. Moore et al., Fast Algorithms and Efficient Statistics: n-point Correlation Functions [pdf] [ps], Mining the Sky 2000. Fast Computation of the Pair Correlation and n-point Correlation Functions, Conf. Computational Physics 2004.) We have also developed a Monte Carlo algorithm which is faster for large radii. Final journal version coming very soon. All-Nearest-Neighbors
 ∀ xq argminr || xq - xr || The all-nearest-neighbors problem asks, for each of a set of query points, what is its nearest neighbor among a set of reference points. In the more general bichromatic version, the query set and reference set may be different. Aside from being a classical challenge problem of computational geometry, this is often the real problem of interest in practice, not the single-query problem. Our simple dual-recursive depth-first-search algorithm turns out to be faster than the best previous algorithms in practice, including Vaidya's algorithm and the well-separated pair decomposition. It is exact, solves the more general bichromatic problem, works for general k, and as with all of our algorithms on this page, it works with virtually any space-partitioning tree. (First described in the NIPS 2000 paper. Full experimental results and detailed description will appear in the paper associated with the Proximity Project hopefully sometime in 2005.) Nonparametric Bayes Classification
∀ xq max{ P(C1)ph(xq|C1), P(C2)ph(xq|C2)}
 Nonparametric density estimation used within the Bayes decision framework provides simple nonlinear Bayes-optimal classification without parametric assumptions, and, less known, performance comparable to the state of the art in practice. Our method for this problem can also be used for support vector machine prediction -- however, in that context we only achieved speedups over the naive method of about 2-3. I therefore consider this methodology to be a failure for that problem. Our algorithm for this problem utilizes three trees for a 2-class problem, and two Fibonnaci-heap priority queues. It is exact -- this is the reward for not simply performing one approximate kernel summation for each class. It allows the nonparametric Bayes classifier to scale to massive training and test sets, unlike methods like SVM's, and has been used with high accuracy in a large-scale quasar detection problem and a drug discovery problem. (Though these applications of the algorithm have been published, the algorithm itself has not, and will be submitted in 2005.) Gaussian Process Regression
Φ-1y, Φu
 Gaussian process regression is a Bayesian regression method, related to the method of kriging, whose principal required computation is the inversion of an N×N matrix. Gaussian process regression is an example of a more general situation within statistics/machine learning of a kernel matrix-vector multiplication. In this case it occurs inside Skilling's matrix inversion method. An extension of our algorithm for kernel density estimation can be used for this. A preliminary demonstration of this for a 1-million-point dataset is in (Gray, Fast Kernel Matrix-Vector Multiplication with Application to Gaussian Process Regression [pdf], [ps] CMU tech report 2004). I consider this work incomplete though, because the validation was only done on one dataset and I would like to see the method used by a GP expert in a real application.