Nonparametric Density Estimation: Toward Computational Tractability

Alex Gray - joint work with Andrew Moore.


  Density estimation is a core operation of virtually all probabilistic learning methods (as opposed to discriminative methods). Approaches to density estimation fall into two principal classes: parametric methods, such as Bayesian networks, and nonparametric methods, such as kernel density estimation and smoothing splines. While neither choice should be universally preferred in all situations, a well-known benefit of nonparametric methods is their ability to achieve estimation optimality for *any* input distribution as more data are observed, a property that no model with a parametric assumption can have. The success of nonparametric methods on other problems, e.g. support vector machines for classification and Gaussian processes for regression, has also drawn recent attention to nonparametric learning approaches.

To date, however, despite a wealth of advanced underlying statistical theory, the use of nonparametric methods has been limited by their computational intractability for all but the smallest datasets. This intractability stems from the nonparametric representation of a density in terms of the entire training set, causing the cost of estimation to scale with the training set size - a fact that seems inescapable. The cost is further multiplied greatly by the need to reliably find the optimal scale for the estimator. In this paper, we present an algorithm for kernel density estimation, the chief nonparametric approach, which, for the first time, reduces the cost of estimating the density at a test point to O(1), or *constant* time complexity. Empirically and analytically we show that the method is dramatically faster than previous algorithmic attempts to solve this computational problem, including the FFT and the 'fast-binning' method, in terms of both dataset size and dimensionality. Furthermore, the algorithm provides arbitrarily tight accuracy guarantees, provides anytime convergence, efficiently addresses the computational issue of scale (bandwidth) selection, works for all common kernel choices, and requires no parameter tuning. The algorithm is an instance of a new principle of algorithm design: multi-recursion, or higher-order divide-and-conquer (thesis forthcoming).
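To make the cost being attacked concrete, here is a minimal sketch (not the authors' algorithm) of the standard kernel density estimator with a Gaussian kernel; the function names and the bandwidth value are illustrative. Each query requires a sum over all N training points, which is the O(N)-per-query scaling described above, and a bandwidth search multiplies that cost again.

```python
import math

def gaussian_kernel(u):
    # Standard Gaussian kernel: K(u) = exp(-u^2 / 2) / sqrt(2*pi).
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_estimate(query, data, h):
    # Naive kernel density estimate at one query point:
    #   f_hat(x) = (1 / (N * h)) * sum_i K((x - x_i) / h)
    # Cost is O(N) per query, since every training point contributes;
    # this is the scaling the constant-time algorithm avoids.
    n = len(data)
    return sum(gaussian_kernel((query - x) / h) for x in data) / (n * h)

data = [0.0, 0.5, 1.0, 1.5, 2.0]
# Density is highest near the bulk of the data and falls off far away.
print(kde_estimate(1.0, data, h=0.5))
print(kde_estimate(10.0, data, h=0.5))
```

Choosing the bandwidth h by, say, cross-validation means repeating such O(N)-per-query sums for every candidate h, which is what makes scale selection so expensive in the naive setting.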


Charles Rosenberg
Last modified: Fri Aug 30 23:14:24 EDT 2002