Multiresolution kd-tree representations of data clouds
In instance-based learning (also known as memory-based or nonparametric learning), predictions are made by querying high-dimensional numerical data clouds and making inferences from local clusters, locally polynomial patches, and local kernel density estimates. When we make a prediction we must gather up the effects induced by each datapoint and combine them statistically. This is so slow as to render instance-based learning impractical for datasets larger than around 10,000 to 100,000 records. Previous researchers addressed this either by throwing away data (known as "editing the dataset") or by performing range searches over kd-trees. Range searching cannot help, however, in problems where all datapoints have a nonzero effect (such as kernel density estimation), or where the ranges cover a large fraction of the data.
We have been working on new algorithms that avoid these problems. They use a new kind of kd-tree in which statistics about regions of space are maintained at many resolutions simultaneously. For a fixed kernel width or range size, the cost of a query grows sublogarithmically in the number of datapoints. And, unlike "editing", once built, the same tree can be used with arbitrary and changing distance metrics and kernels. As a result, massive model selection (autonomously picking the best out of thousands of computationally expensive learning algorithms) becomes tractable, and we know of no other instance-based systems able to do nearly as much model-selection search as ours.
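The idea can be illustrated with a small sketch (the class and function names here are hypothetical, and the published algorithms use tighter bounds and more careful approximation control): each kd-tree node caches statistics for the points beneath it, namely a count, a bounding box, and a centroid. A kernel density query can then use those cached statistics for any node whose kernel contribution is nearly constant across its bounding box, instead of visiting the node's datapoints one by one.

```python
import numpy as np

class MRNode:
    """A kd-tree node caching multiresolution statistics: the count,
    bounding box, and centroid of all points below it."""
    def __init__(self, points, leaf_size=16):
        self.count = len(points)
        self.lo = points.min(axis=0)            # bounding-box corners
        self.hi = points.max(axis=0)
        self.centroid = points.mean(axis=0)
        self.points = None
        self.left = self.right = None
        if self.count <= leaf_size:
            self.points = points                # keep raw points at leaves
        else:
            d = np.argmax(self.hi - self.lo)    # split the widest dimension
            order = points[:, d].argsort()
            mid = self.count // 2
            self.left = MRNode(points[order[:mid]], leaf_size)
            self.right = MRNode(points[order[mid:]], leaf_size)

def kde(node, q, h, eps=1e-2):
    """Approximate Gaussian kernel density sum at query point q with
    bandwidth h. If the kernel varies by less than eps over a node's
    bounding box, use the cached count and centroid and stop descending."""
    # nearest / farthest possible distance from q to the node's box
    dmin = np.linalg.norm(np.maximum(node.lo - q, np.maximum(q - node.hi, 0)))
    dmax = np.linalg.norm(np.maximum(np.abs(q - node.lo), np.abs(q - node.hi)))
    kmax = np.exp(-0.5 * (dmin / h) ** 2)
    kmin = np.exp(-0.5 * (dmax / h) ** 2)
    if kmax - kmin < eps:                       # contribution nearly constant: prune
        d = np.linalg.norm(q - node.centroid)
        return node.count * np.exp(-0.5 * (d / h) ** 2)
    if node.points is not None:                 # leaf: exact sum over points
        d = np.linalg.norm(node.points - q, axis=1)
        return np.exp(-0.5 * (d / h) ** 2).sum()
    return kde(node.left, q, h, eps) + kde(node.right, q, h, eps)
```

Because the pruning test depends only on the query-time bandwidth h and distance metric, the same tree serves arbitrary kernels and metrics without being rebuilt, which is what makes broad model-selection search affordable.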
Paper on Multiresolution Kernel Methods
More recent paper on Multiresolution methods for locally weighted polynomial regression
Constant-time "counting": ADtree representations of cached sufficient statistics
The ADtree (all-dimensions tree) is a new data structure that can empirically give a 50-fold to 1000-fold computational speedup for machine learning methods such as Bayes net structure finders, rule learners and feature selectors operating on large datasets. Many machine learning algorithms need to perform vast numbers of counting queries (e.g. "How many records have Color=Blue, Nationality=British, Smoker=False, Status=Married, Nose=Big?"). Similarly, many algorithms need to build and test many contingency tables. We are researching methods for which, subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and log-linear in the number of nonzero entries in the contingency table.
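To make the kind of query concrete, here is a hypothetical naive implementation: each counting query scans every record, so its cost grows linearly with dataset size, and algorithms that issue thousands of such queries become record-count-bound.

```python
def count(records, query):
    """Count records matching a conjunctive query.
    records: list of dicts mapping attribute -> value.
    query:   dict of attribute -> required value.
    A full scan: O(R) per query for R records."""
    return sum(all(r.get(a) == v for a, v in query.items()) for r in records)

records = [
    {"Color": "Blue", "Smoker": False, "Status": "Married"},
    {"Color": "Blue", "Smoker": True,  "Status": "Single"},
    {"Color": "Red",  "Smoker": False, "Status": "Married"},
]
print(count(records, {"Color": "Blue", "Smoker": False}))  # 1
```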
The ADtree is a very sparse data structure designed to minimize memory use. We can provide analytical worst-case memory and construction-cost bounds for this structure under several models of data distribution. We have empirically demonstrated that tractably sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not expanding the tree fully near its leaves. We are interested in investigating possible uses of ADtrees in other machine learning methods, and in comparing the merits of ADtrees with alternative representations such as kd-trees, R-trees and frequent sets.
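A much-simplified sketch of ideas (a) and (b) follows (illustrative only; the published ADtree also truncates near the leaves and uses other refinements not shown here). Children are built only for value combinations with nonzero counts, and for each attribute the most common value's subtree is omitted, since its count can always be recovered by subtracting the other values' counts from the parent count. Omitting the most common value removes what would typically be the largest subtree.

```python
class ADNode:
    """Simplified ADtree node. Caches the count of records matching its
    conjunction of attribute=value constraints; for each later attribute,
    stores children only for non-zero, non-most-common values."""
    def __init__(self, records, attrs, start=0):
        self.count = len(records)
        self.vary = {}   # attr index -> (most common value, {value: ADNode})
        for i in range(start, len(attrs)):
            groups = {}
            for r in records:
                groups.setdefault(r[attrs[i]], []).append(r)
            mcv = max(groups, key=lambda v: len(groups[v]))
            kids = {v: ADNode(rs, attrs, i + 1)
                    for v, rs in groups.items() if v != mcv}
            self.vary[i] = (mcv, kids)

def ad_count(tree, attrs, query):
    """Answer a conjunctive counting query from cached counts alone,
    without rescanning the records."""
    idx = sorted(attrs.index(a) for a in query)   # walk in tree order
    return _count(tree, attrs, query, idx)

def _count(node, attrs, query, idx):
    if node is None:          # child was never allocated: count is zero
        return 0
    if not idx:
        return node.count
    i, rest = idx[0], idx[1:]
    mcv, kids = node.vary[i]
    v = query[attrs[i]]
    if v != mcv:
        return _count(kids.get(v), attrs, query, rest)
    # The most-common-value child was not stored; deduce its count:
    # count(attr=mcv, rest) = count(rest) - sum over other values of attr
    total = _count(node, attrs, query, rest)
    others = sum(_count(k, attrs, query, rest) for k in kids.values())
    return total - others
```

Note that the subtraction step is exactly trick (b) above: the omitted count is deduced from counts that are stored, trading a little query-time arithmetic for a large memory saving.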
Probabilistic reasoning and belief networks
We show how the ADtree can be used to accelerate Bayes net structure-finding algorithms, and we are currently exploiting this technology (in which structure searches can now happen at interactive speeds) in a number of new projects.
Fast Association Rule Learning
We have also applied ADtrees to learning association rules quickly.
A new paper on ADtrees applied to rule-learning algorithms such as CN2.


Decision and Reinforcement Learning 