Abstract:
Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors.
@techreport{NarlikarTR98,
author = "Girija J. Narlikar",
title = "A Parallel, Multithreaded Decision Tree Builder",
institution = "Computer Science Department, Carnegie Mellon University",
number = "CMU-CS-98-184",
year = 1998,
month = "Dec"
}