A Parallel, Multithreaded Decision Tree Builder

Girija Narlikar
CMU-CS-98-184, December 1998

34k compressed postscript

Abstract:

Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors.


@techreport{NarlikarTR98,
author		= "Girija J. Narlikar",
title		= "A Parallel, Multithreaded Decision Tree Builder",
institution	= "Computer Science Department, Carnegie Mellon University",
number		= "CMU-CS-98-184",
year		= 1998,
month		= "Dec"
}