A Parallel, Multithreaded Decision Tree Builder

Girija Narlikar
CMU-CS-98-184, December 1998

34k compressed postscript


Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors.

author		= "Girija J. Narlikar",
title		= "A Parallel, Multithreaded Decision Tree Builder",
institution	= "Computer Science Department, Carnegie Mellon University",
number		= "CMU-CS-98-184",
year		= 1998,
month		= "Dec"