Using Tarjan's Red Rule for Fast Dependency Tree Construction

Dan Pelleg

Abstract

  We discuss the problem of generating dependency-trees to model a given set of data. The traditional solution, proposed by Chow and Liu, solves the problem by finding a minimum-spanning tree in the attribute space, and runs in time quadratic in the number of attributes and linear in the number of data-points. We present an improved algorithm that scales better with the size of the data. We begin by proposing a new minimum-spanning tree algorithm. Surprisingly, this algorithm is in fact slower than currently-known algorithms. We then show how to embed it in a framework that considers probabilistic confidence intervals for the edge weights. This leads to a fast dependency-tree implementation. We present experimental results for datasets with hundreds of attributes and millions of points.

Joint work with Andrew Moore.


Back to the Main Page

Charles Rosenberg
Last modified: Thu Mar 14 10:34:21 EST 2002