3:30, Fri 17 Nov, WeH 4623
Michael Kearns
AT&T Bell Laboratories
We analyze the performance of top-down algorithms for decision tree
learning, such as those employed by the widely used C4.5 and CART
software packages. Our main result is a proof that such algorithms are
*boosting* algorithms. By this we mean that if the functions used to
label the internal nodes of the decision tree can weakly approximate
the unknown target function, then the top-down algorithms we study
will amplify this weak advantage to build a tree achieving any desired
level of accuracy. The bounds we obtain for this amplification show an
interesting dependence on the *splitting criterion* function used by
the top-down algorithm. More precisely, if the functions used to label
the internal nodes have error 1/2 - gamma as approximations to the
target function (for constant gamma > 0), then for the splitting
criterion used by CART, trees of size exponential in 1/epsilon suffice
to drive the error below epsilon; for the splitting criterion used by
C4.5, trees of size superpolynomial but subexponential in 1/epsilon
suffice; but for a new splitting criterion suggested by our analysis,
trees of size polynomial in 1/epsilon suffice. The differing bounds
have a natural explanation in terms of concavity properties of the
splitting criterion.
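As an illustrative sketch of the concavity point above: the gain from a split is the drop in the splitting criterion G(q) (where q is the fraction of positive examples at a node) when the node's examples are divided between two children. The specific forms below — the Gini index for CART, binary entropy for C4.5, and 2*sqrt(q(1-q)) as the new criterion — are assumptions drawn from the associated Kearns-Mansour paper rather than stated in this announcement; each is normalized so that G(1/2) = 1.

```python
import math

# Three candidate splitting criteria G(q), q = fraction of positive
# examples reaching a node. All are concave on [0,1], peak at q = 1/2,
# and vanish at q = 0 and q = 1. Attributions (Gini -> CART,
# entropy -> C4.5, sqrt form -> the new criterion) are assumptions
# taken from the Kearns-Mansour paper, not from the text above.

def gini(q):     # CART, normalized: 4q(1-q)
    return 4.0 * q * (1.0 - q)

def entropy(q):  # C4.5: binary entropy in bits
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def km(q):       # new criterion: 2*sqrt(q(1-q))
    return 2.0 * math.sqrt(q * (1.0 - q))

def gain(G, q, tau, p):
    """Drop in G when a node with positive fraction q is split so that
    a fraction tau of its examples go to a child with positive fraction
    p; the other child's fraction r satisfies q = tau*p + (1-tau)*r."""
    r = (q - tau * p) / (1.0 - tau)
    return G(q) - tau * G(p) - (1.0 - tau) * G(r)

# Where the criteria differ most is near a *nearly pure* node (q close
# to 0 or 1): the quadratic Gini gain collapses there, entropy decays
# more slowly, and the sqrt criterion (infinite slope at the endpoints)
# retains the largest gain -- mirroring the exponential / superpolynomial
# / polynomial ordering of the tree-size bounds.
for G in (gini, entropy, km):
    print(G.__name__, gain(G, 0.1, 0.5, 0.0))
```

Running this for a balanced split of a node with q = 0.1 into children with positive fractions 0.0 and 0.2 gives a gain of 0.04 under Gini, roughly 0.108 under entropy, and 0.2 under the sqrt criterion, so the ordering of guaranteed progress matches the ordering of the bounds in the abstract.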
The primary contribution of this work is in proving that some popular
and empirically successful heuristics, which were not derived from first
principles, meet the criteria of an independently motivated theoretical model.
Joint work with Yishay Mansour of Tel Aviv University.