3:30, Fri 17 Nov, WeH 4623

Michael Kearns
AT&T Bell Laboratories

We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are *boosting* algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy.

The bounds we obtain for this amplification show an interesting dependence on the *splitting criterion* function used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error 1/2 - gamma as approximations to the target function (for constant gamma > 0), then for the splitting criterion used by CART, trees of size exponential in 1/epsilon suffice to drive the error below epsilon; for the splitting criterion used by C4.5, trees of size superpolynomial but subexponential in 1/epsilon suffice; but for a new splitting criterion suggested by our analysis, trees of size polynomial in 1/epsilon suffice. The differing bounds have a natural explanation in terms of concavity properties of the splitting criterion.

The primary contribution of this work is in proving that some popular and empirically successful heuristics that are not based on first principles nevertheless meet the criteria of an independently motivated theoretical model.

Joint work with Yishay Mansour of Tel Aviv University.
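[The announcement leaves the splitting criteria implicit. As a concrete illustration, the following minimal Python sketch gives the standard forms usually associated with these algorithms: the Gini index for CART, binary entropy for C4.5, and, per the published Kearns-Mansour paper, 2*sqrt(q(1-q)) for the new criterion, together with the split-gain quantity a greedy top-down learner maximizes. These definitions and the demo numbers are assumptions for illustration, not part of the announcement.]

import math

# Splitting criteria G(q), where q is the fraction of positive
# examples reaching a node.  The exact forms below are the standard
# definitions, assumed here rather than taken from the announcement.

def gini(q):
    # CART's Gini index (scaled): G(q) = 4q(1-q)
    return 4.0 * q * (1.0 - q)

def entropy(q):
    # C4.5's criterion: binary entropy H(q)
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def kearns_mansour(q):
    # The new criterion from the published paper: G(q) = 2*sqrt(q(1-q))
    return 2.0 * math.sqrt(q * (1.0 - q))

def split_gain(G, q, p, r, tau):
    """Decrease in impurity from splitting a node whose positive
    fraction is q into children with positive fractions p and r, where
    a fraction tau of the node's examples reach the first child (so
    q = tau*p + (1-tau)*r).  A greedy top-down learner picks the split
    maximizing this quantity."""
    return G(q) - (tau * G(p) + (1.0 - tau) * G(r))

if __name__ == "__main__":
    # Illustrative numbers only: a weak split at a nearly pure node
    # (q = 0.05), sending half the examples to a child with positive
    # fraction 0.09 and the other half to a child with fraction 0.01.
    for name, G in [("CART (Gini)", gini),
                    ("C4.5 (entropy)", entropy),
                    ("new criterion", kearns_mansour)]:
        print(f"{name:>14}: gain = {split_gain(G, 0.05, 0.09, 0.01, 0.5):.4f}")

[Running the demo, the new criterion assigns the largest gain to this weak split near a pure node, entropy less, and Gini the least. One informal reading of the "concavity properties" the abstract mentions: 2*sqrt(q(1-q)) has unbounded slope near q = 0 and q = 1, so it keeps rewarding weak splits even at nearly pure nodes, whereas the Gini index flattens there and the gains vanish.]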