a...@cs.cmu.edu

Alexander Grubb

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

I was a PhD student in the School of Computer Science at Carnegie Mellon. Along with my advisor Drew Bagnell, I worked on extending boosting and ensemble methods to a variety of machine learning applications with an eye towards applications in robotics and vision. As of 2014, I am now working at Google Pittsburgh.

My main research project and thesis work was focused on the *anytime
prediction* problem. Building off of traditional boosting techniques,
we designed prediction algorithms which are capable of efficiently
using *any* test-time computation budget to generate predictions
that are near-optimal with respect to that budget. Unlike other
approaches which trade cost for accuracy and vice-versa, our anytime
approach generates a single predictor which can dynamically fit any
budget. Furthermore, this budget need not be known apriori at training
time.

Previously I was advised by Dave Touretzky. Together we worked on planning, mapping and navigation for educational robotics using Tekkotsu, the software framework that he and his students have developed.

Abstract: We present SpeedBoost, a natural extension of functional gradient descent, for learning anytime predictors, which automatically trade computation time for predictive accuracy by selecting from a set of simpler candidate predictors. These anytime predictors not only generate approximate predictions rapidly, but are capable of using extra resources at prediction time, when available, to improve performance. We also demonstrate how our framework can be used to select weak predictors which target certain subsets of the data, allowing for efficient use of computational resources on difficult examples. We also show that variants of the SpeedBoost algorithm produce predictors which are provably competitive with any possible sequence of weak predictors with the same total complexity.

Abstract: Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the first weak to strong learning guarantees for the existing gradient boosting work for smooth convex objectives, and also demonstrate that this work fails for non-smooth objectives. To address this issue, we present new algorithms which extend this boosting approach to arbitrary convex loss functions and give corresponding weak to strong convergence results. In addition, we demonstrate experimental results that support our analysis and demonstrate the need for the new algorithms we present.

Abstract: Divide-and-conquer is key to building sophisticated learning machines: hard problems are solved by composing a network of modules that solve simpler problems (LeCun et al., 1998; Rohde, 2002; Bradley, 2009). Many such existing systems rely on learning algorithms which are based on simple parametric gradient descent where the parametrization must be predetermined, or more specialized per-application algorithms which are usually ad-hoc and complicated. We present a novel approach for training generic modular networks that uses two existing techniques: the error propagation strategy of backpropagation and more recent research on descent in spaces of functions (Mason et al., 1999; Scholkopf & Smola, 2001). Combining these two methods of optimization gives a simple algorithm for training heterogeneous networks of functional modules using simple gradient propagation mechanics and established learning algorithms. The resulting separation of concerns between learning individual modules and error propagation mechanics eases implementation, enables a larger class of modular learning strategies, and allows per-module control of complexity/regularization. We derive and demonstrate this functional backpropagation and contrast it with traditional gradient descent in parameter space, observing that in our example domain the method is signicantly more robust to local optima.