agr...@cmu.edu

412-657-6926

GHC 9215

412-268-5943

412-268-5943

Alexander Grubb

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

School of Computer Science

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

I'm a fifth-year PhD student in the School of Computer Science at Carnegie Mellon. Along with my advisor Drew Bagnell, I work on extending boosting and ensemble methods to a variety of machine learning applications with an eye towards applications in robotics and vision.

My current research project and thesis work is focused on
the *anytime prediction* problem. Building off of
traditional boosting techniques, we aim to build prediction
algorithms which are capable of efficiently using *any*
test-time computation budget to generate predictions that are
near-optimal with respect to that budget. Unlike other
approaches which trade cost for accuracy and vice-versa, our
anytime approach generates a single predictor which can
dynamically fit any budget. Furthermore, this budget need not
be known apriori at training time.

Previously I was advised by Dave Touretzky. Together we worked on planning, mapping and navigation for educational robotics using Tekkotsu, the software framework that he and his students have developed.

Abstract: We present SpeedBoost, a natural extension of functional gradient descent, for learning anytime predictors, which automatically trade computation time for predictive accuracy by selecting from a set of simpler candidate predictors. These anytime predictors not only generate approximate predictions rapidly, but are capable of using extra resources at prediction time, when available, to improve performance. We also demonstrate how our framework can be used to select weak predictors which target certain subsets of the data, allowing for efficient use of computational resources on difficult examples. We also show that variants of the SpeedBoost algorithm produce predictors which are provably competitive with any possible sequence of weak predictors with the same total complexity.

Abstract: Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the first weak to strong learning guarantees for the existing gradient boosting work for smooth convex objectives, and also demonstrate that this work fails for non-smooth objectives. To address this issue, we present new algorithms which extend this boosting approach to arbitrary convex loss functions and give corresponding weak to strong convergence results. In addition, we demonstrate experimental results that support our analysis and demonstrate the need for the new algorithms we present.

Abstract: Divide-and-conquer is key to building sophisticated learning machines: hard problems are solved by composing a network of modules that solve simpler problems (LeCun et al., 1998; Rohde, 2002; Bradley, 2009). Many such existing systems rely on learning algorithms which are based on simple parametric gradient descent where the parametrization must be predetermined, or more specialized per-application algorithms which are usually ad-hoc and complicated. We present a novel approach for training generic modular networks that uses two existing techniques: the error propagation strategy of backpropagation and more recent research on descent in spaces of functions (Mason et al., 1999; Scholkopf & Smola, 2001). Combining these two methods of optimization gives a simple algorithm for training heterogeneous networks of functional modules using simple gradient propagation mechanics and established learning algorithms. The resulting separation of concerns between learning individual modules and error propagation mechanics eases implementation, enables a larger class of modular learning strategies, and allows per-module control of complexity/regularization. We derive and demonstrate this functional backpropagation and contrast it with traditional gradient descent in parameter space, observing that in our example domain the method is signicantly more robust to local optima.