Alexander Grubb
 

email

a...@cs.cmu.edu

mail

Alexander Grubb
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213

about me

I was a PhD student in the School of Computer Science at Carnegie Mellon. Along with my advisor Drew Bagnell, I worked on extending boosting and ensemble methods to a variety of machine learning problems, with an eye towards applications in robotics and vision. As of 2014, I am working at Google Pittsburgh.

My main research project and thesis work focused on the anytime prediction problem. Building on traditional boosting techniques, we designed prediction algorithms that can efficiently use any test-time computation budget to generate predictions that are near-optimal with respect to that budget. Unlike other approaches that trade cost for accuracy and vice versa, our anytime approach generates a single predictor that can dynamically fit any budget. Furthermore, this budget need not be known a priori at training time.
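A rough Python sketch of the idea, not the implementation from the thesis: a single trained ensemble whose weak predictors are evaluated in order until the test-time budget runs out, so the same model serves any budget. The predictors, weights, and costs below are hypothetical.

    import numpy as np


    class AnytimeEnsemble:
        def __init__(self, weak_predictors, weights, costs):
            # weak_predictors: callables mapping an example x to a real number
            # weights: per-predictor weights; costs: per-predictor evaluation costs
            self.weak_predictors = weak_predictors
            self.weights = weights
            self.costs = costs

        def predict(self, x, budget):
            # Evaluate weak predictors in order, stopping once the next one would
            # exceed the budget; the same trained ensemble thus serves any budget.
            prediction, spent = 0.0, 0.0
            for h, w, c in zip(self.weak_predictors, self.weights, self.costs):
                if spent + c > budget:
                    break
                prediction += w * h(x)
                spent += c
            return prediction


    # Toy usage with hypothetical stumps of increasing cost.
    stumps = [lambda x: float(x[0] > 0.0),
              lambda x: float(x[1] > 0.5),
              lambda x: float(x.sum() > 1.0)]
    model = AnytimeEnsemble(stumps, weights=[1.0, 0.5, 0.25], costs=[1.0, 2.0, 4.0])
    x = np.array([0.3, 0.9])
    print(model.predict(x, budget=3.0))  # only the predictors that fit the budget are used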

Previously I was advised by Dave Touretzky. Together we worked on planning, mapping and navigation for educational robotics using Tekkotsu, the software framework that he and his students have developed.

publications

SpeedBoost: Anytime Prediction with Uniform Near-Optimality

Alexander Grubb and J. Andrew (Drew) Bagnell. Proceedings of the 15th International Conference on Artificial Intelligence and Statistics, April 2012. [pdf]

Abstract: We present SpeedBoost, a natural extension of functional gradient descent, for learning anytime predictors, which automatically trade computation time for predictive accuracy by selecting from a set of simpler candidate predictors. These anytime predictors not only generate approximate predictions rapidly, but are capable of using extra resources at prediction time, when available, to improve performance. We also demonstrate how our framework can be used to select weak predictors which target certain subsets of the data, allowing for efficient use of computational resources on difficult examples. Finally, we show that variants of the SpeedBoost algorithm produce predictors which are provably competitive with any possible sequence of weak predictors with the same total complexity.
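The sketch below is a much-simplified illustration of cost-sensitive greedy selection under squared loss, not the SpeedBoost algorithm itself: each round adds the candidate weak predictor with the best loss reduction per unit of evaluation cost. The candidate set, costs, and loss here are assumptions for illustration.

    import numpy as np


    def greedy_cost_sensitive_boost(X, y, candidates, costs, rounds=10, step=0.5):
        # candidates: functions (X, residual) -> predictions on X (a fitted weak hypothesis)
        F = np.zeros(len(y))                      # current ensemble predictions
        chosen = []                               # (candidate index, weight) per round
        for _ in range(rounds):
            residual = y - F                      # functional gradient of the squared loss
            best = None
            for i, fit_predict in enumerate(candidates):
                h = fit_predict(X, residual)
                hh = float(h @ h) + 1e-12
                alpha = float(residual @ h) / hh            # best weight for this hypothesis
                improvement = alpha * float(residual @ h)   # proportional to the loss reduction
                score = improvement / costs[i]              # improvement per unit of cost
                if best is None or score > best[0]:
                    best = (score, i, h, alpha)
            _, i, h, alpha = best
            F += step * alpha * h
            chosen.append((i, step * alpha))
        return chosen


    # Toy usage with two hypothetical candidates of different cost.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))
    y = X[:, 0] + 0.1 * rng.normal(size=100)
    candidates = [
        lambda X, r: np.full(len(r), r.mean()),                                # cheap constant fit
        lambda X, r: X[:, 0] * float(r @ X[:, 0]) / float(X[:, 0] @ X[:, 0]),  # pricier linear fit
    ]
    print(greedy_cost_sensitive_boost(X, y, candidates, costs=[1.0, 3.0]))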

Generalized Boosting Algorithms for Convex Optimization

Alexander Grubb and J. Andrew (Drew) Bagnell. Proceedings of the 28th International Conference on Machine Learning, June 2011. [pdf]

Abstract: Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the first weak to strong learning guarantees for the existing gradient boosting work for smooth convex objectives, and also demonstrate that this work fails for non-smooth objectives. To address this issue, we present new algorithms which extend this boosting approach to arbitrary convex loss functions and give corresponding weak to strong convergence results. In addition, we demonstrate experimental results that support our analysis and demonstrate the need for the new algorithms we present.
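For background, this is the basic functional gradient boosting step the paper builds on, sketched here for the logistic loss with a regression-stump weak learner (both choices are assumptions for illustration); the paper's weak learner measure, guarantees, and non-smooth variants are not reproduced.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor  # assumed weak learner for illustration


    def functional_gradient_boost(X, y, rounds=50, step=0.1):
        # y in {-1, +1}; greedily reduces the logistic loss sum_i log(1 + exp(-y_i F(x_i))).
        F = np.zeros(len(y))
        ensemble = []
        for _ in range(rounds):
            # Negative functional gradient of the logistic loss with respect to F(x_i).
            neg_grad = y / (1.0 + np.exp(y * F))
            # "Weak learner" step: project the functional gradient onto the
            # hypothesis class by least-squares regression.
            h = DecisionTreeRegressor(max_depth=1).fit(X, neg_grad)
            ensemble.append(h)
            F += step * h.predict(X)
        return ensemble


    def predict(ensemble, X, step=0.1):
        # Sign of the accumulated functional gradient steps.
        return np.sign(sum(step * h.predict(X) for h in ensemble))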

Boosted Backpropagation Learning for Training Deep Modular Networks

Alexander Grubb and J. Andrew (Drew) Bagnell. Proceedings of the 27th International Conference on Machine Learning, May 2010. [pdf] [extended version]

Abstract: Divide-and-conquer is key to building sophisticated learning machines: hard problems are solved by composing a network of modules that solve simpler problems (LeCun et al., 1998; Rohde, 2002; Bradley, 2009). Many such existing systems rely on learning algorithms which are based on simple parametric gradient descent where the parametrization must be predetermined, or more specialized per-application algorithms which are usually ad-hoc and complicated. We present a novel approach for training generic modular networks that uses two existing techniques: the error propagation strategy of backpropagation and more recent research on descent in spaces of functions (Mason et al., 1999; Scholkopf & Smola, 2001). Combining these two methods of optimization gives a simple algorithm for training heterogeneous networks of functional modules using simple gradient propagation mechanics and established learning algorithms. The resulting separation of concerns between learning individual modules and error propagation mechanics eases implementation, enables a larger class of modular learning strategies, and allows per-module control of complexity/regularization. We derive and demonstrate this functional backpropagation and contrast it with traditional gradient descent in parameter space, observing that in our example domain the method is significantly more robust to local optima.
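As a toy illustration of the general idea, not the paper's algorithm: a two-module network y_hat = a * g(x) + b, in which the module g is grown by fitting regression trees to its backpropagated error signal (a functional gradient step), while a and b take ordinary parametric gradient steps. The architecture, loss, and step sizes are assumptions for illustration.

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor


    def train_modular(X, y, rounds=30, func_step=0.5, param_step=0.01):
        trees, a, b = [], 1.0, 0.0
        g = np.zeros(len(y))                      # current outputs of the functional module g
        for _ in range(rounds):
            y_hat = a * g + b
            dL_dyhat = y_hat - y                  # squared-loss error at the network output
            dL_dg = a * dL_dyhat                  # error signal backpropagated to module g
            # Parametric gradient step for the output module (a, b).
            a -= param_step * float(np.mean(dL_dyhat * g))
            b -= param_step * float(np.mean(dL_dyhat))
            # Functional gradient step for g: fit a tree to its negative error signal.
            t = DecisionTreeRegressor(max_depth=2).fit(X, -dL_dg)
            trees.append(t)
            g += func_step * t.predict(X)
        return trees, a, b


    def predict_modular(trees, a, b, X, func_step=0.5):
        g = func_step * sum(t.predict(X) for t in trees)
        return a * g + b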

 
last updated: Aug. 1, 2012