
About Me

Hi, I'm Nick Rhinehart, a Ph.D. student at the CMU Robotics Institute. I'm interested in building decision-theoretic models that leverage rich perception to drive activity forecasting, functional understanding, and general prediction and control tasks. A key question in this work is: "How can we build, interpret, and quantify models that reason about the future?"

My research interests include forward and inverse reinforcement learning, imitation learning, activity analysis, egocentric vision, and visual recognition. I work primarily with Kris Kitani and Drew Bagnell.


  • September 2017: New pre-print on N2N Learning: RL for neural network compression.
  • September 2017: Predictive-State Decoders accepted to NIPS 2017.
  • August 2017: Predictive-State Decoders accepted to the inaugural CoRL 2017 conference.
  • July 2017: First-Person Forecasting accepted as an ICCV 2017 Oral (2% acceptance rate).
  • June 2017: Summer research with Paul Vernaza and Manmohan Chandraker at NECLA.
  • December 2016: A pre-print of First-Person Forecasting is on arXiv.
  • Summer 2016: R&D at the Uber Advanced Technologies Center.
  • March 2016: New paper, Action Maps, at CVPR 2016.
  • Summer 2015: Visited and collaborated with the Sato Laboratory at the University of Tokyo.
  • May 2015: New paper, Visual Chunking, at ICRA 2015.
  • Fall 2014: NIPS 2014 workshop presentation.


Pre-prints

N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning
A. Ashok, N. Rhinehart, F. Beainy, K. Kitani
[Abstract] [BibTeX] [arXiv]

Abstract: While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger `teacher' network as input and outputs a compressed `student' network derived from the `teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large `teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input `teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller `teacher' networks can be used to rapidly speed up training on larger `teacher' networks.

@article{ashok2017n2n,
  title={N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning},
  author={Ashok, Anubhav and Rhinehart, Nicholas and Beainy, Fares and Kitani, Kris M},
  journal={arXiv preprint arXiv:1709.06030},
  year={2017}
}
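To make the two-stage idea concrete, here is a minimal sketch of a reward-and-policy-gradient loop for layer removal. The reward shaping, the proxy accuracy model, and the update rule in `reinforce_step` are illustrative assumptions, not the paper's implementation.

```python
import random

def reward(accuracy, compression_ratio, teacher_accuracy):
    # Favor students whose accuracy stays close to the teacher's while
    # compressing aggressively (illustrative form, not the paper's exact shaping).
    acc_term = accuracy / teacher_accuracy
    comp_term = 1.0 - 1.0 / compression_ratio  # approaches 1 as compression grows
    return acc_term * comp_term

def reinforce_step(keep_probs, lr=0.1):
    # Sample one binary keep/remove action per layer from the current policy.
    actions = [1 if random.random() < p else 0 for p in keep_probs]
    kept = sum(actions)
    # Hypothetical proxy evaluation: accuracy rises with kept layers,
    # compression ratio is original layer count over kept layer count.
    accuracy = 0.5 + 0.5 * kept / len(keep_probs)
    ratio = len(keep_probs) / max(kept, 1)
    r = reward(accuracy, ratio, teacher_accuracy=1.0)
    # REINFORCE-style ascent: move each keep-probability toward the sampled
    # action, scaled by the reward, then clip away from 0 and 1.
    new_probs = [min(max(p + lr * r * (a - p), 0.01), 0.99)
                 for a, p in zip(actions, keep_probs)]
    return new_probs, r
```

In the paper the evaluation step is a real train-and-test of the student network; here it is replaced by a closed-form stand-in so the loop runs instantly.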

Refereed Research

Predictive-State Decoders: Encoding the Future Into Recurrent Neural Networks
N. Rhinehart*, A. Venkatraman*, W. Sun, L. Pinto, M. Hebert, B. Boots, K. Kitani, J. A. Bagnell
To appear at NIPS 2017 and CoRL 2017
[Abstract] [BibTeX] [arXiv]

Abstract: Recurrent neural networks (RNNs) are a vital modeling technique that rely on internal states learned indirectly by optimization of a supervised, unsupervised, or reinforcement training loss. RNNs are used to model dynamic processes that are characterized by underlying latent states whose form is often unknown, precluding their analytic representation inside an RNN. In the Predictive-State Representation (PSR) literature, latent state processes are modeled by an internal state representation that directly models the distribution of future observations, and most recent work in this area has relied on explicitly representing and targeting sufficient statistics of this probability distribution. We seek to combine the advantages of RNNs and PSRs by augmenting existing state-of-the-art recurrent neural networks with Predictive-State Decoders (PSDs), which add supervision to the network's internal state representation to target predicting future observations. PSDs are simple to implement and easily incorporated into existing training pipelines via additional loss regularization. We demonstrate the effectiveness of PSDs with experimental results in three different domains: probabilistic filtering, Imitation Learning, and Reinforcement Learning. In each, our method improves statistical performance of state-of-the-art recurrent baselines and does so with fewer iterations and less data.

@article{venkatraman2017predictive,
  title={Predictive-State Decoders: Encoding the Future into Recurrent Networks},
  author={Venkatraman, Arun and Rhinehart, Nicholas and Sun, Wen and Pinto, Lerrel and Hebert, Martial and Boots, Byron and Kitani, Kris M and Bagnell, J Andrew},
  journal={arXiv preprint arXiv:1709.08520},
  year={2017}
}
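The regularization idea is simple enough to sketch. The linear decoder and squared-error loss below are simplifying assumptions for illustration (the approach is agnostic to the decoder form):

```python
import numpy as np

def psd_loss(hidden_states, future_obs, decoder_W):
    # Decode each RNN hidden state h_t into a prediction of the observations
    # some k steps ahead, and penalize the squared error. A linear decoder
    # and MSE are assumptions here, not the only choices.
    preds = hidden_states @ decoder_W  # (T, obs_dim)
    return float(np.mean((preds - future_obs) ** 2))

def total_loss(task_loss, hidden_states, future_obs, decoder_W, lam=0.1):
    # Training objective: the original task loss plus the PSD regularizer,
    # weighted by a coefficient lam.
    return task_loss + lam * psd_loss(hidden_states, future_obs, decoder_W)
```

Because the decoder adds only an auxiliary loss term, it drops into an existing training pipeline without changing the recurrent architecture itself.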

First-Person Activity Forecasting with Online Inverse Reinforcement Learning
N. Rhinehart, K. Kitani
To appear at ICCV 2017 (Oral, 2% acceptance rate)
[Abstract] [BibTeX] [Project page] [.pdf] [arXiv] [Video]

Abstract: We address the problem of incrementally modeling and forecasting long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they are attempting to reach. In contrast to prior work in trajectory forecasting, our algorithm, DARKO, goes further to reason about semantic states (will I pick up an object?), and future goal states that are far both in terms of space and time. DARKO learns and forecasts from first-person visual observations of the user's daily behaviors via an Online Inverse Reinforcement Learning (IRL) approach. Classical IRL discovers only the rewards in a batch setting, whereas DARKO discovers the states, transitions, rewards, and goals of a user from streaming data. Among other results, we show DARKO forecasts goals better than competing methods in both noisy and ideal settings, and our approach is theoretically and empirically no-regret.

@inproceedings{rhinehart2017firstperson,
  author = {Rhinehart, Nicholas and Kitani, Kris M.},
  title = {First-Person Activity Forecasting With Online Inverse Reinforcement Learning},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month = {Oct},
  year = {2017}
}
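A toy version of the goal-forecasting machinery, under strong simplifying assumptions (a 1-D chain world, unit step costs, and a uniform goal prior). The soft value iteration and the exp(V_g(current) - V_g(start)) posterior follow the general MaxEnt-IRL recipe rather than DARKO's exact model:

```python
import math

def logsumexp(xs):
    # Numerically stable log of a sum of exponentials.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def soft_values(n_states, goal, step_cost=-1.0, iters=200):
    # MaxEnt ("soft") value iteration on a 1-D chain world with left/right
    # moves: V(s) is the log-partition over paths from s to the absorbing goal.
    V = [0.0 if s == goal else -1e6 for s in range(n_states)]
    for _ in range(iters):
        V = [0.0 if s == goal else
             logsumexp([step_cost + V[max(s - 1, 0)],
                        step_cost + V[min(s + 1, n_states - 1)]])
             for s in range(n_states)]
    return V

def goal_posterior(values_per_goal, start, current):
    # Goal inference in the MaxEnt framework: P(g | partial path) is
    # proportional to exp(V_g(current) - V_g(start)), assuming a uniform
    # prior over candidate goals.
    scores = [math.exp(V[current] - V[start]) for V in values_per_goal]
    z = sum(scores)
    return [s / z for s in scores]
```

Moving toward a candidate goal raises that goal's value difference and hence its posterior probability, which is the mechanism behind forecasting "where they will go."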

Learning Action Maps of Large Environments Via First-Person Vision
N. Rhinehart, K. Kitani
CVPR, 2016; MACV 2016
[Abstract] [BibTeX] [.pdf] [Slides (.pdf)] [arXiv] [ieee]

Abstract: When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an egocentric viewpoint. The method we describe enables functionality estimation in large scenes where people have behaved, as well as novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability for a user to perform activities at various locations. With the usage of an egocentric camera to observe human activities, our method scales with the size of the scene without the need for mounting multiple static surveillance cameras and is well-suited to the task of observing activities up-close. We demonstrate that by capturing appearance-based attributes of the environment and associating these attributes with activity demonstrations, our proposed mathematical framework allows for the prediction of Action Maps in new environments. Additionally, we offer a preliminary glance of the applicability of Action Maps by demonstrating a proof-of-concept application in which they are used in concert with activity detections to perform localization.

@inproceedings{rhinehart2016learning,
  author = {Rhinehart, Nicholas and Kitani, Kris M.},
  title = {Learning Action Maps of Large Environments via First-Person Vision},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2016}
}
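The core transfer step, associating appearance attributes with activity demonstrations to score unseen locations, can be sketched as kernel regression over feature vectors. The Nadaraya-Watson form and Gaussian kernel below are stand-ins for the paper's actual model:

```python
import math

def action_map(query_feats, demo_feats, demo_labels, bandwidth=1.0):
    # Score how likely an activity is at each query location from the
    # appearance similarity between that location and locations where the
    # activity was (label 1) or wasn't (label 0) demonstrated.
    scores = []
    for q in query_feats:
        w_sum, wy_sum = 0.0, 0.0
        for f, y in zip(demo_feats, demo_labels):
            d2 = sum((a - b) ** 2 for a, b in zip(q, f))
            w = math.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weight
            w_sum += w
            wy_sum += w * y
        scores.append(wy_sum / w_sum if w_sum > 0 else 0.0)
    return scores
```

Because scores depend only on appearance features, the same demonstrations can score locations in a novel scene where no behavior was ever observed, which is the sense in which Action Maps transfer.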

Visual Chunking: A List Prediction Framework for Region-Based Object Detection
N. Rhinehart, J. Zhou, M. Hebert, J. A. Bagnell
ICRA, 2015
[Abstract] [BibTeX] [.pdf] [Poster (.key)] [Poster (.pdf)] [Youtube]

Abstract: We consider detecting objects in an image by iteratively selecting from a set of arbitrarily shaped candidate regions. Our generic approach, which we term visual chunking, reasons about the locations of multiple object instances in an image while expressively describing object boundaries. We design an optimization criterion for measuring the performance of a list of such detections as a natural extension to a common per-instance metric. We present an efficient algorithm with provable performance for building a high-quality list of detections from any candidate set of region-based proposals. We also develop a simple class-specific algorithm to generate a candidate region instance in near-linear time in the number of low-level superpixels that outperforms other region generating methods. In order to make predictions on novel images at testing time without access to ground truth, we develop learning approaches to emulate these algorithms' behaviors. We demonstrate that our new approach outperforms sophisticated baselines on benchmark datasets.

@inproceedings{rhinehart2015visual,
  title={Visual chunking: A list prediction framework for region-based object detection},
  author={Rhinehart, Nicholas and Zhou, Jiaji and Hebert, Martial and Bagnell, J Andrew},
  booktitle={Robotics and Automation (ICRA), 2015 IEEE International Conference on},
  year={2015}
}
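The list-building step can be sketched as greedy selection by marginal gain under a list-wide objective. The coverage objective below (instances counted as detected when some chosen region exceeds an IoU threshold) is an illustrative stand-in for the paper's criterion:

```python
def greedy_list(candidates, list_score, k):
    # Build a detection list greedily: at each step, add the candidate region
    # with the largest marginal gain under the list-wide objective.
    chosen, remaining = [], list(candidates)
    for _ in range(min(k, len(remaining))):
        best = max(remaining,
                   key=lambda c: list_score(chosen + [c]) - list_score(chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def make_coverage_score(instances, thresh=0.5):
    # Example list-wide objective: the number of ground-truth instances
    # (sets of pixel ids) covered by some region with IoU above thresh.
    def score(regions):
        return sum(1 for inst in instances
                   if any(len(inst & r) / len(inst | r) > thresh for r in regions))
    return score
```

For example, with instances `{1,2,3}` and `{7,8}` and candidate regions including `{1,2}` and `{7,8}`, two greedy steps suffice to cover both instances.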

Fine-Grained Detection via Efficient Extreme Classification
N. Rhinehart, J. Zhou, M. Hebert, J. A. Bagnell
Presentation at the NIPS 2014 Workshop on Extreme Classification.
[Abstract] [BibTeX] [Poster (.pdf)] [Poster (.pptx)]



Unrefereed Research

Flight Autonomy in Obstacle-Dense Environments
N. Rhinehart, D. Dey, J. A. Bagnell
Robotics Institute Summer Scholars Symposium, August 2011;
Sigma-Xi Research Symposium, October 2011
[Poster (.pdf)] [Youtube]

Autonomous Localization and Navigation of Humanoid Robot
N. Rhinehart, M. Zucker
Swarthmore College Senior Thesis Project, May 2012

Other Unrefereed Projects

Fast SFM-Based Localization of Temporal Sequences and Ground-Plane Hypothesis Consensus
Project for 16-822 Geometry Based Methods in Computer Vision, May 2015
[.pdf] [Video (.mp4)]

Online Anomaly Detection in Video
Project for 16-831 Statistical Techniques in Robotics, December, 2014

Miscellaneous Projects

Miscellaneous old projects

© Nick Rhinehart