First-Person Activity Forecasting with Online Inverse Reinforcement Learning

N. Rhinehart, K. Kitani
Carnegie Mellon University
ICCV, October 2017 (Oral)
Marr Prize (Best Paper) Honorable Mention Award

Abstract

We address the problem of incrementally modeling and forecasting the long-term goals of a first-person camera wearer: what the user will do, where they will go, and what goal they are attempting to reach. In contrast to prior work in trajectory forecasting, our algorithm, DARKO, goes further to reason about semantic states (will I pick up an object?) and about future goal states that are far away in both space and time. DARKO learns and forecasts from first-person visual observations of the user's daily behaviors via an Online Inverse Reinforcement Learning (IRL) approach. Classical IRL discovers only the rewards in a batch setting, whereas DARKO discovers the states, transitions, rewards, and goals of a user from streaming data. Among other results, we show that DARKO forecasts goals better than competing methods in both noisy and ideal settings, and that our approach is theoretically and empirically no-regret.

Technical Overview

We harness ideas from imitation learning, specifically Maximum Entropy Inverse Reinforcement Learning (MaxEntIRL). The model is continuously informed by the person's behaviors as observed by a first-person camera. This model is used to forward-simulate the person's possible futures, which yields predictions for (1) what goal the person intends to reach and (2) how they will achieve that goal.
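To make the MaxEntIRL machinery concrete, here is a minimal sketch of soft value iteration in a tabular MDP, which produces the stochastic MaxEnt policy used for forward simulation. The function name, array shapes, and the tabular setting are our illustrative assumptions, not the paper's implementation:

    import numpy as np
    from scipy.special import logsumexp

    def soft_value_iteration(T, reward, n_iters=200):
        """Tabular MaxEnt soft value iteration (illustrative sketch).

        T      : (S, A, S) transition probabilities, assumed known/estimated.
        reward : (S,) per-state rewards, e.g. reward = features @ theta.
        Returns the stochastic MaxEnt policy pi(a|s) = exp(Q(s,a) - V(s)).
        """
        S, A, _ = T.shape
        V = np.zeros(S)
        for _ in range(n_iters):
            # Soft Bellman backup: Q(s,a) = r(s) + E_{s'|s,a}[V(s')]
            Q = reward[:, None] + np.einsum('sap,p->sa', T, V)
            V = logsumexp(Q, axis=1)        # "soft" max over actions
        return np.exp(Q - V[:, None])       # policy used for forward simulation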



The person is localized via a monocular SLAM algorithm, which forms the spatial component of the person's (and environment's) state. The semantic components of state are represented by the relationship between the person and objects of interest. We adopt a straightforward approach to track the person's possession of objects as part of the semantic component of state.
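As an illustration, one can think of the state as a (position, inventory) pair. The class and field names below are our own hypothetical encoding, not code from the paper:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class WearerState:
        """Hypothetical state encoding for the camera wearer.

        cell      : discretized position from monocular SLAM (spatial component).
        inventory : objects the person currently possesses (semantic component).
        """
        cell: tuple            # e.g. (x, y, z) grid indices
        inventory: frozenset   # e.g. frozenset({'bookbag', 'laptop'})

    # Example: the wearer at cell (4, 7, 0) carrying a bookbag and a laptop.
    s = WearerState(cell=(4, 7, 0), inventory=frozenset({'bookbag', 'laptop'}))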

Algorithm

The continual loop of modeling, state estimation, and forecasting constitutes our main algorithm, Demonstrating Agent Rewards for K-futures Online: DARKO. This loop is depicted in Algorithm 1 from the paper; a minimal sketch of its control flow appears below.
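In the sketch, the helpers localize, update_model, infer_goal_posterior, at_goal, expected_features, and empirical_features are hypothetical stand-ins for the corresponding steps of Algorithm 1; this illustrates the structure of the loop under our assumptions, not the paper's code:

    def darko(frames, theta, lr=0.1):
        """Sketch of the DARKO online loop (cf. Algorithm 1).

        All helper functions called here are hypothetical stand-ins.
        """
        trajectory = []
        for frame in frames:
            s = localize(frame)              # monocular SLAM + object possession
            update_model(s)                  # grow states/transitions online
            trajectory.append(s)
            # Forward-simulate under the current reward to forecast goals/paths.
            yield infer_goal_posterior(trajectory, theta)
            if at_goal(s):                   # episodes end at goal states
                # Online IRL: no-regret gradient step on the reward parameters,
                # matching expected to demonstrated feature counts.
                theta = theta - lr * (expected_features(theta)
                                      - empirical_features(trajectory))
                trajectory = []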



Forecasting extensions

In our paper we derive MaxEntIRL extensions for performing inference over state and action subspaces, each of which can carry important semantic meaning. For example, we can forecast visitation to the subspace "has a bookbag and laptop"; see the paper for more examples. We additionally show how this property extends to efficiently predicting the expected length of the person's future trajectory, and we provide empirical results for this forecast. In short, the expected future trajectory length can be computed efficiently as a sum of expected future state visitations over the entire state space, i.e. E[L] = Σ_s D(s), where D(s) is the expected number of future visits to state s. This result is best understood in the context of the derivation presented in Section 3.5.
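A minimal sketch of how such quantities can be computed in a tabular model follows, reusing the transitions T and MaxEnt policy pi from the earlier sketch; the finite horizon and function name are our simplifying assumptions:

    import numpy as np

    def expected_visitation(T, pi, s0, horizon=100):
        """Forward pass for expected future state visitations D(s) (sketch).

        T  : (S, A, S) transitions; pi : (S, A) MaxEnt policy; s0 : start state.
        """
        S = T.shape[0]
        d = np.zeros(S)
        d[s0] = 1.0                                 # state distribution per step
        D = np.zeros(S)
        for _ in range(horizon):
            d = np.einsum('s,sa,sap->p', d, pi, T)  # propagate one step forward
            D += d
        return D

    # Usage (with T and pi from the soft value iteration sketch above):
    #   D = expected_visitation(T, pi, s0=0)
    #   expected_length = D.sum()    # E[L] = sum_s D(s), cf. Section 3.5
    # Expected visits to a semantic subspace such as "has bookbag and laptop"
    # can then be read off as D[subspace_indices].sum().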