Brian Ziebart

Postdoctoral Fellow
Human-Computer Interaction Institute
Carnegie Mellon University

8208 Gates Hillman Center

Email: bzie...@cs.cmu.edu

I have moved to the University of Illinois at Chicago. Please visit my new homepage.

Overview


I am a postdoctoral fellow at Carnegie Mellon University. I was awarded a PhD from CMU's Machine Learning Department in December 2010.

I am interested in machine learning techniques for structured data and artificial intelligence applications. My recent research focuses on learning and forecasting decisions and strategies in sequential decision and game settings for assistive technology and robotics applications. My dissertation introduced the principle of maximum causal entropy, a general framework for process-conditioned probabilistic learning and prediction (with robust log-loss minimization guarantees).

I investigate learning and prediction of single-agent control with Anind Dey and prediction of multi-agent behavior with Geoff Gordon and Katia Sycara. My graduate studies were advised by Drew Bagnell and Anind Dey. My undergraduate research was supervised by Roy Campbell and Dan Roth at the University of Illinois at Urbana-Champaign.

Publications


Probabilistic Pointing Target Prediction via Inverse Optimal Control
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell
International Conference on Intelligent User Interfaces (IUI 2012).
[abstract] [pdf] [bibtex]
Best Paper Award Nominee
Abstract

Numerous interaction techniques have been developed that make "virtual" pointing at targets in graphical user interfaces easier than analogous physical pointing tasks by invoking target-based interface modifications. These pointing facilitation techniques crucially depend on methods for estimating the relevance of potential targets. Unfortunately, many of the simple methods employed to date are inaccurate in common settings with many selectable targets in close proximity. In this paper, we bring recent advances in statistical machine learning to bear on this underlying target relevance estimation problem. By framing past target-driven pointing trajectories as approximate solutions to well-studied control problems, we learn the probabilistic dynamics of pointing trajectories that enable more accurate predictions of intended targets.
Bibtex
@inproceedings{ziebart2012probabilistic,
   author = {Brian D. Ziebart and Anind K. Dey and J. Andrew Bagnell},
   title = {Probabilistic Pointing Target Prediction via Inverse Optimal 
   Control},
   year = {2012},
   booktitle = {Proc. of the International Conference on Intelligent User 
   Interfaces} 
}
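To make the underlying target relevance estimation concrete, here is a minimal hypothetical sketch (not the paper's code): given per-target log-likelihoods of the observed partial cursor trajectory, which in the paper come from the learned inverse-optimal-control model of pointing dynamics, intended-target prediction is a Bayesian posterior update.

import numpy as np

def target_posterior(log_lik_partial, log_prior):
    """Posterior over candidate targets given a partial pointing trajectory.

    log_lik_partial: log P(observed partial trajectory | target), per target
    log_prior:       log P(target), e.g. uniform or from click history
    """
    log_post = log_lik_partial + log_prior
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

# Example: three selectable targets; the second best explains the cursor path.
print(target_posterior(np.array([-5.1, -1.2, -4.0]),
                       np.log(np.ones(3) / 3)))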
Factorized Decision Forecasting via Combining Value-based and Reward-based Estimation
Brian D. Ziebart
Allerton Conference on Communication, Control and Computing (Allerton 2011).
[abstract] [pdf] [bibtex]
Abstract

A powerful recent perspective for predicting sequential decisions learns the parameters of decision problems that produce observed behavior as (near) optimal solutions. Under this perspective, behavior is explained in terms of utilities, which can often be defined as functions of state and action features to enable generalization across decision tasks. Two approaches have been proposed from this perspective: estimate a feature-based reward function and recursively compute values from it, or directly estimate a feature-based value function. In this work, we investigate the combination of these two approaches into a single learning task using directed information theory and the principle of maximum entropy. This enables uncovering which type of estimate is most appropriate -- in terms of predictive accuracy and/or computational benefit -- for different portions of the decision space.
Bibtex
@inproceedings{ziebart2011factorized,
   author = {Brian D. Ziebart},
   title = {Factorized Decision Forecasting via Combining Value-based
   and Reward-based Estimation},
   year = {2011},
   booktitle = {Proc. of the Allerton Conference on Communication,
   Control and Computing}
}
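In rough notation (a sketch, not the paper's exact formulation), the two estimates being combined are:

\begin{align*}
\text{reward-based:}\quad & Q_\theta(s,a) = \theta^\top f(s,a) + \mathbb{E}_{s' \mid s,a}\!\left[V_\theta(s')\right],
\qquad V_\theta(s) = \log \textstyle\sum_a e^{Q_\theta(s,a)};\\
\text{value-based:}\quad & Q_w(s,a) = w^\top g(s,a),
\end{align*}

with predicted behavior P(a | s) proportional to exp Q(s,a) in either case; the factorized model can then use whichever estimate predicts better or computes faster on different portions of the decision space.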
Process-Conditioned Investing with Incomplete Information using Maximum Causal Entropy
Brian D. Ziebart
International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2011).
[abstract] [pdf] [bibtex]
Abstract

Investing to optimally maximize the growth rate of wealth based on sequences of event outcomes has many information-theoretic interpretations. Namely, the mutual information characterizes the benefit of additional side information being available when making investment decisions in settings where the probabilistic relationships between side information and event outcomes are known. Additionally, the relative variant of the principle of maximum entropy provides the optimal investment allocation in the more general setting where the relationships between side information and event outcomes are only partially known. In this paper, we build upon recent work characterizing the growth rates of investment in settings with inter-dependent side information and event outcome sequences. We consider the extension to settings with inter-dependent event outcomes and side information where the probabilistic relationships between side information and event outcomes are only partially known. We introduce the principle of minimum relative causal entropy to obtain the optimal worst-case investment allocations for this setting. We present efficient algorithms for obtaining these investment allocations using convex optimization techniques and dynamic programming, illustrating a close connection to optimal control theory.
Bibtex
@inproceedings{ziebart2011process,
   author = {Brian D. Ziebart},
   title = {Process-Conditioned Investing with Incomplete Information
   using Maximum Causal Entropy},
   year = {2011},
   booktitle = {Proc. of the International Workshop on Bayesian Inference and
   Maximum Entropy Methods in Science and Engineering}
}
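As a toy illustration of the known-probability baseline the paper generalizes (hypothetical code): under uniform odds with a fully known outcome distribution p, the growth-rate-optimal allocation is proportional betting, b = p.

import numpy as np

def growth_rate(b, p, odds):
    # Expected log-wealth growth rate: sum_i p_i * log(odds_i * b_i)
    return float(np.sum(p * np.log(odds * b)))

p = np.array([0.5, 0.3, 0.2])        # outcome probabilities (fully known here)
odds = np.array([3.0, 3.0, 3.0])     # uniform 3-for-1 odds
print(growth_rate(p, p, odds))                 # proportional betting b = p
print(growth_rate(np.ones(3) / 3, p, odds))    # uniform betting grows slower

The paper's setting replaces the fully known p with a partially known, causally revealed process, for which the minimum relative causal entropy allocation is worst-case optimal.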
Computational Rationalization: The Inverse Equilibrium Problem
Kevin Waugh, Brian D. Ziebart, and J. Andrew Bagnell
International Conference on Machine Learning (ICML 2011).
[abstract] [pdf] [bibtex]
Best Paper Award
(An earlier version appeared in Workshop on Decision Making with Multiple Imperfect Decision Makers at NIPS 2010.)
Abstract

Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations.

In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward -- it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior, as well as recovering a reward function in these domains.
Bibtex
@inproceedings{waugh2011computational,
   author = {Kevin Waugh and Brian D. Ziebart and J. Andrew Bagnell},
   title = {Computational Rationalization: The Inverse Equilibrium Problem},
   year = {2011},
   booktitle = {Proc. of the International Conference on Machine Learning} 
}
Maximum Causal Entropy Correlated Equilibria for Markov Games
Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey
International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011).
[abstract] [pdf] [bibtex]
(An earlier version appeared in the Interactive Decision Theory and Game Theory Workshop at AAAI 2010.)
Abstract

Motivated by a machine learning perspective -- that game-theoretic equilibria constraints should serve as guidelines for predicting agents' strategies -- we introduce maximum causal entropy correlated equilibria (MCECE), a novel solution concept for general-sum Markov games. In line with this perspective, an MCECE strategy profile is a uniquely-defined joint probability distribution over actions for each game state that minimizes the worst-case prediction of agents' actions under log-loss. Equivalently, it maximizes the worst-case growth rate for gambling on the sequences of agents' joint actions under uniform odds. We present a convex optimization technique for obtaining MCECE strategy profiles that resembles value iteration in finite-horizon games. We assess the predictive benefits of our approach on the task of predicting strategies generated by previously proposed correlated equilibrium solution concepts, and compare against those approaches on the same task.
Bibtex
@inproceedings{ziebart2011maximum,
   author = {Brian D. Ziebart and J. Andrew Bagnell and Anind K. Dey},
   title = {Maximum Causal Entropy Correlated Equilibria for {M}arkov Games},
   year = {2011},
   booktitle = {Proc. of the International Conference on Autonomous Agents
      and Multiagent Systems}
}
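A hedged sketch of the one-shot analogue (assuming the cvxpy library; the paper's algorithm handles the sequential Markov-game case with a value-iteration-like recursion): the maximum-entropy correlated equilibrium of a single-stage game is a convex program.

import cvxpy as cp
import numpy as np

# Payoffs for the game of "chicken"; rows index player 1's action,
# columns player 2's (0 = dare, 1 = yield).
U1 = np.array([[0.0, 7.0], [2.0, 6.0]])
U2 = U1.T

p = cp.Variable((2, 2), nonneg=True)       # joint distribution over actions
cons = [cp.sum(p) == 1]
for a in range(2):
    for b in range(2):
        if a != b:  # no gain from deviating from recommendation a to b
            cons.append(p[a, :] @ (U1[b, :] - U1[a, :]) <= 0)   # player 1
            cons.append(p[:, a] @ (U2[:, b] - U2[:, a]) <= 0)   # player 2
cp.Problem(cp.Maximize(cp.sum(cp.entr(p))), cons).solve()
print(np.round(p.value, 3))                # max-entropy CE strategy profile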
Learning Patterns of Pick-ups and Drop-offs to Support Busy Family Coordination
Scott Davidoff, Brian D. Ziebart, John Zimmerman, and Anind K. Dey
SIGCHI Conference on Human Factors in Computing Systems (CHI 2011).
[abstract] [pdf] [bibtex]
Abstract

Part of being a parent is taking responsibility for arranging and supplying transportation of children between various events. Dual-income parents frequently develop routines to help manage transportation with a minimal amount of attention. On days when families deviate from their routines, effective logistics can often depend on knowledge of the routine location, availability and intentions of other family members. Since most families rarely document their routine activities, that needed information is unavailable and coordination breakdowns are much more likely to occur. To address this problem we demonstrate the feasibility of learning family routines using mobile phone GPS. We describe how we (1) detect pick-ups and drop-offs; (2) predict which parent will perform a future pick-up or drop-off; and (3) infer if a child will be left at an activity. We discuss how these routine models give digital calendar, reminder, and location systems new capabilities to help prevent breakdowns and improve family life.
Bibtex
@inproceedings{davidoff2011learning,
   author = {Scott Davidoff and Brian D. Ziebart and John Zimmerman and 
            Anind K. Dey},
   title = {Learning Patterns of Pick-ups and Drop-offs to Support Busy
            Family Coordination},
   year = {2011},
   booktitle = {Proc. of the SIGCHI Conference on Human Factors in Computing
                Systems} 
}
Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy
Brian D. Ziebart
PhD Thesis. Machine Learning Department, Carnegie Mellon University. December 2010.
[abstract] [pdf] [bibtex]
School of Computer Science Distinguished Dissertation Award, Honorable Mention
Abstract

Predicting human behavior from a small number of training examples is a challenging machine learning problem. In this thesis, we introduce the principle of maximum causal entropy, a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time. This approach guarantees decision-theoretic performance by matching purposeful measures of behavior (Abbeel & Ng, 2004), and/or enforces game-theoretic rationality constraints (Aumann, 1974), while otherwise being as uncertain as possible, which minimizes worst-case predictive log-loss (Grünwald & Dawid, 2003).

We derive probabilistic models for decision, control, and multi-player game settings using this approach. We then develop corresponding algorithms for efficient inference that include relaxations of the Bellman equation (Bellman, 1957), and simple learning algorithms based on convex optimization. We apply the models and algorithms to a number of behavior prediction tasks. Specifically, we present empirical evaluations of the approach in the domains of vehicle route preference modeling using over 100,000 miles of collected taxi driving data, pedestrian motion modeling from weeks of indoor movement data, and robust prediction of game play in stochastic multi-player games.
Bibtex
@phdthesis{ziebart2010modelingB,
   author = {Brian D. Ziebart},
   title = {Modeling Purposeful Adaptive Behavior with the Principle of
            Maximum Causal Entropy},
   year = {2010},
   month = {Dec},
   school = {Machine Learning Department, Carnegie Mellon University}
}
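One of the Bellman relaxations referenced above, in standard "soft" notation (a sketch; the thesis develops the general causally conditioned case):

\begin{align*}
Q^{\mathrm{soft}}(s,a) &= r(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\!\big[V^{\mathrm{soft}}(s')\big],\\
V^{\mathrm{soft}}(s) &= \log \sum_a e^{Q^{\mathrm{soft}}(s,a)},\qquad
\pi(a \mid s) = e^{Q^{\mathrm{soft}}(s,a) - V^{\mathrm{soft}}(s)},
\end{align*}

which recovers the usual Bellman optimality equation as the log-sum-exp sharpens into a max.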
Modeling Interaction via the Principle of Maximum Causal Entropy
Brian D. Ziebart, J. Andrew Bagnell, and Anind K. Dey
International Conference on Machine Learning (ICML 2010).
[abstract] [pdf] [bibtex]
Best Student Paper Award, Runner-Up
(An earlier version appeared in Workshop on Probabilistic Approaches for Robotics and Control at NIPS 2009.)
Abstract

The principle of maximum entropy provides a powerful framework for statistical models of joint, conditional, and marginal distributions. However, there are many important distributions with elements of interaction and feedback where its applicability has not been established. This work presents the principle of maximum causal entropy -- an approach based on causally conditioned probabilities that can appropriately model the availability and influence of sequentially revealed side information. Using this principle, we derive Maximum Causal Entropy Influence Diagrams, a new probabilistic graphical framework for modeling decision making in settings with latent information, sequential interaction, and feedback. We describe the theoretical advantages of this model and demonstrate its applicability for statistically framing inverse optimal control and decision prediction tasks.
Bibtex
@inproceedings{ziebart2010modeling,
   author = {Brian D. Ziebart and J. Andrew Bagnell and Anind K. Dey},
   title = {Modeling Interaction via the Principle of Maximum Causal Entropy}, 
   year = {2010},
   booktitle = {Proc. of the International Conference on Machine Learning},
   pages = {1255--1262}
}
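For reference, the central object (a notational sketch, following Kramer's causally conditioned probability): each action is conditioned only on the side information revealed so far, and the causal entropy is maximized subject to constraints such as feature matching,

\begin{align*}
P(A^{1:T} \,\|\, S^{1:T}) &= \prod_{t=1}^{T} P(A_t \mid S_{1:t}, A_{1:t-1}),\\
\max \; H(A^{1:T} \,\|\, S^{1:T}) &= \mathbb{E}\big[-\log P(A^{1:T} \,\|\, S^{1:T})\big]
\quad \text{s.t. feature expectations match demonstrations.}
\end{align*}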
Planning-based Prediction for Pedestrians
Brian D. Ziebart, Nathan Ratliff, Garratt Gallagher, Christoph Mertz, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa
International Conference on Intelligent Robots and Systems (IROS 2009).
[abstract] [pdf] [bibtex]
Abstract

We present a novel approach for determining robot movements that efficiently accomplish the robot's tasks while not hindering the movements of people within the environment. Our approach models the goal-directed trajectories of pedestrians using maximum entropy inverse optimal control. The advantage of this modeling approach is the generality of its learned cost function to changes in the environment and to entirely different environments. We employ the predictions of this model of pedestrian trajectories in a novel incremental planner and quantitatively show the improvement in hindrance-sensitive robot trajectory planning provided by our approach.
Bibtex
@inproceedings{bziebart2009planning,
   author = {Brian D. Ziebart and Nathan Ratliff and Garratt Gallagher and
             Christoph Mertz and Kevin Peterson and J. Andrew Bagnell and 
             Martial Hebert and Anind K. Dey and Siddhartha Srinivasa},
   title = {Planning-based Prediction for Pedestrians},
   year = {2009},
   booktitle = {Proc. of the International Conference on Intelligent Robots
                and Systems}
}
Inverse Optimal Heuristic Control for Imitation Learning
Nathan Ratliff, Brian D. Ziebart, Kevin Peterson, J. Andrew Bagnell, Martial Hebert, Anind K. Dey, and Siddhartha Srinivasa
Artificial Intelligence and Statistics (AISTATS 2009).
[abstract] [pdf] [bibtex]
Abstract

One common approach to imitation learning is behavioral cloning (BC), which employs straightforward supervised learning (i.e., classification) to directly map observations to controls. A second approach is inverse optimal control (IOC), which formalizes the problem of learning sequential decision-making behavior over long horizons as a problem of recovering a utility function that explains observed behavior. This paper presents inverse optimal heuristic control (IOHC), a novel approach to imitation learning that capitalizes on the strengths of both paradigms. It employs long-horizon IOC-style modeling in a low-dimensional space where inference remains tractable, while incorporating an additional descriptive set of BC-style features to guide a higher-dimensional overall action selection. We provide experimental results demonstrating the capabilities of our model on a simple illustrative problem as well as on two real-world problems: turn-prediction for taxi drivers, and pedestrian prediction within an office environment.
Bibtex
@inproceedings{ratliff2009inverse,
   author = {Nathan Ratliff and Brian D. Ziebart and Kevin Peterson and 
             J. Andrew Bagnell and Martial Hebert and Anind K. Dey and
             Siddhartha Srinivasa},
   title = {Inverse Optimal Heuristic Control for Imitation Learning},
   year = {2009},
   booktitle = {Proc. AISTATS},
   pages = {424--431}
}
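One way to read the combination (a hedged sketch in invented notation, not the paper's exact parameterization): a Boltzmann action policy whose energy adds a low-dimensional, long-horizon IOC cost-to-go term to high-dimensional BC-style action features,

\[
\pi(a \mid s) \;\propto\; \exp\!\big(-\,\theta^\top f_{\mathrm{IOC}}(s,a) \;+\; w^\top g_{\mathrm{BC}}(s,a)\big),
\]

so the tractable low-dimensional model supplies long-horizon structure while the richer features refine local action selection.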
Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, Anind K. Dey
AAAI Spring Symposium on Human Behavior Modeling. 2009.
[pdf]

Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
Brian D. Ziebart, Andrew Maas, Anind K. Dey, and J. Andrew Bagnell.
International Conference on Ubiquitous Computing (Ubicomp 2008).
[abstract] [pdf] [bibtex]
Abstract

We present PROCAB, an efficient method for Probabilistically Reasoning from Observed Context-Aware Behavior. It models the context-dependent utilities and underlying reasons that people take different actions. The model generalizes to unseen situations and scales to incorporate rich contextual information. We train our model using the route preferences of 25 taxi drivers demonstrated in over 100,000 miles of collected data, and demonstrate the performance of our model by inferring: (1) the decision at the next intersection, (2) the route to a known destination, and (3) the destination given a partially traveled route.
Bibtex
@inproceedings{bziebart2008navigate,
   author = {Brian D. Ziebart and Andrew Maas 
             and Anind K. Dey and J. Andrew Bagnell},
   title = {Navigate Like a Cabbie: Probabilistic Reasoning from 
            Observed Context-Aware Behavior},
   year = {2008},
   booktitle = {Proc. Ubicomp},
   pages = {322--331}
}
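Inference (3) above is Bayesian reasoning over destinations (sketched here in generic notation): for a partially traveled route \(\zeta_{A \to B}\),

\[
P(\mathrm{dest} \mid \zeta_{A \to B}) \;\propto\; P(\zeta_{A \to B} \mid \mathrm{dest})\, P(\mathrm{dest}),
\]

where the route likelihood comes from the learned context-dependent utilities, with lower-cost routes exponentially more probable.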
Fast Planning for Dynamic Preferences
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell.
International Conference on Automated Planning and Scheduling (ICAPS 2008).
[abstract] [pdf] [bibtex]
Abstract

We present an algorithm that quickly finds optimal plans for unforeseen agent preferences within graph-based planning domains where actions have deterministic outcomes and action costs are linearly parameterized by preference parameters. We focus on vehicle route planning for drivers with personal trade-offs for different types of roads, and specifically on settings where these preferences are not known until planning time. We employ novel bounds (based on the triangle inequality and on the concavity of the optimal plan cost in the space of preferences) to enable the reuse of previously computed optimal plans that are similar to the new plan preferences. The resulting lower bounds are employed to guide the search for the optimal plan up to 60 times more efficiently than previous methods.
Bibtex
@inproceedings{ziebart2008fast,
   author = {Brian D. Ziebart and Anind K. Dey and J. Andrew Bagnell},
   title = {Fast Planning for Dynamic Preferences},
   year = {2008},
   booktitle = {Proc. of the International Conference on Automated Planning
               and Scheduling},
   pages = {412--419}
}
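The concavity bound can be seen directly (a sketch of one of the two bounds): with action costs linear in the preference vector w, each plan \(\pi\) has cumulative feature counts \(f_\pi\), and the optimal plan cost

\[
c^*(w) \;=\; \min_{\pi} \, \langle w, f_\pi \rangle
\]

is a pointwise minimum of linear functions, hence concave in w. Any stored plan \(\pi_i\) therefore gives the upper bound \(c^*(w) \le \langle w, f_{\pi_i} \rangle\), and for any convex combination \(w = \sum_i \lambda_i w_i\) of previously solved preferences, concavity gives the lower bound \(c^*(w) \ge \sum_i \lambda_i \, c^*(w_i)\).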
Maximum Entropy Inverse Reinforcement Learning
Brian D. Ziebart, Andrew Maas, J. Andrew Bagnell, and Anind K. Dey.
AAAI Conference on Artificial Intelligence (AAAI 2008).
[abstract] [pdf] [bibtex]
(An earlier version appeared in Workshop on Robotic Challenges for Machine Learning at NIPS 2007.)
Abstract

Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods.

We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
Bibtex
@inproceedings{ziebart2008maximum,
   author = {Brian D. Ziebart and Andrew Maas 
             and J. Andrew Bagnell and Anind K. Dey},
   title = {Maximum Entropy Inverse Reinforcement Learning},
   year = {2008},
   booktitle = {Proc. AAAI},
   pages = {1433--1438}
}
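A minimal hypothetical sketch of the resulting learning rule (toy enumeration; the paper computes expected feature counts with dynamic programming over the road network rather than enumerating paths): the log-likelihood gradient is the demonstrated feature expectations minus the model's expected feature counts under P(path) proportional to exp(theta . f_path).

import numpy as np

def maxent_irl_step(theta, f_demo, path_features, lr=0.1):
    """One gradient-ascent step on the MaxEnt IRL log-likelihood.

    f_demo:        mean feature vector of demonstrated paths
    path_features: (num_paths, num_features) for a toy enumerable domain
    """
    scores = path_features @ theta
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                     # P(path) ∝ exp(theta · f_path)
    f_model = probs @ path_features          # expected feature counts
    return theta + lr * (f_demo - f_model)   # match features in the limit

# Toy usage: three candidate paths with two features; demonstrate path 0.
feats = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
theta = np.zeros(2)
for _ in range(200):
    theta = maxent_irl_step(theta, feats[0], feats)
print(theta)   # weights now make the demonstrated path most probable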
Learning Selectively Conditioned Forest Structures with Applications to DBNs and Classification
Brian D. Ziebart, Anind K. Dey, and J. Andrew Bagnell.
Uncertainty in Artificial Intelligence (UAI 2007).
[abstract] [pdf] [bibtex]
Abstract

Dealing with uncertainty in Bayesian Network structures using maximum a posteriori (MAP) estimation or Bayesian Model Averaging (BMA) is often intractable due to the superexponential number of possible directed, acyclic graphs. When the prior is decomposable, two classes of graphs where efficient learning can take place are tree structures and fixed orderings with limited in-degree. We show how MAP estimates and BMA for selectively conditioned forests (SCF), a combination of these two classes, can be computed efficiently for ordered sets of variables. We apply SCFs to temporal data to learn Dynamic Bayesian Networks having an intra-timestep forest and inter-timestep limited in-degree structure, improving model accuracy over DBNs without the combination of structures. We also apply SCFs to Bayes Net classification to learn selective forest-augmented Naive Bayes classifiers. We argue that the built-in feature selection of selective augmented Bayes classifiers makes them preferable to similar non-selective classifiers based on empirical evidence.
Bibtex
@inproceedings{bziebart2007learning,
   author = {Brian D. Ziebart and Anind K. Dey and J. Andrew Bagnell},
   title = {Learning Selectively Conditioned Forest Structures with 
            Applications to DBNs and Classification},
   year = {2007},
   booktitle = {Proc. UAI},
   pages = {458--465}
}
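A hypothetical Chow-Liu-style sketch of the forest-learning idea (assuming numpy and networkx; the paper's MAP computation uses a decomposable structure prior rather than this simple penalty): score candidate edges by estimated mutual information, keep an edge only when it beats the prior's penalty, and take a maximum-weight spanning forest.

import itertools
import networkx as nx
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) for two samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    joint /= joint.sum()
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(px, py)[nz])))

def map_forest(data, penalty=0.05):
    """Maximum-weight spanning forest over MI-scored edges above a penalty."""
    n = data.shape[1]
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i, j in itertools.combinations(range(n), 2):
        w = mutual_information(data[:, i], data[:, j])
        if w > penalty:              # keep an edge only if it beats the prior
            G.add_edge(i, j, weight=w)
    return nx.maximum_spanning_tree(G)   # a forest when G is disconnected

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
data = np.column_stack([a, a + 0.1 * rng.normal(size=1000),
                        rng.normal(size=1000)])
print(sorted(map_forest(data).edges()))  # recovers the dependent pair (0, 1)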
Learning Automation Policies for Pervasive Computing Environments
Brian D. Ziebart, Dan Roth, Roy H. Campbell, and Anind K. Dey.
IEEE International Conference on Autonomic Computing (ICAC 2005).
[abstract] [pdf] [bibtex]
Abstract

If current trends in cellular phone technology, personal digital assistants, and wireless networking are indicative of the future, we can expect our environments to contain an abundance of networked computational devices and resources. We envision these devices acting in an orchestrated manner to meet users' needs, pushing the level of interaction away from particular devices and towards interactions with the environment as a whole. Computation will be based not only on input explicitly provided by the user, but also on contextual information passively collected by networked sensing devices. Configuring the desired responses to different situations will need to be easy for users. However, we anticipate that the triggering situations for many desired automation policies will be complex, unforeseen functions of low-level contextual information. This is problematic since users, though easily able to perceive triggering situations, will not be able to define them as functions of the devices' available contextual information, even when such a function (or a close approximation) does exist.

In this paper, we present an alternative approach for specifying the automation rules of a pervasive computing environment using machine learning techniques. Using this approach, users generate training data for an automation policy through demonstration, and, after training is completed, a learned function is employed for future automation. This approach enables users to automate the environment based on changes in the environment that are complex, unforeseen combinations of contextual information. We developed our learning service within Gaia, our pervasive computing system, and deployed it within our prototype pervasive computing environment. Using the system, we were able to have users demonstrate how sound and lighting controls should adjust to different applications used within the environment, the users present, and the locations of those users, and then automate those demonstrated preferences.
Bibtex
@inproceedings{bziebart2005learning,
   author = {Brian D. Ziebart and Dan Roth and Roy H. Campbell and 
            Anind K. Dey},
   title = {Learning Automation Policies for Pervasive Computing Environments},
   year = {2005},
   booktitle = {Proc. of the International Conference on Autonomic Computing} 
}
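A toy sketch of the learning-by-demonstration loop (hypothetical features and labels, assuming scikit-learn; the deployed service learned within Gaia from real sensed context): demonstrated (context, device setting) pairs train a classifier that later automates the environment.

from sklearn.tree import DecisionTreeClassifier

# Demonstrations: context = (people present, presentation app running,
# ambient light level); label = the device setting the user chose.
X = [[1, 0, 0.9], [4, 1, 0.8], [3, 1, 0.2], [2, 0, 0.3], [5, 1, 0.7]]
y = ["lights_on", "lights_dim", "lights_dim", "lights_on", "lights_dim"]

policy = DecisionTreeClassifier().fit(X, y)
print(policy.predict([[6, 1, 0.9]]))   # automates: dim the lights for a talk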
Towards a Pervasive Computing Benchmark
Anand Ranganathan, Jalal Al-Muhtadi, Jacob Biehl, Brian Ziebart, Roy H. Campbell, and Brian Bailey.
PerWare '05 Workshop on Support for Pervasive Computing at PerCom 2005.
[pdf]

System Support for Rapid Ubiquitous Computing Application Development and Evaluation
Manuel Roman, Jalal Al-Muhtadi, Brian Ziebart, and Roy H. Campbell.
Systems Support for Ubiquitous Computing Workshop, at UbiComp 2003.
[pdf]

Dynamic Application Composition: Customizing the Behavior of an Active Space
Manuel Roman, Brian Ziebart, and Roy H. Campbell.
IEEE International Conference on Pervasive Computing and Communications (PerCom 2003).
[pdf]

Miscellany

I helped organize the New Developments in Imitation Learning Workshop at ICML 2011.

I am a co-founder of NavPrescience, a CMU spin-off company developing personalization and prediction technologies for next-generation GPS devices.

Gates-Hillman Prediction Market: I finished in 1st place (of 210) with 15.09% of the tickets by maximizing the (relative) entropy (i.e., buying low and selling high).

Bid-Tac-Toe: Exponentiated stochastic gradient ascent + logistic regression over pools of (approximate) equilibria strategies = $2500. However, winning a hedge fund's recruitment programming contest + market meltdown ≠ lucrative quant career (yet..?).