Special Machine Learning / Statistics Seminar


Contextual MDPs: A New Model and Analysis for Reinforcement Learning with Rich Observations

State-of-the-art reinforcement learning systems cannot solve hard problems because they lack systematic exploration mechanisms, instead relying on variants of uniform exploration to collect diverse experience. Systematic exploration mechanism exist for Markov Decision Processes (MDPs) but have sample complexity that scales polynomially with the number of unique observations, making them intractable for modern reinforcement learning applications where observations come from a visual sensor. Are there reinforcement learning algorithms that can effectively handle rich (high-dimensional, infinite) observations by engaging in systematic exploration? 

To aid in the development of such an algorithm, I will first describe a new model for reinforcement learning with rich observations that we call the Contextual-MDP. This model generalizes both stochastic contextual bandits and MDPs, but is considerably more tractable than Partially Observable Markov Decision Processes (POMDPs). I will then describe a new algorithm for learning optimal behavior in Contextual-MDPs. This algorithm engages in global exploration while using a function class to approximate future performance and has a sample complexity that scales polynomially in all relevant parameters while being independent of the number of unique observations. This represents an exponential improvement on all existing approaches and provides theoretical justification for reinforcement learning with function approximation.

Akshay Krishnamurthy is a postdoctoral researcher at Microsoft Research, New York City.  In June 2015, he completed his PhD in the Computer Science Department at Carnegie Mellon University, advised by Aarti Singh.  His research focuses on interactive learning and learning settings involving feedback-driven data collection, including reinforcement learning, interactive solutions for discovering low-dimensional structure, and complex prediction problems with limited feedback.  He has also worked on problems in nonparametric statistics, anomaly detection, and compressive sensing.  Starting Fall 2016, he will join the Department of Computer Science at the University of Massachusetts, Amherst as an assistant professor.

Faculty Hosts: Aarti Singh (ML), Sivaraman Balakrishnan (STAT)

For More Information, Please Contact: