Computer Science Thesis Proposal
- Remote Access - Zoom
- Virtual Presentation - ET
- RUOSONG WANG
- Ph.D. Student
- Computer Science Department
- Carnegie Mellon University
Tackling Challenges in Modern Reinforcement Learning: Long Planning Horizon and Large State Space
Modern reinforcement learning (RL) methods have achieved phenomenal success in various applications. However, reinforcement learning problems with large state spaces and long planning horizons remain challenging due to their excessive sample complexity, and our current theoretical understanding of such problems is rather limited. Moreover, there are important settings (including unsupervised RL and general objective functions) that cannot be addressed by the classical RL framework. The goal of this thesis is to study these issues and build a better understanding of modern RL methods.
This thesis will be divided into the following three parts:
Part I: RL with Long Planning Horizon. Learning to plan over long horizons is a central challenge in reinforcement learning, and a fundamental question is how the difficulty of the problem scales as the horizon increases. In the first part of this thesis, we show that tabular episodic reinforcement learning is possible with a sample complexity that scales only logarithmically with the planning horizon, and therefore long horizon RL is no more difficult than short horizon RL, at least in a minimax sense.
Part II: RL with Large State Space. In modern RL methods, function approximation schemes are deployed to deal with large state spaces. Empirically, combining various RL function approximation algorithms with neural networks for feature extraction has led to tremendous success on various tasks.
However, these methods often require a large number of samples to learn a good policy, and it is unclear whether there are fundamental statistical limits on such methods.
In the second part of this thesis, we study necessary and sufficient conditions on the representation power of the features that ensure sample-efficient reinforcement learning.
Part III: RL in Other Settings. Classical reinforcement learning paradigms aim to maximize the cumulative reward when the agent has access to the reward distributions. While this framework captures a large family of reinforcement learning problems, there are important applications that cannot be cast into it. In the third part of this thesis, we study two new settings, the reward-free setting and RL with general objective functions, to generalize the classical RL framework.
Ruslan Salakhutdinov (Chair)
Sham M. Kakade (University of Washington)
Zoom Participation. See announcement.