Deep Reinforcement Learning and Control
Fall 2018, CMU 10703
Instructors: Katerina Fragkiadaki, Tom Mitchell
Lectures: MW, 12:00-1:20pm, 4401 Gates and Hillman Centers (GHC)
Office Hours:
- Katerina: Tuesday 1:30-2:30pm, 8107 GHC
- Tom: Monday 1:20-1:50pm and Wednesday 1:20-1:50pm, in class and just outside the lecture room
Teaching Assistants:
- Nicholay Topin: Monday 3pm-4pm, GHC 8123
- Aviral Anshu: Tuesday 11am-12pm, 6th floor commons
- Aditya Siddhant: Wednesday 5pm-6pm, TBD
- Shihui Li: Thursday 10am-11am, GHC 5th floor commons
- Siddharth Ancha: Friday 1pm-2pm, GHC 8021
- Brynn Edmonds
Communication:
Piazza is intended for all future announcements, general questions about the course, clarifications about assignments, student questions to each other, discussions about material, and so on. We strongly encourage all students to participate in discussions and to ask and answer questions through Piazza.
Class goals
- Implement and experiment with existing algorithms for learning control policies guided by reinforcement, demonstrations and intrinsic curiosity.
- Evaluate the sample complexity, generalization and generality of these algorithms.
- Be able to understand research papers in the field of robotic learning.
- Try out some ideas/extensions on your own, with a particular focus on incorporating sensory input from visual sensors.
Prerequisites
The prerequisite for this course is a full-semester introductory course in machine learning, such as CMU's 10-401, 10-601, 10-701, or 10-715. If you have passed a similar semester-long course at another university, we accept that. If you have not satisfied this prerequisite, we very strongly recommend that you take it this semester and take 10-703 next semester.
Schedule
The following schedule is tentative; it will change continuously based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as lectures progress.
| Date | Topic (slides) | Lecturer | Readings |
| --- | --- | --- | --- |
| 08/27 | Introduction | Katerina | [1], [SB, Ch 1] |
| 08/29 | Markov decision processes (MDPs), POMDPs, Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 3] |
| 09/05 | Policy iteration, Value iteration, Asynchronous DP | Tom | [SB, Ch 4] |
| 09/10 | Monte Carlo learning: value function (VF) estimation and optimization | Tom | [SB, Ch 5] |
| 09/12 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA (see the illustrative sketch below) | Tom | [SB, Ch 6] |
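For orientation, the tabular methods in the schedule above culminate in Q-learning and SARSA. Below is a minimal sketch of tabular Q-learning with an epsilon-greedy policy; it is illustrative only, not course-provided code, and the environment interface (`reset()`/`step()`) assumes the OpenAI Gym convention of the time.

```python
# Illustrative sketch only: tabular Q-learning with an epsilon-greedy
# behavior policy. Assumes a small discrete environment exposing the
# OpenAI Gym interface (observation_space.n, action_space.n, reset, step).
import numpy as np

def q_learning(env, num_episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Return a Q-table of shape (num_states, num_actions) learned from experience."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # TD(0) update toward the one-step bootstrapped target.
            # The max over next actions makes this Q-learning; using the
            # action actually taken next would give SARSA instead.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

For example, with a hypothetical discrete environment such as `gym.make("FrozenLake-v0")`, a greedy policy can be read off the learned table with `np.argmax(Q, axis=1)`.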
Resources
Readings
- [SB] Sutton & Barto, Reinforcement Learning: An Introduction
- [GBC] Goodfellow, Bengio & Courville, Deep Learning
- Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies
- Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search
- Houthooft et al., VIME: Variational Information Maximizing Exploration
- Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- Bagnell, An Invitation to Imitation
- Nguyen, Imitation Learning with Recurrent Neural Networks
- Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- Daumé III et al., Searn in Practice
- Bojarski et al., End to End Learning for Self-Driving Cars
- Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- Rahmatizadeh et al., Learning real manipulation tasks from virtual demonstrations using LSTM
- Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments
- Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
- Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning
- Ho et al., Model-Free Imitation Learning with Policy Optimization
- Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- Ziebart et al., Maximum Entropy Inverse Reinforcement Learning
- Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
- Finn et al., Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
- Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
- Levine et al., Guided Policy Search
- Levine et al., End-to-End Training of Deep Visuomotor Policies
- Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation
- Mishra et al., Prediction and Control with Temporal Segment Models
- Lillicrap et al., Continuous control with deep reinforcement learning
- Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients
- Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids
- Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- Zoph et al., Neural Architecture Search with Reinforcement Learning
- Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints
- Ganin et al., Domain-Adversarial Training of Neural Networks
- Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets
- Hanna et al., Grounded Action Transformation for Robot Learning in Simulation
- Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- Xiong et al., Supervised Descent Method and its Applications to Face Alignment
- Duan et al., One-Shot Imitation Learning
- Lake et al., Building Machines That Learn and Think Like People
- Andrychowicz et al., Learning to learn by gradient descent by gradient descent
- Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
General references
Online courses
Assignments and grading
The course grade is a weighted average of assignments (60%) and a final project (40%). This year the project will be a competition on one of two or three specified topics, e.g., generalization of manipulation trajectories, or learning to navigate in mazes.
Please write all assignments in LaTeX using the NIPS style file (sty file, tex example).
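As a minimal sketch of what such a write-up might start from (assuming the 2018 style file is named `nips_2018.sty`; substitute whatever files the course links above):

```latex
\documentclass{article}
% Assumed package name for the 2018 NIPS style file; use the sty file linked above.
\usepackage[final]{nips_2018}
\usepackage{amsmath}   % equations and derivations
\usepackage{graphicx}  % learning-curve plots and figures

\title{10-703 Homework 1}
\author{Your Name \\ \texttt{yourandrewid@andrew.cmu.edu}}  % placeholder author block

\begin{document}
\maketitle

\section{Problem 1}
% Solutions, derivations, and plots go here.

\end{document}
```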