Deep Reinforcement Learning and Control
Fall 2018, CMU 10703
Instructors: Katerina Fragkiadaki, Tom Mitchell
Lectures: MW, 12:00-1:20pm, 4401 Gates and Hillman Centers (GHC)
Office Hours:
- Katerina: Tuesday 1:30-2:30pm, 8107 GHC
- Tom: Monday 1:20-1:50pm and Wednesday 1:20-1:50pm, in class and just outside the lecture room
Teaching Assistants:
- Nicholay Topin: Monday 3pm-4pm, GHC 8123
- Aviral Anshu: Tuesday 11am-12pm, 6th floor commons
- Aditya Siddhant: Wednesday 5pm-6pm, TBD
- Shihui Li: Thursday 10am-11am, GHC 5th floor commons
- Siddharth Ancha: Friday 1pm-2pm, GHC 8021
- Brynn Edmonds
Communication:
Piazza is intended for all future announcements, general questions about the course, clarifications about assignments, student questions to each other, discussions about material, and so on. We strongly encourage all students to participate in discussions and to ask and answer questions through Piazza.
Class goals
- Implement and experiment with existing algorithms for learning control policies guided by reinforcement, demonstrations and intrinsic curiosity.
- Evaluate the sample complexity, generalization and generality of these algorithms.
- Be able to understand research papers in the field of robotic learning.
- Try out some ideas/extensions on your own, with a particular focus on incorporating sensory input from visual sensors.
Prerequisites
The prerequisite for this course is a full-semester introductory course in machine learning, such as CMU's 10-401, 10-601, 10-701, or 10-715. If you have passed a similar semester-long course at another university, we accept that. If you have not satisfied this prerequisite, we very strongly recommend that you take it this semester and take 10-703 next semester.
Schedule
The following schedule is tentative; it will change continuously based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as lectures progress.
| Date | Topic (slides) | Lecturer | Readings |
| --- | --- | --- | --- |
| 08/27 | Introduction | Katerina | [1], [SB, Ch 1] |
| 08/29 | Markov decision processes (MDPs), POMDPs, Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 3] |
| 09/05 | Policy iteration, Value iteration, Asynchronous DP | Tom | [SB, Ch 4] |
| 09/10 | Monte Carlo learning: value function (VF) estimation and optimization | Tom | [SB, Ch 5] |
| 09/12 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA (see the illustrative sketch below) | Tom | [SB, Ch 6] |
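For orientation, the tabular methods in the schedule above culminate in Q-learning and SARSA. Below is a minimal sketch of tabular Q-learning with an epsilon-greedy policy; it is illustrative only, not course-provided code, and the environment interface (`reset()`/`step()`) assumes the OpenAI Gym convention of the time.

```python
# Illustrative sketch only: tabular Q-learning with an epsilon-greedy
# behavior policy. Assumes a small discrete environment exposing the
# OpenAI Gym interface (observation_space.n, action_space.n, reset, step).
import numpy as np

def q_learning(env, num_episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Return a Q-table of shape (num_states, num_actions) learned from experience."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, done, _ = env.step(action)
            # TD(0) update toward the one-step bootstrapped target.
            # The max over next actions makes this Q-learning; using the
            # action actually taken next would give SARSA instead.
            target = reward + gamma * np.max(Q[next_state]) * (not done)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

For example, with a hypothetical discrete environment such as `gym.make("FrozenLake-v0")`, a greedy policy can be read off the learned table with `np.argmax(Q, axis=1)`.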
Resources
Readings
- [SB] Sutton & Barto, Reinforcement Learning: An Introduction
- [GBC] Goodfellow, Bengio & Courville, Deep Learning
- Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies
- Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search
- Houthooft et al., VIME: Variational Information Maximizing Exploration
- Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- Bagnell, An Invitation to Imitation
- Nguyen, Imitation Learning with Recurrent Neural Networks
- Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- Daumé III et al., Searn in Practice
- Bojarski et al., End to End Learning for Self-Driving Cars
- Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- Rahmatizadeh et al., Learning real manipulation tasks from virtual demonstrations using LSTM
- Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments
- Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
- Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning
- Ho et al., Model-Free Imitation Learning with Policy Optimization
- Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- Ziebart et al., Maximum Entropy Inverse Reinforcement Learning
- Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
- Finn et al., Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
- Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
- Levine et al., Guided Policy Search
- Levine et al., End-to-End Training of Deep Visuomotor Policies
- Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation
- Mishra et al., Prediction and Control with Temporal Segment Models
- Lillicrap et al., Continuous control with deep reinforcement learning
- Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients
- Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids
- Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- Zoph et al., Neural Architecture Search with Reinforcement Learning
- Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints
- Ganin et al., Domain-Adversarial Training of Neural Networks
- Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets
- Hanna et al., Grounded Action Transformation for Robot Learning in Simulation
- Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- Xiong et al., Supervised Descent Method and its Applications to Face Alignment
- Duan et al., One-Shot Imitation Learning
- Lake et al., Building Machines That Learn and Think Like People
- Andrychowicz et al., Learning to learn by gradient descent by gradient descent
- Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
General references
Online courses
Assignments and grading
The course grade is a weighted average of assignments (60%) and a final project (40%). This year the project will be a competition on one of two or three specified topics, e.g., generalization of manipulation trajectories, or learning to navigate in mazes.
Please write all assignments in LaTeX using the NIPS style file (sty file, tex example).
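As a minimal sketch of what such a write-up might start from (assuming the 2018 style file is named `nips_2018.sty`; substitute whatever files the course links above):

```latex
\documentclass{article}
% Assumed package name for the 2018 NIPS style file; use the sty file linked above.
\usepackage[final]{nips_2018}
\usepackage{amsmath}   % equations and derivations
\usepackage{graphicx}  % learning-curve plots and figures

\title{10-703 Homework 1}
\author{Your Name \\ \texttt{yourandrewid@andrew.cmu.edu}}  % placeholder author block

\begin{document}
\maketitle

\section{Problem 1}
% Solutions, derivations, and plots go here.

\end{document}
```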