Deep Reinforcement Learning and Control
Spring 2017, CMU 10703
Instructors: Katerina Fragkiadaki, Ruslan Salakhutdinov
Lectures: MW, 3:00-4:20pm, 4401 Gates and Hillman Centers (GHC)
Office Hours:
- Katerina: Thursday 1:30-2:30pm, 8015 GHC
- Russ: Friday 1:15-2:15pm, 8017 GHC
Teaching Assistants:
- Devin Schwab: Thursday 2-3pm, 4225 NSH
- Chun-Liang Li: Thursday 1-2pm, 8F Open study area GHC
- Renato Negrinho: Wednesday 6-7pm, 8213 GHC
Communication: Piazza will be used for all announcements, general questions about the course, clarifications about assignments, student questions to each other, discussions about the material, and so on. We strongly encourage all students to participate in discussions and to ask and answer questions through Piazza.
Acknowledgement: We are grateful to XSEDE and PSC for donating GPU resources to our students for their homework and project development.
Class goals
- Implement and experiment with existing algorithms for learning control policies guided by reinforcement, expert demonstrations, or self-trials (a minimal sketch of the agent-environment loop these algorithms build on follows this list).
- Evaluate the sample complexity, generalization, and generality of these algorithms.
- Be able to understand research papers in the field of robotic learning.
- Try out some ideas/extensions of your own, with a particular focus on incorporating real sensory signals from vision or tactile sensing and on exploring the synergy between learning in simulation and learning from real experience.
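The course recitations and homeworks use OpenAI Gym (see the 2/2 recitation in the schedule below). As a warm-up for the goals above, here is a minimal sketch of the standard agent-environment interaction loop; the environment name (CartPole-v0), the episode count, and the random action choice are illustrative assumptions rather than part of any assignment.

```python
# Minimal sketch of the agent-environment loop in OpenAI Gym (classic reset/step API).
# The environment, episode count, and random policy are placeholders for illustration.
import gym

env = gym.make("CartPole-v0")

for episode in range(5):
    observation = env.reset()                  # start a new episode
    done, total_reward = False, 0.0
    while not done:
        action = env.action_space.sample()     # random action; a learned policy goes here
        observation, reward, done, info = env.step(action)
        total_reward += reward
    print("episode", episode, "return", total_reward)

env.close()
```

Homework agents replace the random action with a learned policy (for example, an epsilon-greedy choice over predicted Q-values).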
Schedule
The following schedule is tentative; it will continue to change based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as lectures progress.
Date | Topic (slides) | Lecturer | Readings
1/18 | Introduction | Katerina | [1]
1/23 | Markov decision processes (MDPs), POMDPs | Katerina | [SB, Ch 3]
1/25 | Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 4]
1/30 | Monte Carlo learning: value function (VF) estimation and optimization | Russ | [SB, Ch 5]
2/1 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA | Russ | [SB, Ch 6]
2/2 | Recitation: OpenAI Gym | Devin |
2/6 | Planning and learning: Dyna, Monte Carlo tree search | Katerina | [SB, Ch 8; 2]
2/8 | VF approximation: MC and TD with VF approximation, control with VF approximation | Russ | [SB, Ch 9]
2/13 | VF approximation, Deep Learning, Convnets, back-propagation | Russ | [GBC, Ch 6]
2/15 | Deep Learning, Convnets, optimization tricks | Russ | [GBC, Ch 9]
2/20 | Deep Q-Learning: Double Q-learning, replay memory | Russ |
2/22, 2/27 | Policy Gradients I, Policy Gradients II | Russ | [SB, Ch 13]
2/28 | Recitation: Homework 2 Overview (TensorFlow.org, Keras.io, Bridges User Guide; Code Snippets) | Devin |
3/1 | Continuous Actions, Variational Autoencoders, multimodal stochastic policies | Russ |
3/6 | Imitation Learning I: Behavior Cloning, DAGGER, Learning to Search | Katerina | [5-13]
3/8 | Imitation Learning II: Inverse RL, MaxEnt IRL, Adversarial Imitation Learning | Katerina | [14-20]
3/20 | Sidd Srinivasa: Robotic manipulation | Guest |
3/22 | Optimal control, trajectory optimization | Katerina | [21]
3/27 | Manuela Veloso: Mobile collaborative robots (RoboCup) | Guest |
3/29 | Imitation Learning III: imitating controllers, learning local models, GPS | Katerina | [22-26]
4/3 | Chris Atkeson: What (D)RL ignores: state estimation, robustness, and alternative strategies | Guest |
4/5 | End-to-end policy optimization through back-propagation | Katerina | [27-29]
4/10 | Exploration and Exploitation | Russ | [SB, Ch 2]
4/12 | Hierarchical RL and Transfer Learning | Russ |
4/13 | Recitation: Trajectory optimization, iterative LQR (10:00-11:30am, 8102 GHC) | Katerina |
4/17 | Transfer Learning (2): Simulation to Real World | Katerina | [30-37]
4/19 | Maxim Likhachev: Learning in Planning: Experience Graphs | Guest |
4/24 | Memory Augmented RL | Russ |
4/26 | Learning to learn, one-shot learning | Katerina | [38-42]
Resources
Readings
- [SB] Sutton & Barto, Reinforcement Learning: An Introduction
- [GBC] Goodfellow, Bengio & Courville, Deep Learning
- [1] Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies
- [2] Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search
- [3] Houthooft et al., VIME: Variational Information Maximizing Exploration
- [4] Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
- [5] Bagnell, An Invitation to Imitation
- [6] Nguyen, Imitation Learning with Recurrent Neural Networks
- [7] Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- [8] Daumé III et al., Searn in Practice
- [9] Bojarski et al., End to End Learning for Self-Driving Cars
- [10] Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- [11] Rahmatizadeh et al., Learning real manipulation tasks from virtual demonstrations using LSTM
- [12] Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments
- [13] Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- [14] Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
- [15] Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning
- [16] Ho et al., Model-Free Imitation Learning with Policy Optimization
- [17] Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- [18] Ziebart et al., Maximum Entropy Inverse Reinforcement Learning
- [19] Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
- [20] Finn et al., Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- [21] Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
- [22] Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- [23] Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
- [24] Levine et al., Guided Policy Search
- [25] Levine et al., End-to-End Training of Deep Visuomotor Policies
- [26] Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation
- [27] Mishra et al., Prediction and Control with Temporal Segment Models
- [28] Lillicrap et al., Continuous control with deep reinforcement learning
- [29] Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients
- [30] Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids
- [31] Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- [32] Zoph et al., Neural Architecture Search with Reinforcement Learning
- [33] Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints
- [34] Ganin et al., Domain-Adversarial Training of Neural Networks
- [35] Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets
- [36] Hanna et al., Grounded Action Transformation for Robot Learning in Simulation
- [37] Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- [38] Xiong et al., Supervised Descent Method and its Applications to Face Alignment
- [39] Duan et al., One-Shot Imitation Learning
- [40] Lake et al., Building Machines That Learn and Think Like People
- [41] Andrychowicz et al., Learning to learn by gradient descent by gradient descent
- [42] Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
General references
Online courses
Assignments and grading
Please write all assignments in LaTeX using the NIPS style file (sty file, tex example).
The course grade is a weighted average of assignments (60%) and an open-ended final project (40%).
Prerequisites
This course assumes some familiarity with reinforcement learning, numerical optimization, and machine learning. Suggested relevant courses in MLD are 10701 Introduction to Machine Learning, 10807 Topics in Deep Learning, 10725 Convex Optimization, or online equivalent versions of these courses. For an introduction to machine learning and neural networks, see:
Students less familiar with reinforcement learning can warm-start with the first chapters of Sutton & Barto and with the first lectures of David Silver's RL course.
Feedback
We very much appreciate your feedback. Feel free to remain anonymous, but please always be polite.
Web design: Anton Badev