Deep Reinforcement Learning and Control
Spring 2017, CMU 10703
Instructors: Katerina Fragkiadaki, Ruslan Salakhutdinov
Lectures: MW, 3:00–4:20pm, 4401 Gates and Hillman Centers (GHC)
Office Hours:
- Katerina: Thursday 1:30–2:30pm, 8015 GHC
- Russ: Friday 1:15–2:15pm, 8017 GHC
Teaching Assistants:
- Devin Schwab: Thursday 2–3pm, 4225 NSH
- Chun-Liang Li: Thursday 1–2pm, 8F open study area, GHC
- Renato Negrinho: Wednesday 6–7pm, 8213 GHC
Communication: Piazza will be used for all announcements, general questions about the course, clarifications about assignments, discussions of the material, and student questions to each other. We strongly encourage all students to participate by asking and answering questions through Piazza (link).
Acknowledgement: We are grateful to XSEDE and PSC for donating GPU resources to our students for their homework and project development.
Class goals
- Implement and experiment with existing algorithms for learning control policies guided by reinforcement, expert demonstrations, or self-trials.
- Evaluate the sample complexity, generalization, and generality of these algorithms.
- Understand research papers in the field of robotic learning.
- Try out some ideas/extensions of your own, with particular focus on incorporating real sensory signals from vision or tactile sensing, and on exploring the synergy between learning from simulation and learning from real experience.
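To give a flavor of the first goal above, here is a minimal sketch of tabular Q-learning, one of the algorithms covered early in the course. The toy chain environment, its reward, and all hyperparameters are invented for illustration and are not course materials:

```python
import random

# Toy 5-state chain MDP (illustrative only): states 0..4,
# actions 0 = left, 1 = right; reaching state 4 yields +1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action
            target = reward + (0.0 if done else gamma * max(Q[next_state]))
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q

Q = q_learning()
# The greedy policy should move right from every non-terminal state.
print([max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)])
```

Homework in the course replaces this hand-rolled environment with OpenAI Gym environments and the table with a neural-network function approximator.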
Schedule
The following schedule is tentative; it will change based on time constraints and the interests of the class. Reading materials and lecture notes will be added as lectures progress.
Date | Topic (slides) | Lecturer | Readings
---- | -------------- | -------- | --------
1/18 | Introduction | Katerina | [1]
1/23 | Markov decision processes (MDPs), POMDPs | Katerina | [SB, Ch 3]
1/25 | Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 4]
1/30 | Monte Carlo learning: value function (VF) estimation and optimization | Russ | [SB, Ch 5]
2/1 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA | Russ | [SB, Ch 6]
2/2 | Recitation: OpenAI Gym | Devin |
2/6 | Planning and learning: Dyna, Monte Carlo tree search | Katerina | [SB, Ch 8; 2]
2/8 | VF approximation; MC and TD with VF approximation; control with VF approximation | Russ | [SB, Ch 9]
2/13 | VF approximation, deep learning, ConvNets, backpropagation | Russ | [GBC, Ch 6]
2/15 | Deep learning, ConvNets, optimization tricks | Russ | [GBC, Ch 9]
2/20 | Deep Q-learning: double Q-learning, replay memory | Russ |
2/22, 2/27 | Policy Gradients I, Policy Gradients II | Russ | [SB, Ch 13]
2/28 | Recitation: Homework 2 overview (TensorFlow.org, Keras.io, Bridges User Guide; code snippets) | Devin |
3/1 | Continuous actions, variational autoencoders, multimodal stochastic policies | Russ |
3/6 | Imitation Learning I: behavior cloning, DAgger, learning to search | Katerina | [5-13]
3/8 | Imitation Learning II: inverse RL, MaxEnt IRL, adversarial imitation learning | Katerina | [14-20]
3/20 | Sidd Srinivasa: Robotic Manipulation | Guest |
3/22 | Optimal control, trajectory optimization | Katerina | [21]
3/27 | Manuela Veloso: Mobile Collaborative Robots and RoboCup | Guest |
3/29 | Imitation Learning III: imitating controllers, learning local models, GPS | Katerina | [22-26]
4/3 | Chris Atkeson: What (D)RL Ignores: State Estimation, Robustness, and Alternative Strategies | Guest |
4/5 | End-to-end policy optimization through backpropagation | Katerina | [27-29]
4/10 | Exploration and exploitation | Russ | [SB, Ch 2]
4/12 | Hierarchical RL and transfer learning | Russ |
4/13 | Recitation: Trajectory optimization, iterative LQR (10:00–11:30am, 8102 GHC) | Katerina |
4/17 | Transfer Learning (2): simulation to real world | Katerina | [30-37]
4/19 | Maxim Likhachev: Learning in Planning: Experience Graphs | Guest |
4/24 | Memory-augmented RL | Russ |
4/26 | Learning to learn, one-shot learning | Katerina | [38-42]
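As a concrete taste of the "Solving known MDPs: Dynamic Programming" lecture, here is a minimal value-iteration sketch. The two-state MDP and all of its numbers are invented for illustration, not course material:

```python
# Transition model P[s][a] = list of (probability, next_state, reward).
# This 2-state MDP and its rewards are made up for illustration.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 5.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}
GAMMA = 0.9

def value_iteration(P, gamma=GAMMA, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def greedy_policy(P, V, gamma=GAMMA):
    """Extract the greedy policy from the converged value function."""
    return {s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2])
                                           for p, s2, r in P[s][a]))
            for s in P}

V = value_iteration(P)
pi = greedy_policy(P, V)
```

Later lectures replace the exact backup over an enumerable state space with sampled transitions (TD learning) and function approximation (deep Q-learning).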
Resources
Readings
- [SB] Sutton & Barto, Reinforcement Learning: An Introduction
- [GBC] Goodfellow, Bengio & Courville, Deep Learning
- [1] Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies
- [2] Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search
- [3] Houthooft et al., VIME: Variational Information Maximizing Exploration
- [4] Stadie et al., Incentivizing Exploration in Reinforcement Learning with Deep Predictive Models
- [5] Bagnell, An Invitation to Imitation
- [6] Nguyen, Imitation Learning with Recurrent Neural Networks
- [7] Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
- [8] Daumé III et al., Searn in Practice
- [9] Bojarski et al., End to End Learning for Self-Driving Cars
- [10] Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- [11] Rouhollah et al., Learning real manipulation tasks from virtual demonstrations using LSTM
- [12] Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments
- [13] Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- [14] Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior
- [15] Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning
- [16] Ho et al., Model-Free Imitation Learning with Policy Optimization
- [17] Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
- [18] Ziebart et al., Maximum Entropy Inverse Reinforcement Learning
- [19] Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control
- [20] Finn et al., A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models
- [21] Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization
- [22] Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images
- [23] Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics
- [24] Levine et al., Guided Policy Search
- [25] Levine et al., End-to-End Training of Deep Visuomotor Policies
- [26] Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation
- [27] Mishra et al., Prediction and Control with Temporal Segment Models
- [28] Lillicrap et al., Continuous Control with Deep Reinforcement Learning
- [29] Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients
- [30] Mordatch et al., Combining Model-Based Policy Search with Online Model Learning for Control of Physical Humanoids
- [31] Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles
- [32] Zoph et al., Neural Architecture Search with Reinforcement Learning
- [33] Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints
- [34] Ganin et al., Domain-Adversarial Training of Neural Networks
- [35] Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets
- [36] Hanna et al., Grounded Action Transformation for Robot Learning in Simulation
- [37] Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model
- [38] Xiong et al., Supervised Descent Method and its Applications to Face Alignment
- [39] Duan et al., One-Shot Imitation Learning
- [40] Lake et al., Building Machines That Learn and Think Like People
- [41] Andrychowicz et al., Learning to Learn by Gradient Descent by Gradient Descent
- [42] Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
General references
Online courses
Assignments and grading
Please write all assignments in LaTeX using the NIPS style file (sty file, tex example).
The course grade is a weighted average of assignments (60%) and an openended final project (40%).
Prerequisites
This course assumes some familiarity with reinforcement learning, numerical optimization, and machine learning. Suggested relevant courses in MLD are 10-701 Introduction to Machine Learning, 10-807 Topics in Deep Learning, and 10-725 Convex Optimization, or online equivalents of these courses. For an introduction to machine learning and neural networks, see the general references and online courses listed under Resources above.
Students less familiar with reinforcement learning can warm up with the first chapters of Sutton & Barto and the first lectures of Dave Silver’s course.
Feedback
We very much appreciate your feedback. Feel free to remain anonymous, but please always be polite.
Web design: Anton Badev
