Reinforcement Learning Syllabus

06 Feb 2019

Reference Links

CS294 Fall 2017 UC Berkeley

Berkeley bootcamp

Reinforcement learning course lectures by David Silver

A (Long) Peek into Reinforcement Learning

Reinforcement Learning Textbook

Week 1, Feb 4: Markov Decision Processes

Topics: Dynamic Programming (value iteration, policy iteration) and tabular Q-learning

Sutton Chapter 3: Markov Decision Processes and Chapter 4: Dynamic Programming

Deep RL Bootcamp Core Lecture 1 Intro to MDPs and Exact Solution Methods – Pieter Abbeel Video, Slides

Deep RL Bootcamp Core Lecture 2 Sample-based Approximations and Fitted Learning – Rocky Duan Video, Slides

Deep RL Bootcamp Lab 1: Markov Decision Processes. You will implement value iteration, policy iteration, and tabular Q-learning and apply these algorithms to simple environments including tabular maze navigation (FrozenLake) and controlling a simple crawler robot.
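As a warm-up for the lab, value iteration can be sketched on a toy problem. The 3-state chain MDP below is a hypothetical example, not one of the lab environments, and the names `P`, `q_value`, and `value_iteration` are illustrative assumptions rather than the lab's API.

```python
# Hypothetical 3-state chain MDP (not FrozenLake or the crawler).
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s under values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup in place until values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
```

Policy iteration would reuse the same `q_value` backup, alternating full policy evaluation with the greedy improvement step in the last line of `value_iteration`.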

CS294 Reinforcement learning introduction – Sergey Levine Video, Slides

CS294 Value functions introduction – Sergey Levine Video, Slides

Introduction to Reinforcement Learning – Joshua Achiam Slides

Week 2, Feb 11: Monte Carlo Methods

Topics: Use Blackjack to implement first-visit or every-visit MC prediction
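A minimal sketch of first-visit MC prediction, assuming episodes have already been collected as lists of (state, reward) pairs. The function name and the hand-made episodes are hypothetical; for Blackjack the states would come from the environment instead.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging the return that follows the first visit to s.

    episodes: list of episodes, each a list of (state, reward) pairs, where
    reward is the reward received after leaving that state.
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Time step at which each state first appears in this episode.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)
        G = 0.0
        # Walk backwards so G accumulates the discounted return from step t.
        for t in reversed(range(len(episode))):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:
                returns_sum[s] += G
                returns_count[s] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hand-made episodes for illustration only.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 1.0), ("A", 0.0), ("B", 1.0)],
]
V = first_visit_mc_prediction(episodes)
```

Dropping the `first_visit[s] == t` check turns this into every-visit MC, which on these two episodes would average three returns for state "A" instead of two.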

Sutton Chapter 5.3: Monte Carlo Methods

CS294 Optimal control and planning – Sergey Levine Video, Slides

Week 3, Feb 18: Imitation Learning with MuJoCo

CS294 Supervised learning and imitation – Sergey Levine Video, Slides

CS294 Imitation Learning Project

Week 4, Feb 25: Policy Gradients

Topics: Temporal-difference (TD) learning; policy gradients on CartPole and Humanoid
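For the TD part of this week, a minimal sketch of tabular TD(0) prediction. It assumes experience arrives as (state, reward, next_state, done) tuples; the function name and the sample transitions are hypothetical.

```python
from collections import defaultdict


def td0_prediction(transitions, alpha=0.5, gamma=1.0, V=None):
    """Tabular TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    V = defaultdict(float) if V is None else V
    for s, r, s_next, done in transitions:
        # Terminal transitions do not bootstrap from a next state.
        target = r if done else r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
    return V


# Two passes through a hand-made A -> B -> terminal chain, for illustration only.
transitions = [
    ("A", 0.0, "B", False),
    ("B", 1.0, None, True),
    ("A", 0.0, "B", False),
    ("B", 1.0, None, True),
]
V = td0_prediction(transitions)
```

Unlike Monte Carlo prediction, each update uses only one transition, so the estimate for "A" improves as soon as the estimate for "B" does, without waiting for the episode to end.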

Sutton Chapter 6: Temporal-Difference Learning

Deep RL Bootcamp Core Lecture 4a Policy Gradients and Actor Critic – Pieter Abbeel Video, Slides

Deep RL Bootcamp Core Lecture 4b Pong from Pixels – Andrej Karpathy Video, Slides

CS294 Policy gradients introduction – Sergey Levine Video, Slides, Policy Gradients Project

Policy Gradient Algorithms – Lilian Weng Blog

Sutton Chapter 13.5: Actor-Critic Methods

CS294 Actor-critic introduction – Sergey Levine Video, Slides

Week 5, Mar 4: Deep Q Learning, DQN, Rainbow

Sutton Chapter 16.5: DQN

Deep RL Bootcamp Core Lecture 3 DQN + Variants – Vlad Mnih Video, Slides (https://drive.google.com/open?id=0BxXI_RttTZAhVUhpbDhiSUFFNjg)

Deep RL Bootcamp Lab 3: Deep Q-Learning. You will implement the DQN algorithm and apply it to Atari games.
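For orientation before the lab, the loss that DQN minimizes can be written as below. The notation is an assumption of this sketch rather than the lab's: \theta denotes the online network parameters, \theta^{-} the periodically copied target-network parameters, and \mathcal{D} the replay buffer.

```latex
L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}}
\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

For terminal transitions the target reduces to r, since there is no next state to bootstrap from.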

CS294 Neural networks review – Joshua Achiam Video, Slides

CS294 Advanced Q-learning algorithms – Sergey Levine Video, Slides, DQN Project

Week 6, Mar 11: Model-based RL

Deep RL Bootcamp Core Lecture 9 Model-based RL – Chelsea Finn Video, Slides

CS294 Learning dynamical systems from data – Sergey Levine Video, Slides

CS294 Learning policies by imitating optimal controllers – Sergey Levine Video, Slides

CS294 Advanced model learning and images – Chelsea Finn Video, Slides

CS294 Connection between inference and control – Sergey Levine Video, Slides

CS294 Model Based RL Project

Week 7, Mar 18: Advanced Policy Gradients

Topics: Natural policy gradient, PPO (use Roboschool to avoid needing a MuJoCo license)
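As a rough illustration of PPO, the clipped surrogate objective can be sketched in plain Python. The function name, the default `eps=0.2`, and the sample numbers are assumptions for illustration; a real implementation would compute the same quantity on batched tensors and maximize it by gradient ascent.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch.

    ratios: probability ratios pi_new(a|s) / pi_old(a|s), one per sample.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = max(1.0 - eps, min(r, 1.0 + eps))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps the objective.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


# Illustrative values only: one sample with positive and one with negative advantage.
objective = ppo_clip_objective([1.5, 0.5], [1.0, -1.0])
```

With a positive advantage the ratio 1.5 is clipped to 1.2, and with a negative advantage the ratio 0.5 is clipped to 0.8, so neither sample rewards moving further from the old policy.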

Deep RL Bootcamp Core Lecture 5 Natural Policy Gradients, TRPO, and PPO – John Schulman Video, Slides

Deep RL Bootcamp Lab 4: Policy Optimization Algorithms. You will implement various policy optimization algorithms, including policy gradient, natural policy gradient, trust-region policy optimization (TRPO), and asynchronous advantage actor-critic (A3C). You will apply these algorithms to classic control tasks, Atari games, and Roboschool locomotion environments.

CS294 Learning policies by imitating optimal controllers – Sergey Levine Video, Slides

Week 8, Mar 25: Inverse RL

Topics: Generative Adversarial Imitation Learning (GAIL)

CS294 Inverse reinforcement learning – Sergey Levine Video, Slides

Algorithms for Inverse Reinforcement Learning PDF

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning PDF

Maximum Entropy Inverse Reinforcement Learning PDF

Maximum Entropy Deep Inverse Reinforcement Learning PDF

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization PDF

Generative Adversarial Imitation Learning PDF