View on GitHub

stats701-winter2021

Theory of Reinforcement Learning

Welcome to STATS 701 WI 2021

This is a special topics course on the Theory of Reinforcement Learning (RL). We will focus on the design and analysis of RL algorithms using tools from regret analysis of online algorithms, concentration inequalities, and stochastic approximation. The “core” of this course will be based on online RL in a finite state-action (often called the “tabular” setting) Markov Decision Process (MDP) and will be taught in traditional lecture style (fully remote due to COVID-19). The “advanced” part of this course will choose topics based on audience interest and will be more discussion based. Students will volunteer to read a paper (or a small group of related papers) and lead its discussion in the class. Topics for the advanced part could include:

We will use the following resources:

The course is aimed at advanced graduate students in statistics, computer science, control theory, operations research, and other related disciplines that study sequential decision making under uncertainty. Prior exposure to RL (or at least MDPs) is strongly recommended. This course has a theoretical/mathematical flavor and therefore mathematical maturity is expected.

Instructor Information

Name: Ambuj Tewari
Zoom Office Hours: By appointment
Email: tewaria@umich.edu

Grading

There will be no graded homeworks or exams. Grades will be determined on the basis of class presentations and a final project.

Schedule

Caveat Lector: The material below is provided with the hope that it might be useful in learning the theory of RL. Please note that the content is rough and unpolished in many places. It has not been subjected to the scrutiny reserved for scholarly publications. Please watch out for errors and inconsistencies!

Lecture No. Date Topic Readings Media
01 Jan 19 Introduction V, Chapter 1 (Introduction)
SB, Chapter 1 (Introduction)
SB, Section 16.2 (Samuel’s Checkers Player)
SB, Section 16.6 (Mastering the Game of Go)
slides

zoom recording
    Tabular Methods    
02 Jan 21 Markov Processes V, Section 8.2 (Markov processes) slides

zoom recording
03 Jan 26 Markov Reward Processes V, Section 8.4 (Contraction Mapping Theorem)
V, Section 2.1 (Markov Reward Processes)
slides

zoom recording
04 Jan 28 MRPs Continued
Markov Decision Processes
V, Section 2.2 (Markov Decision Processes)
SB, Chapter 3 (Finite Markov Decision Processes)
slides

zoom recording
05 Feb 02 MDPs Continued SB, Chapter 4 (Dynamic Programming) slides

zoom recording
06 Feb 04 LP Approach to Solving MDPs
MLE of Markov Processes
Hoeffding’s Inequality
P, Section 6.9 (Linear Programming)
V, Section 8.3 (MLE of Markov Processes)
slides

zoom recording
07 Feb 09 Monte Carlo Methods V, Section 4.1 (Monte-Carlo Methods)
SB, Chapter 5 (Monte Carlo Methods)
slides

zoom recording
08 Feb 11 Temporal Difference Methods V, Section 4.2 (Temporal Difference Methods)
SB, Chapter 6 (Temporal-Difference Learning)
slides

zoom recording
09 Feb 16 Project Discussion    
    Online Learning and Bandits    
10 Feb 18 Online Convex Optimization (OCO) H, Chapter 1 (Introduction)
H, Chapter 2 (First order algorithms for OCO)
slides

zoom recording
11 Feb 23 Experts Problem
Follow the Regularized Leader
Mirror Descent
H, Chapter 5 (Regularization) slides

zoom recording
12 Feb 25 Adversarial Multi-Armed Bandits LS, Chapter 11 (The Exp3 Algorithm) slides

zoom recording
Mar 05 Proposals due Note: Mar 2nd, Mar 4th lectures were cancelled  
13 Mar 09 Stochastic Multi-Armed Bandits LS, Chapter 4 (Stochastic Bandits)
LS, Chapter 6 (The Explore-Then-Commit Algorithm)
LS, Chapter 7 (The Upper Confidence Bound Algorithm)
LS, Chapter 36 (Thompson Sampling)
slides

zoom recording
    Online Learning in MDPs    
14 Mar 11 Optimism Based Algorithms UCRL2 paper slides

zoom recording
15 Mar 16 Posterior/Thompson Sampling PSRL paper slides

zoom recording
16 Mar 18 Adversarial MDPs MDP Experts paper (Journal version)

O-REPS paper
slides

zoom recording
  Mar 23 Well-being break    
  Mar 25 Well-being break    

Student Presentations

Date Group Topic Media
Mar 30 Dhar-Weitze Reinforcement Learning in Enormous Action Spaces slides

zoom recording
Mar 30 Gupta-Li-Li Game Theoretical Foundations for Multi-agent Reinforcement Learning slides

zoom recording
Apr 01 Liu-Qiao-Sun A deep reinforcement learning based long-term recommender system slides

zoom recording
Apr 01 DiGiovanni-Zell Self-Play Compatibility in Multi-Agent Learning slides

zoom recording
Apr 06 Li-Xu Overview of Thompson sampling in RL and BCI applications slides

zoom recording
Apr 06 Jang-Moug Comparison of Safe Reinforcement Learning Algorithms in Safety Gym slides

zoom recording
Apr 08 Park-Yu Real-time Update in Robot Imitation Learning slides

zoom recording
Apr 08 Otero-Leon Patient panel prioritization with finite resources slides

zoom recording
Apr 13 Wang-Zhong Inverse Reinforcement Learning and its application on calibration of human driver models slides

zoom recording
Apr 13 Chandrasekaran-De Statistical Properties of Monte Carlo Estimates in Reinforcment Learning slides

zoom recording
Apr 15 Zhalechian Reinforcement Learning for Parameterized Markov Decision Processes using Posterior Sampling slides

zoom recording
Apr 15 Bull-Li Reinforcement Learning of Optimal Delivery Destination Estimation slides

zoom recording
Apr 20 Chakraborty-Roy High Dimensional Linear Contextual Bandit and Thompson Sampling using Langevin dynamics slides

zoom recording
Apr 20 Li-Shang Intelligent transportation system slides

zoom recording
Apr 27 Reports due