Imperial College London

Dr Dante Kalise

Faculty of Natural Sciences, Department of Mathematics

Honorary Lecturer
 
 
 

Contact

 

d.kalise-balza

 
 

Location

 

6M29, Huxley Building, South Kensington Campus


Summary

 

News: as of 3/9/18, the course will begin on 12/10/18 and will run for 8 sessions until 7/12/18. There will be no lecture on 30/11/18. Please read the instructions below concerning the final project.

Mathematical Foundations of Reinforcement Learning (TCC Course 10/18-12/18)



Course Description: this course concerns multi-stage decision processes in the framework of dynamic programming and the Bellman equation, where optimal policies are synthesised on the basis of both immediate and long-term rewards. However, the computational requirements of dynamic programming techniques can become prohibitive when the policy/state space is overwhelmingly large, the so-called "curse of dimensionality" identified by Bellman. In this course we will address this difficulty by means of different techniques for the computation of suboptimal solutions to dynamic programming equations. The lectures will cover theoretical, algorithmic, and computational aspects of these techniques.
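
To fix ideas, here is a minimal sketch of value iteration, the basic successive-approximation scheme for the Bellman equation. The two-state MDP below (transition probabilities, rewards, and discount factor) is invented purely for illustration:

```python
# Value iteration on a tiny, invented MDP.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
gamma = 0.9  # discount factor

V = {s: 0.0 for s in P}
for _ in range(500):
    # One application of the Bellman operator at every state.
    V_new = {s: max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in P[s])
             for s in P}
    if max(abs(V_new[s] - V[s]) for s in P) < 1e-10:  # sup-norm test
        break
    V = V_new

print(V)  # approximate fixed point of the Bellman operator
```

The loop converges because the Bellman operator is a contraction in the sup norm with modulus gamma; the curse of dimensionality enters because each update must touch every state.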

Prerequisites: some general knowledge of Iterative Methods, Optimisation and Markov Chains can be useful, but is not essential.

Sessions:
1. Introduction to Dynamic Programming I: Optimal feedback control and the Bellman equation, Value and Policy Iteration.

Notes Week 1
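
As a companion to the notes, a minimal sketch of policy iteration on a hypothetical two-state, two-action MDP (the arrays P and R below are invented): policy evaluation is done exactly by solving a linear system, followed by greedy improvement.

```python
import numpy as np

# Policy iteration on an invented two-state, two-action MDP:
# P[a, s, s'] transition probabilities, R[a, s] expected rewards.
n_states, gamma = 2, 0.9
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.2, 0.8], [0.0, 1.0]]])
R = np.array([[0.0, 0.0],
              [0.8, 2.0]])

policy = np.zeros(n_states, dtype=int)
while True:
    # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi.
    P_pi = P[policy, np.arange(n_states)]
    R_pi = R[policy, np.arange(n_states)]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    # Greedy policy improvement with respect to the Q-values.
    Q = R + gamma * (P @ V)
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                      # policy is stable, hence optimal
    policy = new_policy

print(policy, V)
```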

2. Introduction to Dynamic Programming II: Finite and Infinite Horizon Control, Value and Policy Iteration.

Notes Week 2
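
For the finite-horizon case, here is a sketch of backward induction (the dynamic programming recursion) on the same kind of invented MDP: V is initialised with the terminal value and the Bellman recursion runs backwards through the stages.

```python
import numpy as np

# Backward induction over a finite horizon (MDP data invented).
N = 10                                       # horizon length
P = np.array([[[1.0, 0.0], [1.0, 0.0]],      # P[a, s, s']
              [[0.2, 0.8], [0.0, 1.0]]])
R = np.array([[0.0, 0.0], [0.8, 2.0]])       # R[a, s]

V = np.zeros(2)                              # terminal value V_N
policies = [None] * N
for k in reversed(range(N)):
    Q = R + P @ V                            # stage-k Q-values (undiscounted)
    policies[k] = Q.argmax(axis=0)           # optimal decision rule at stage k
    V = Q.max(axis=0)                        # value-to-go V_k
print(V, policies[0])
```

Note that on a finite horizon the optimal policy is time-dependent, one decision rule per stage, in contrast with the stationary policies of the infinite-horizon discounted problem.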

3. Neural Networks: basic architectures, training/optimisation. Stochastic Iterative Algorithms: Stochastic Gradient Method, convergence results.

Notes Week 3
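
A minimal sketch of the stochastic gradient method on a least-squares problem (the data below is generated at random purely for illustration); the step sizes 1/k satisfy the classical divergent-sum / square-summable conditions under which convergence results are proved.

```python
import numpy as np

# Stochastic gradient method for min_x (1/2n) * sum_i (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 5))
x_true = rng.normal(size=5)
b = A @ x_true + 0.01 * rng.normal(size=1000)

x = np.zeros(5)
for k in range(1, 20001):
    i = rng.integers(len(b))            # sample a single term of the sum
    grad = (A[i] @ x - b[i]) * A[i]     # gradient of (a_i^T x - b_i)^2 / 2
    x -= grad / k                       # steps: sum 1/k = inf, sum 1/k^2 < inf
print(np.linalg.norm(x - x_true))       # should be small
```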


4. Optimisation (continuation) and Simulation Methods: Monte Carlo policy evaluation.

Notes Week 4
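
A sketch of first-visit Monte Carlo policy evaluation on the same invented two-state MDP used above: returns are estimated by simulating truncated episodes under a fixed policy and averaging, with the expected one-step reward used in place of sampled rewards for simplicity.

```python
import numpy as np

# First-visit Monte Carlo evaluation of a fixed policy (MDP invented).
rng = np.random.default_rng(1)
P = np.array([[[1.0, 0.0], [1.0, 0.0]],     # P[a, s, s']
              [[0.2, 0.8], [0.0, 1.0]]])
R = np.array([[0.0, 0.0], [0.8, 2.0]])      # expected rewards R[a, s]
gamma, policy = 0.9, np.array([1, 1])       # policy to be evaluated

returns = [[], []]
for _ in range(2000):
    s = rng.integers(2)
    episode = []
    for _ in range(60):                     # truncated episode
        a = policy[s]
        episode.append((s, R[a, s]))
        s = rng.choice(2, p=P[a, s])
    first = {}
    for t, (s_t, _) in enumerate(episode):
        first.setdefault(s_t, t)
    G = 0.0
    for t in reversed(range(len(episode))):
        s_t, r_t = episode[t]
        G = r_t + gamma * G                 # discounted return from time t
        if first[s_t] == t:                 # first-visit rule
            returns[s_t].append(G)

print([np.mean(r) for r in returns])        # approx. [18.5, 20.0] for this MDP
```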


5. Approximate Dynamic Programming I: introduction and Approximate Policy Iteration.

Notes Week 5
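
A sketch of approximate policy iteration on an invented deterministic chain: the value of the current policy is represented as V(s) ~ phi(s)^T w, with w fitted by least squares to truncated Monte Carlo returns, followed by greedy improvement in the model. All dynamics, rewards and features below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, gamma = 50, 0.95
xs = np.linspace(0.0, 1.0, n_s)
phi = np.stack([np.ones(n_s), xs, xs**2], axis=1)   # linear features

def move(s, a):                       # action 1 = right, 0 = left
    return min(s + 1, n_s - 1) if a else max(s - 1, 0)

def reward(s):
    return 1.0 - abs(xs[s] - 0.5)     # made-up running reward

def fit_values(policy, n_episodes=300, T=120):
    """Regress truncated sampled returns onto the features."""
    X, y = [], []
    for _ in range(n_episodes):
        s0 = s = rng.integers(n_s)
        G, g = 0.0, 1.0
        for _ in range(T):
            G += g * reward(s)
            g *= gamma
            s = move(s, policy[s])
        X.append(phi[s0]); y.append(G)
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w

policy = np.zeros(n_s, dtype=int)
for _ in range(5):                    # a few approximate PI sweeps
    V = phi @ fit_values(policy)
    policy = np.array([int(np.argmax([reward(s) + gamma * V[move(s, a)]
                                      for a in (0, 1)]))
                       for s in range(n_s)])
print(policy)   # should point each state towards the middle of the chain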


6. Approximate Dynamic Programming II: Approximate Value Iteration.

Notes Week 6
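
A companion sketch of approximate (fitted) value iteration on the same invented chain: the Bellman operator is applied at the sample states and the result is projected back onto the feature space by least squares.

```python
import numpy as np

# Fitted value iteration: Bellman backup + least-squares projection
# onto linear features (chain dynamics and reward invented).
n_s, gamma = 50, 0.95
xs = np.linspace(0.0, 1.0, n_s)
phi = np.stack([np.ones(n_s), xs, xs**2], axis=1)   # feature matrix

def move(s, a):                    # action 1 = right, 0 = left
    return min(s + 1, n_s - 1) if a else max(s - 1, 0)

w = np.zeros(3)
for _ in range(100):
    V = phi @ w
    # Exact Bellman backup at every sample state...
    TV = np.array([max(1.0 - abs(xs[s] - 0.5) + gamma * V[move(s, a)]
                       for a in (0, 1)) for s in range(n_s)])
    # ...followed by projection onto span(phi) in the least-squares sense.
    w, *_ = np.linalg.lstsq(phi, TV, rcond=None)
print(w)   # weights of the approximate value function
```

Unlike exact value iteration, the composition of backup and projection need not be a contraction, which is one of the issues addressed in the lectures.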


7. Bellman Equation Methods: the Hamilton-Jacobi-Bellman PDE, Dynamic Games.

Notes Week 7
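
For reference, one standard form of the equation treated in this session (the notation, with dynamics f, running cost l and discount rate lambda, is assumed here rather than taken from the notes): for the infinite-horizon discounted problem, the value function solves the stationary Hamilton-Jacobi-Bellman PDE below.

```latex
% Infinite-horizon discounted optimal control:
%   dynamics:  \dot x(t) = f(x(t), u(t)),   u(t) \in U
%   cost:      J(x, u(\cdot)) = \int_0^\infty e^{-\lambda t}\, \ell(x(t), u(t))\, dt
% The value function v(x) = \inf_{u(\cdot)} J(x, u(\cdot)) solves
\[
  \lambda v(x) + \sup_{u \in U}\big\{ -f(x,u) \cdot \nabla v(x) - \ell(x,u) \big\} = 0,
\]
% and an optimal feedback law is recovered from the maximiser:
\[
  u^*(x) \in \arg\max_{u \in U}\big\{ -f(x,u) \cdot \nabla v(x) - \ell(x,u) \big\}.
\]
```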


8. An Overview of Deep Reinforcement Learning. A Case Study: playing Pac-Man and Tetris with Reinforcement Learning.
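
The deep methods surveyed in this session build on the tabular Q-learning update, which network-based variants approximate with a neural network; here is a minimal sketch on an invented chain environment (not the Pac-Man or Tetris setups themselves).

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_a = 10, 2
gamma, alpha, eps = 0.95, 0.1, 0.1
Q = np.zeros((n_s, n_a))

def env_step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, n_s - 1) if a else max(s - 1, 0)
    return s2, float(s2 == n_s - 1)   # reward only at the right end

for _ in range(5000):
    s = rng.integers(n_s)
    for _ in range(50):
        # eps-greedy behaviour policy
        a = rng.integers(n_a) if rng.random() < eps else int(Q[s].argmax())
        s2, r = env_step(s, a)
        # Q-learning update towards the one-step bootstrap target
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
print(Q.argmax(axis=1))   # greedy policy: should move right everywhere
```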


Reading List:
[NDP] Neuro-Dynamic Programming, Dimitri P. Bertsekas and John Tsitsiklis, Athena Scientific, 1996.
[RL] Reinforcement Learning: An Introduction, R. Sutton and A. Barto, MIT Press, 2014.
[DRL] Deep Reinforcement Learning: A Brief Survey, K. Arulkumaran, M. P. Deisenroth, M. Brundage, A. A. Bharath, IEEE Signal Processing Magazine 34(6), 2017.

Assessment: individual projects on different theoretical aspects and applications of reinforcement learning. Please let me know by November 30 which topic you are choosing for your report. Submission deadline: December 20, 2018.


Proposed topics:

1. Hamilton-Jacobi-Bellman equations in optimal control (2 projects: deterministic/stochastic).

2. Implementing an RL framework for a 2D/3D minimum time problem with obstacles (2 projects).

3. Training a deep neural network with J* values (1 project).

4. A benchmark of different gradient methods (1 project).

5. Temporal difference methods (2 projects).

6. Approximation theory for neural networks (1 project).

7. Applications of your own interest (∞ projects).