Bits, Brains and Behaviours (PG)

Module aims

The course provides both basic and advanced knowledge in reinforcement learning across three core skills (theory, implementation, and evaluation), together with the underlying biology. Students will learn the fundamentals of both tabular reinforcement learning and bio-inspired learning, and will gain experience in designing and implementing these methods for practical applications.

Specifically, students will:

•          Learn the theoretical foundations of reinforcement learning (Markov decision processes & dynamic programming).

•          Gain experience in framing low-dimensional problems and implementing solutions using tabular reinforcement learning.

•          Understand the links between learning in biological brains and machine learning, and how behaviour is generated from each.

•          Implement a range of reinforcement learning algorithms in software (Python or Matlab), experiment with them, and learn how to visualise and evaluate their performance.

Learning outcomes

Knowledge and Understanding

Describe the basic principles of reinforcement learning systems.

Compare and contrast a range of reinforcement learning approaches.

Intellectual Skills

Propose solutions to decision-making problems using knowledge of the state of the art.

Calculate mathematical solutions to problems using reinforcement learning theory.

Practical Skills

Translate mathematical concepts into software implementations that solve practical problems.

Evaluate the performance of a range of methods and propose appropriate improvements.

Transferable Skills

Prepare clear visualisations of complex data to assist with evaluation.

Module syllabus

The course will include:

- Introduction to Reinforcement Learning and its Mathematical and Biological Foundations

- The Markov Decision Process Framework

- Markov Reward Processes

- The Policy

- Markov Decision Processes

- Dynamic Programming

- Model-Free Learning & Control

- Monte-Carlo Learning 

- Temporal Difference Learning

- Biological and psychological principles of reward processing and the generation of behaviour.

- Practical examples of reinforcement learning principles and generalisation to more complex settings such as robot control.

Reinforcement learning has a strong practical element and is best appreciated through implementation and evaluation. 


Prerequisites: a solid background in linear algebra, probability theory and gradients.

Teaching methods

You will be taught over one term using a combination of lectures, group sessions and practical computer labs. Lecture sessions will be made available on Panopto for review. Group sessions and labs will focus on the practical application of content taught in lectures, complementing these topics and allowing students to grow their understanding in a student-led manner. Crucially, group activities and labs are as much a vehicle for content delivery as the lectures themselves.


The module will be assessed through an MCQ-style test, a computer-based coursework that requires coding, and an accompanying group project in which students work together towards a common discovery- or innovation-oriented goal (student-led, with teacher input). Together, these three modalities assess overall performance against the learning outcomes (LOs).

The courseworks are structured to cover three core skills: theory, implementation, and evaluation. The exam mixes questions testing understanding with problem-solving questions. It comprises two questions, each with subquestions: Question 1 is based on the first half of the course, and Question 2 on the second half.