# MSc Statistics (Applied Statistics)

## Autumn term 2020-21 news

This course will begin on schedule in Autumn and we look forward to seeing new and returning students in person, if travel and visa arrangements allow. If students can’t travel to campus in time for the start of term, the Department has made plans to provide them with a high-quality remote educational experience during the Autumn term.

Teaching will be a combination of on-campus (in-person) and remote (online) learning, known as ‘multi-mode’ delivery. The balance in the ‘multi-mode’ offering may be subject to change. We will do our best to provide you with more on-campus teaching and research activities as the year progresses. For more information about multi-mode delivery, the learning experience and the steps we’ll be taking to keep students safe on campus, please see our Covid-19 information for applicants and offer holders page.

We will update our course pages with further details once they are finalised.

## Useful information

This one-year full-time programme provides outstanding training in applied statistics, with a broad range of application areas including Statistical Finance and Biostatistics. The course will equip students with a range of transferable skills, including programming, problem-solving, critical thinking, scientific writing, project work and presentation, to enable them to take on prominent roles in a wide array of employment and research sectors.

The programme is split between taught **core** and **optional** modules in the Autumn and Spring terms (66.67% weighting) and a **research project** in the Summer term (33.33% weighting).

## Core modules

**Four core modules are offered in the Autumn term:**

### Applied Statistics (7.5 ECTS)

The module focuses on statistical modelling and regression when applied to realistic problems and real data. We will cover the following topics:

The Normal Linear model (estimation, residuals, residual sum of squares, goodness of fit, hypothesis testing, ANOVA, model comparison). Improving Designs and Explanatory Variables (categorical variables and multi-level regression, experimental design, random and mixed effects models). Diagnostics and Model Selection and Revision (outliers, leverage, misfit, exploratory and criterion-based model selection, Box-Cox transformations, weighted regression). Generalised Linear Models (exponential family of distributions, iteratively re-weighted least squares, model selection and diagnostics). In addition, we will introduce more advanced topics related to regression, such as penalised regression, and links with related problems in Time Series, Classification and State Space modelling.

### Computational Statistics (7.5 ECTS)

This module covers a number of computational methods that are key in modern statistics. Topics include: statistical computing (R programming: data structures, programming constructs, object system, graphics); numerical methods (root finding, numerical integration, optimisation methods such as EM-type algorithms); simulation (generating random variates, Monte Carlo integration); and simulation approaches in inference (randomisation and permutation procedures, bootstrap, Markov Chain Monte Carlo).

### Fundamentals of Statistical Inference (7.5 ECTS)

In statistical inference, experimental or observational data are modelled as the observed values of random variables, to provide a framework from which inductive conclusions may be drawn about the mechanism giving rise to the data. This is done by supposing that the random variable has an assumed parametric probability distribution: the inference is performed by assessing some aspect of the parameter of the distribution.

This module develops the main approaches to statistical inference for point estimation, hypothesis testing and confidence set construction. Focus is on description of the key elements of Bayesian, frequentist and Fisherian inference through development of the central underlying principles of statistical theory. Formal treatment is given of a decision-theoretic formulation of statistical inference. Key elements of Bayesian and frequentist theory are described, focussing on inferential methods deriving from important special classes of parametric problem and application of principles of data reduction. General purpose methods of inference deriving from the principle of maximum likelihood are detailed. Throughout, particular attention is given to evaluation of the comparative properties of competing methods of inference.

### Probability for Statistics (7.5 ECTS)

The module Probability for Statistics introduces the key concepts of probability theory in a rigorous way. Topics covered include: the elements of a probability space, random variables and vectors, distribution functions, independence of random variables/vectors, a concise review of the Lebesgue-Stieltjes integration theory, expectation, modes of convergence of random variables, law of large numbers, central limit theorems, characteristic functions, conditional probability and expectation.

The second part of the module will introduce discrete-time Markov chains and their key properties, including the Chapman-Kolmogorov equations, classification of states, recurrence and transience, stationarity, time reversibility, ergodicity. Moreover, a concise overview of Poisson processes, continuous-time Markov chains and Brownian motion will be given.

## Optional modules

**Each student will need to choose from the following optional modules to reach 30-32.5 taught ECTS in the Spring term, in such a way that the following requirements are fulfilled:**

**At least 15 ECTS from elective groups (A) or (B)**

**AND**

**At least 20 ECTS from elective groups (B) or (C)**

**These modules run in the Spring term unless otherwise stated.**
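
As an informal aid, the requirements above can be sketched as a small Python check. This is only an illustration, not an official tool: the function name `check_selection` and the `(title, group, ECTS)` tuple layout are hypothetical, and the sketch assumes that every chosen optional module counts towards the 30-32.5 total regardless of the term in which it runs. The sample selection uses modules and ECTS values from the lists on this page.

```python
from typing import Iterable, List, Tuple

# (title, elective group "A"/"B"/"C", ECTS) -- a hypothetical layout for illustration
Module = Tuple[str, str, float]


def check_selection(modules: Iterable[Module]) -> List[str]:
    """Return the list of unmet requirements (empty if the selection passes)."""
    modules = list(modules)
    total = sum(ects for _, _, ects in modules)
    a_or_b = sum(ects for _, group, ects in modules if group in ("A", "B"))
    b_or_c = sum(ects for _, group, ects in modules if group in ("B", "C"))

    problems = []
    # Assumption: all chosen optional modules count towards the total.
    if not 30 <= total <= 32.5:
        problems.append(f"total is {total} ECTS, expected 30-32.5")
    if a_or_b < 15:
        problems.append(f"groups A/B give {a_or_b} ECTS, need at least 15")
    if b_or_c < 20:
        problems.append(f"groups B/C give {b_or_c} ECTS, need at least 20")
    return problems


# Sample selection built from modules listed on this page.
example = [
    ("Multivariate Analysis", "A", 5.0),
    ("Graphical Models", "A", 5.0),
    ("Advanced Simulation Methods", "B", 5.0),
    ("Bayesian Methods", "B", 5.0),
    ("Machine Learning", "B", 5.0),
    ("Survival Models and Actuarial Applications", "C", 7.5),
]

if __name__ == "__main__":
    issues = check_selection(example)
    print("Selection OK" if not issues else "\n".join(issues))
```

A selection drawn only from group A would meet the first requirement but not the second, so the function would report one problem for it. The Department's own guidance remains the authority on whether a given combination is permitted.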

## Optional modules A

### Contemporary Statistical Theory (5 ECTS)

This course aims to give an introduction to key developments in contemporary statistical theory. It describes ideas of: multiple testing, inference under sparsity conditions; parametric higher-order likelihood theory for statistical inference; objective Bayes inference; bootstrap methodology and theory; key concepts and methods of selective inference.

### Multivariate Analysis (5 ECTS)

Multivariate Analysis is concerned with the theory and analysis of data that have more than one outcome variable at a time, a situation that is ubiquitous across all areas of science. Repeated use of univariate statistical analysis is insufficient in settings where the interdependency between the multiple random variables is of influence and interest. In this module we look at some of the key ideas associated with multivariate analysis. Topics covered include: multivariate notation, the covariance matrix, multivariate characteristic functions, a detailed treatment of the multivariate normal distribution including the maximum likelihood estimators for mean and covariance, the Wishart distribution, Hotelling's T^2 statistic, likelihood ratio tests, principal component analysis, ordinary, partial and multiple correlation, multivariate discriminant analysis.

### Deep Learning with TensorFlow (5 ECTS)

This module teaches the building blocks of deep learning models, and how to design network architectures for specific applications, in both supervised and unsupervised contexts. It covers practical skills in implementing neural networks in the popular deep learning library TensorFlow. Students will learn how to build, train and evaluate networks using this framework. In the latter part of the module, the focus is on probabilistic deep learning models, such as normalising flows and variational autoencoders (VAEs).

### Graphical Models (5 ECTS)

Graphical models are those probability models whose independence structure is characterised by a graph, the conditional independence graph. In this module we will look at some aspects of graphical modelling for both (a) a vector of random variables, and (b) vector-valued time series. We will look at models and their estimation. Topics covered include: dependence structure and graphical representation; Markov properties for undirected graphs; the conditional independence graph; decomposable models; graphical Gaussian models; model selection; acyclic directed graphical models; global directed Markov property; Bayesian networks; graphical modelling of time series; model selection for time series graphs.

### Advanced Statistical Finance (5 ECTS)

Advanced Statistical Finance focuses on modern statistical methods for analysis of financial data. During the last two decades, the increasing availability of large financial data sets has prompted development of new statistical and econometric methods that can cope with high-dimensional data, high-frequency observations and extreme values in data.

The module will first introduce the basics of extreme value theory, which will be used to develop models and estimation methods for extremes in financial data. The second part of the module will provide a concise introduction to the theory of stochastic integration and Itô calculus, which provide a theoretical foundation for volatility estimation from high-frequency data using the concept of realised variance. The asymptotic properties of realised variance will be elucidated and applied to draw inference on realised volatility.

The third part introduces some recently developed volatility forecasting models that incorporate volatility information from high-frequency data and demonstrates how the performance of such models can be assessed and compared using modern forecast evaluation methods such as the Diebold-Mariano test and the model confidence set.

The final part of the module provides an overview of covariance matrix estimation in a high-dimensional setting, motivated by applications to variance-optimal portfolios. The pitfalls of using the standard sample covariance matrix with high-dimensional data are first exemplified. Then it is shown how shrinkage methods can be applied to estimate covariance matrices accurately using high-dimensional data.

### Statistical Genetics and Bioinformatics (5 ECTS)

Advances in biotechnology are making routine use of DNA sequencing and microarray technology in biomedical research and clinical use a reality. Innovations in the field of Genomics are not only driving new investigations in the understanding of biology and disease but also fuelling rapid developments in computer science, statistics and engineering in order to support the massive information processing requirements. In this module, students will be introduced to Statistical Genetics and Bioinformatics, which over the last 10-15 years have become two of the dominant areas of research and application for modern Statistics. We will develop models and tools to understand complex and high-dimensional genetics datasets. This will include statistical and machine learning techniques for: multiple testing, penalised regression, clustering, p-value combination, dimension reduction. The module will cover both Frequentist and Bayesian statistical approaches. In addition to the statistical approaches, students will be introduced to genome-wide association and expression studies data, next generation sequencing and other OMICS datasets.

### Big Data (5 ECTS)

The emergence of Big Data as a recognised and sought-after technological capability is due to the following factors: the general recognition that data is omnipresent and an asset from which organisations can derive business value; the efficient interconnectivity of sensors, devices, networks, services and consumers, allowing data to be transported with relative ease; the emergence of middleware processing platforms, such as Hadoop, InfoSphere Streams, Accumulo, Storm, Spark, Elastic Search, …, which, in general terms, enable developers to efficiently create distributed fault-tolerant applications that execute statistical analytics at scale.

To promote the use of advanced statistical methods within a Big Data environment - an essential requirement if correct conclusions are to be reached - it is necessary for statisticians to utilise Big Data tools when supporting or performing statistical analysis in the modern world. The objective of this module is to train statistically minded practitioners in the use of common Big Data tools, with an emphasis on the use of advanced statistical methods for analysis. The module will focus on the application of statistical methods in the processing platforms Hadoop and Spark. Assessment will be through coursework.

## Optional modules B

### Advanced Simulation Methods (5 ECTS)

Modern problems in Statistics require sampling from complicated probability distributions defined on a variety of spaces and setups. In this course we will visit popular advanced sampling techniques, such as Importance Sampling, Markov Chain Monte Carlo and Sequential Monte Carlo. We will consider the underlying principles of each method as well as practical aspects related to implementation, computational cost and efficiency. By the end of the course, students will be familiar with these sampling methods and will have applied them to popular models, such as Hidden Markov Models, which are ubiquitous across scientific disciplines.

### Bayesian Methods (5 ECTS)

This module introduces the fundamental definitions of probability which underlie Bayesian inference and then explores the implications of these basic rules for generic statistical tasks. These include parameter inference, model comparison using the marginal likelihood, hypothesis testing, and experimental design. The module will also cover the formulation of inference problems, with a particular focus on hierarchical models and links to more heuristic approaches (e.g., least-squares fitting). Particular emphasis will also be placed on the assignment of probabilities and distributions, including prior distributions for parameter inference, with a focus on information theoretical considerations that lead to the maximum entropy distributions.

### Data Science (5 ECTS)

Data scientific methods are wide in scope, drawing equally from computational statistics and computer science. This course will cover computing with data and reproducible workflows, exploratory data analysis, and data representation. In addition, it will cover the visualization and presentation of data, and science about data science: exploring what data analysts really do and thinking critically about appropriate uses and misuses of data science.

### Machine Learning (5 ECTS)

This module will provide an introduction to Bayesian statistical pattern recognition and machine learning. The lectures will focus on a variety of useful techniques including methods for feature extraction, dimensionality reduction, data clustering and pattern classification. State-of-the-art approaches such as Gaussian processes and exact and approximate inference methods will be introduced. Real-world applications will illustrate how the techniques are applied to real data sets. Assessment is continuous, through coursework.

### Introduction to Statistical Finance (5 ECTS)

The module “Introduction to Statistical Finance” introduces fundamental concepts in financial economics and quantitative finance and presents suitable statistical tools which are widely used when analysing financial data. The module will start off with an introduction to risk-neutral pricing theory followed by a short survey on risk measures such as value at risk and expected shortfall, which are widely used in financial risk management. Next, an introduction to time series analysis will be given, where the main focus will be on so-called ARMA-GARCH processes. Such processes can describe some of the stylised facts widely observed in financial data, including non-Gaussian returns and heteroscedasticity. Finally, methods for forecasting financial time series will be introduced.

### Biomedical Statistics (5 ECTS)

The students will be introduced to modern statistical approaches and tests performed when analysing data collected from observational studies, such as case-control studies, longitudinal studies and clinical trial studies. The course will introduce central techniques for modelling and inference in biostatistics, from generalized linear regression models to complex Bayesian multi-level models for clinical, environmental and ecological data. Case examples will illustrate recent theoretical advances in action, covering variable selection, principles of handling missing data, meta-analysis, aspects of causal inference, and the effective design of biostatistical studies. Particular emphasis will be on state-of-the-art computing, introducing students to the R tidyverse environment for data science, techniques for handling big data, and the Stan software for inference.

## Optional modules C

### Survival Models and Actuarial Applications (7.5 ECTS)

Survival models are fundamental to actuarial work, as well as being a key concept in medical statistics. This module will introduce the ideas, placing particular emphasis on actuarial applications. Concepts of survival models, right and left censored and randomly censored data. Estimation procedures for lifetime distributions: empirical survival functions, Kaplan-Meier estimates, Cox model. Statistical models of transfers between multiple states, maximum likelihood estimators. Counting process models.

Actuarial Applications: Life table data and expectation of life. Binomial model of mortality. The Poisson model. Estimation of transition intensities that depend on age. Graduation and testing crude and smoothed estimates for consistency.

For M4S14/M5S14: all of the above and, additionally, master's-level material for self-study (based on a master's-level textbook, research monograph or paper).

### Time Series (7.5 ECTS)

**Please note: this module currently runs in the Autumn term**

Time series analysis is an important area of statistics with applications in finance, engineering and many physical sciences plus areas such as neuroscience in medicine. This module covers introductory ideas in both the time domain and frequency domain areas of the subject. Topics:

Real examples, stationarity, autocovariance sequences, covariance matrices for segments, examples of discrete stationary processes, trend removal and seasonal adjustment, the general linear process, spectral representation, sampling and aliasing, linear filtering, estimation of mean and autocovariance, spectral estimation via the periodogram, tapering for bias reduction, autoregressive processes and estimation of their parameters, parametric and non-parametric bivariate time series, coherence, forecasting.

## Additional optional modules

### Algorithmic Trading and Machine Learning (5 ECTS)

**Please note: this module currently runs in the Autumn term**

*The Algorithmic Trading and Machine Learning module is part of the MSc in Mathematics and Finance. Any MSc in Statistics student interested in the module is welcome to attend the lectures. A limited number of MSc in Statistics students will be allowed to take this module for credit towards their degree. Priority will be given to the students following the Statistical Finance stream. The final selection of students allowed to take the module for credit will be decided by both Programme Directors after the student's registration for the January exams.*

The aim of the course is to present a series of cutting-edge topics in the area of “Algorithmic trading” in a unified and systematic fashion. For each of the problems presented, we try to emphasize both the mathematical theory and industry applications. The course consists of two main parts: 1) Optimal Execution Problems and 2) Machine Learning in Finance. Optimal execution techniques are particularly relevant for market makers and quantitative brokers, whereas machine learning is often used by hedge funds and prop desks to generate trading signals. However, machine learning algorithms can also be applied as part of optimal execution tools, for example in order to choose order types or speed of execution. The basic optimal execution problem consists of an agent (e.g. a bank or a broker) who needs to buy or sell a pre-specified number of units of a given asset within a fixed time frame (e.g. an hour, a day, etc). Assuming that the purchase or sale of the asset will have an impact on its price, what is the execution policy which minimizes market impact? Having decided on the execution schedule, what type of order (market or limit order) is better to submit? The first problem can be formulated as a trade-off between the expected execution cost and the price risk due to exogenous factors. We shall solve the optimization problem for different types of:

- Price dynamics (ABM vs GBM, with drift or without drift);
- Market impact type (temporary, transient, permanent);
- Exogenous Risk functions (variance, VaR).

Machine learning techniques are becoming increasingly popular in the financial industry. They are typically used to help predict asset price patterns, volatility regimes, etc. The course starts by formalizing the concept of “learning” and providing an overview of various learning techniques. The subsequent lectures analyze in detail some of the most popular machine learning algorithms such as neural networks and support vector machines. We then introduce various smoothing tools (kernel regression, wavelets, HHTs) which have historically been developed for signal processing applications but have found their way into finance over the last few years. Those methods can be used standalone or jointly with other learning algorithms, e.g. SVM. Finally, we shall analyze issues related to model selection and how to combine different models to improve the learning outcome. Trading applications using real market data will be presented during the course.