MSc Statistics (Data Science)
On our MSc in Statistics (Data Science stream), you will discover the forefront of statistical tools and methods with a particular focus on statistical machine learning and data science. Please note that our curriculum is fine-tuned each year, and the list and content of available modules are updated accordingly.
Core modules (Autumn term)
Applied Statistics (7.5 ECTS)
The Applied Statistics module focuses on statistical modelling and regression when applied to realistic problems and real data. We will cover the following topics:
- The Normal Linear model (estimation, residuals, residual sum of squares, goodness of fit, hypothesis testing, ANOVA, model comparison);
- Improving Designs and Explanatory Variables (categorical variables and multi-level regression, experimental design, random and mixed effects models);
- Diagnostics and Model Selection and Revision (outliers, leverage, misfit, exploratory and criterion based model selection, Box-Cox transformations, weighted regression);
- Generalised Linear Models (exponential family of distributions, iteratively re-weighted least squares, model selection and diagnostics);
- In addition, we will introduce more advanced topics related to regression, including penalised regression and links with related problems in time series, classification, and state space modelling.
Computational Statistics (7.5 ECTS)
The Computational Statistics module introduces and covers computational methods that are key in modern statistics. Topics include:
- Statistical Computing: R programming: data structures, programming constructs, object system, graphics.
- Numerical methods: root finding, numerical integration, optimisation methods such as EM-type algorithms.
- Simulation: generating random variates, Monte Carlo integration.
- Simulation approaches in inference: randomisation and permutation procedures, bootstrap, Markov Chain Monte Carlo.
Fundamentals of Statistical Inference (7.5 ECTS)
The Fundamentals of Statistical Inference module develops the main approaches to statistical inference for point estimation, hypothesis testing and confidence set construction. Focus is on description of the key elements of Bayesian, frequentist and Fisherian inference through development of the central underlying principles of statistical theory. Formal treatment is given of a decision-theoretic formulation of statistical inference. Key elements of Bayesian and frequentist theory are described, focussing on inferential methods deriving from important special classes of parametric problem and application of principles of data reduction. General purpose methods of inference deriving from the principle of maximum likelihood are detailed. Throughout, particular attention is given to evaluation of the comparative properties of competing methods of inference.
Probability for Statistics (7.5 ECTS)
The Probability for Statistics module introduces central concepts of probability theory in a rigorous way. Topics covered include: the elements of a probability space, random variables and vectors, distribution functions, independence of random variables/vectors, a concise review of the Lebesgue-Stieltjes integration theory, expectation, modes of convergence of random variables, law of large numbers, central limit theorems, characteristic functions, conditional probability and expectation. The module also introduces discrete-time Markov chains and their key properties, including the Chapman-Kolmogorov equations, classification of states, recurrence and transience, stationarity, time reversibility, ergodicity. Moreover, a concise overview of Poisson processes, continuous-time Markov chains and Brownian motion will be given.
You will choose from an unparalleled range of innovative modules offered across all of modern statistics to develop your own special interests. Students will need to choose the three Group A modules from this page, and any other elective modules from the Optional group B modules to ensure they take a total of 30-32.5 ECTS worth of elective modules. Students will be restricted to a maximum of two modules each worth 7.5 ECTS. A few of the modules will typically run in the Autumn term as stated.
Group A modules (Spring term)
Data Science (5 ECTS)
The Data Science module focuses on techniques and concepts essential for the running and maintenance of professional data science projects. This module will cover appropriate code and project structures, web scraping, handling large data, reproducible workflows, scalability, and exploratory data analysis. The module will also critically explore appropriate uses and misuses of data science, and cover privacy, fairness, and ethical considerations in large-scale data analytics.
Machine Learning (5 ECTS)
The Machine Learning module provides an introduction to Bayesian statistical pattern recognition and machine learning. The module will cover methods for feature extraction, dimensionality reduction, data clustering and pattern classification. State-of-the-art approaches such as Gaussian processes and exact and approximate inference methods will be introduced. Real-world applications will illustrate how the techniques are applied to real data sets. This module is typically offered together with final-year Mathematics UG students.
Big Data (5 ECTS)
The Big Data module will train statistically minded practitioners in the use of common Big Data tools, with an emphasis on creating and using distributed fault-tolerant statistical methods that execute statistical analytics at scale. The module will provide an introduction to Big Data infrastructure, Hadoop, PySpark, and standard statistical tools using MapReduce functionality. Students will be empowered to perform more complex tasks across distributed Big Data architectures that include classification, clustering, gradient descent, distributed MCMC, sub-sampling based MCMC, and stochastic variational inference. The module will provide a wide range of applications and example code from standard tasks to streaming applications and network modelling.
Optional Group B modules (Spring term)
Bayesian Methods (5 ECTS)
The Bayesian Methods module introduces the fundamental definitions of probability that underlie Bayesian inference and then explores the implications of these basic rules for generic statistical tasks. These include parameter inference, model comparison using the marginal likelihood, hypothesis testing, and experimental design. The module will also cover the formulation of inference problems, with a particular focus on hierarchical models and links to more heuristic approaches such as least-squares fitting. Particular emphasis will also be placed on the assignment of probabilities and distributions, including prior distributions for parameter inference, with a focus on information-theoretic considerations that lead to the maximum entropy distributions. This module is typically offered together with final-year Mathematics UG students.
Consumer Credit Risk Modelling (7.5 ECTS)
The Consumer Credit Risk Modelling module introduces the theory and applications of credit risk models in retail finance, covering topics such as credit scoring and application scoring. Modelling approaches such as logistic regression, survival models, Markov chains, beta regression and hierarchical models will be covered, and applied to problems including profit estimation, risk-based pricing, expected loss, value-at-risk and loss given default estimation. Practical topics around model evaluation, ROC curves, and probability calibration are also addressed. This module is typically offered in the Autumn term together with final-year Mathematics UG students.
Multivariate Analysis (5 ECTS)
The Multivariate Analysis module is concerned with the theory and analysis of data that have more than one outcome variable at a time, a situation that is ubiquitous across all areas of science. Repeated use of univariate statistical analysis is insufficient in settings where interdependencies between the multiple random variables are of influence and interest. In this module we look at some of the key ideas associated with multivariate analysis. Topics covered include: multivariate notation, the covariance matrix, multivariate characteristic functions, a detailed treatment of the multivariate normal distribution including the maximum likelihood estimators for mean and covariance, the Wishart distribution, Hotelling's T^2 statistic, likelihood ratio tests, principal component analysis, ordinary, partial and multiple correlation, and multivariate discriminant analysis. This module is typically offered together with final-year Mathematics UG students.
Deep Learning (7.5 ECTS)
The Deep Learning module introduces the building blocks of deep learning models in both supervised and unsupervised contexts. The module focuses on implementing neural networks in the popular deep learning library TensorFlow, covers a wide range of practical skills, and in particular how to design network architectures for specific applications with TensorFlow. The module will culminate in introducing probabilistic deep learning models such as normalising flows and variational autoencoders (VAEs). This module is offered together with the MSc in Machine Learning and Data Science.
Introduction to Statistical Finance (5 ECTS)
The Introduction to Statistical Finance module covers fundamental concepts in financial economics and quantitative finance, and also presents suitable statistical tools that are widely used when analysing financial data. The module covers risk-neutral pricing theory, and introduces risk measures that are widely used in financial risk management such as value at risk and expected shortfall. Particular focus will be given to ARMA-GARCH time series processes that are well suited to describing many of the stylised facts widely observed in financial data, including non-Gaussian returns and heteroscedasticity. The module will culminate in forecasting methods for financial time series.
Advanced Statistical Finance (5 ECTS)
The Advanced Statistical Finance module focuses on modern statistical methods for analysis of financial data. During the last two decades, the increasing availability of large financial data sets has prompted the development of new statistical and econometric methods that can cope with high-dimensional data, high-frequency observations and extreme values in data, and many aspects of these methods will be covered here. The module will first introduce the basics of extreme value theory, which will be used to develop models and estimation methods for extremes in financial data. The second part of the module will provide a concise introduction to the theory of stochastic integration and Itô calculus, which provide a theoretical foundation for volatility estimation from high-frequency data using the concept of realised variance. The asymptotic properties of realised variance will be elucidated and applied to draw inference on realised volatility. The third part covers recently developed volatility forecasting models that incorporate volatility information from high-frequency data and demonstrates how the performance of such models can be assessed and compared using modern forecast evaluation methods such as the Diebold-Mariano test and the model confidence set. The final part of the module provides an overview of covariance matrix estimation in a high-dimensional setting, motivated by applications to variance-optimal portfolios. The pitfalls of using the standard sample covariance matrix with high-dimensional data are first exemplified. Then it is shown how shrinkage methods can be applied to estimate covariance matrices accurately using high-dimensional data.
Biomedical Statistics (5 ECTS)
The Biomedical Statistics module covers state-of-the-art statistical methods for analysing environmental, ecological, or population health data collected in observational longitudinal studies, case-control studies, and clinical trials. The module will cover building complex Bayesian multi-level models, variable selection, model selection, meta-analysis, handling missing data, and estimation of incidence, prevalence and risk factors. Students will learn about the principles of causal inference, the framework of counterfactual variables, placebo effects, and randomisation techniques. The module will also introduce students to the Stan software for Bayesian inference as an efficient computational workhorse for all topics covered, and provide students with a wide range of template code for contemporary, real-world applications.
Statistical Genetics and Bioinformatics (5 ECTS)
Advances in biotechnology are making routine use of DNA sequencing and microarray technology in biomedical research and clinical use a reality. Innovations in the field of Genomics are not only driving new investigations in the understanding of biology and disease but also fuelling rapid developments in computer science, statistics and engineering in order to support the massive information processing requirements. In this module, students will be introduced to Statistical Genetics and Bioinformatics, which over the last 10-15 years have become two of the dominant areas of research and application for modern Statistics. We will develop models and tools to understand complex and high-dimensional genetics datasets. This will include statistical and machine learning techniques for multiple testing, penalised regression, clustering, p-value combination, and dimension reduction. The module will cover both Frequentist and Bayesian statistical approaches. In addition to the statistical approaches, students will be introduced to data from genome-wide association and expression studies, next generation sequencing and other OMICS datasets.
Advanced Simulation Methods (5 ECTS)
The Advanced Simulation Methods module will equip students with the skills and training to formulate Monte Carlo methods for sampling from complicated probability distributions defined on a variety of spaces and setups. These techniques are used pervasively in modern statistics. Students will study and implement central techniques such as Importance Sampling, Markov Chain Monte Carlo, and Sequential Monte Carlo. We will consider the underlying principles of each method as well as practical aspects related to implementation, computational cost, and efficiency. By the end of the module, students will be familiar with these sampling methods and will have applied them to popular statistical models.
Survival Models (7.5 ECTS)
The Survival Models module covers statistical techniques central to analyses of lifetime and censored observations. Students will learn core concepts of survival analysis, right and left censored and randomly censored data, and event time distributions. The module will introduce estimation procedures for lifetime distributions, from empirical survival functions to Kaplan-Meier estimates, and the Cox model. Statistical models of transfers between multiple states, and counting process models will also be covered. The statistical concepts will be brought to life on a wide range of actuarial applications as well as medical applications. This module is typically offered together with final-year Mathematics UG students.
Nonparametric Statistics (5 ECTS)
The Nonparametric Statistics module will equip students with inferential tools based on weaker assumptions than conventional parametric methods. The module will cover estimation of the distribution function and its functionals, and illustrate and study fundamental concepts such as the bias-variance trade-off in kernel density estimators. Emphasis will be given to a wide range of non-parametric regression methods including kernel methods, local polynomial regression, splines and wavelets, and to understanding their relative merits using common statistical criteria.
Time Series (7.5 ECTS)
The Time Series module provides an introduction to discrete-time stochastic processes for analysis of time series data arising in a wide array of applications in finance, engineering, many physical sciences as well as neuroscience. The module begins with an introduction to fundamental concepts including stationarity, autocorrelation, trend removal, spectral representations, and aliasing. Thereafter the module will cover the general linear process, estimation of mean and autocovariance, spectral estimation via the periodogram, tapering for bias reduction, autoregressive processes and estimation of their parameters, parametric and non-parametric bivariate time series, coherence, and forecasting. This module is typically offered in the Autumn term together with final-year Mathematics UG students.
Stochastic Processes (5 ECTS)
The Stochastic Processes module is concerned with central techniques for describing phenomena that evolve dynamically in a random manner over time. The module will cover continuous-time stochastic processes, Poisson processes and their variants, Brownian motion, martingales, infinitesimal generators and Kolmogorov equations. The module will also provide an introduction to Stochastic Differential Equations, starting from stochastic integrals and leading on to a general definition of diffusion processes through Itô's formula and related results. Throughout, emphasis will be given to key concepts and simulation, including methods for approximate simulation of SDEs, and a presentation of Girsanov’s theorem in the context of importance sampling for diffusions.
Statistics research project
Summer term and Summer months
In the summer, you will develop your MSc in Statistics thesis on an exciting research problem that suits your interests. You will be supervised by one of our world-leading academic experts and typically collaborate with interdisciplinary, multinational or industry partners that draw from the wide expertise across the College in science, technology and medicine.