# Optional Courses

**(Lecturers in 2016/17 are in brackets)**

## Optional courses 2016/2017

### M5A44 Computational Stochastic Processes

(Prof Pavliotis) Stochastic processes play an increasingly important role in the modelling of physical, chemical and biological systems. Most stochastic mathematical models are analytically intractable and have to be simulated on a computer. This course will introduce basic numerical and computational techniques for simulating stochastic processes and will present applications to specific physical problems. Contents include:

- Simulation of Brownian motion, Brownian bridge, geometric Brownian motion. Simulation of random fields, the Karhunen-Loève expansion.
- Numerical methods for stochastic differential equations, weak and strong convergence, stability, numerical simulation of ergodic SDEs.
- Backward/forward Kolmogorov equations. Numerical methods for parabolic PDEs (finite difference, spectral methods). Calculation of the transition probability density and of the invariant measure for ergodic diffusion processes.
- Statistical inference for diffusion processes, maximum likelihood, method of moments. Markov chain Monte Carlo, sampling from probability distributions.

Applications: computational statistical mechanics, molecular dynamics.

*Prerequisites: Some knowledge of stochastic processes, ODEs, PDEs, linear algebra, scientific computing, numerical analysis will be useful. Knowledge of Matlab or any other programming language.*
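As a small illustration of the simulation topics listed above, the following sketch compares the exact solution of geometric Brownian motion with its Euler-Maruyama approximation, both driven by the same Brownian increments. It is not taken from the course material: the function names, the parameter values and the choice of Python (rather than Matlab) are assumptions made for this example.

```python
import math
import random


def brownian_increments(n_steps, dt, rng):
    """Draw n_steps independent N(0, dt) increments of a Brownian motion."""
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def gbm_exact(x0, mu, sigma, dt, increments):
    """Path of dX = mu X dt + sigma X dW built from the exact solution."""
    path = [x0]
    for dw in increments:
        path.append(path[-1] * math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * dw))
    return path


def gbm_euler_maruyama(x0, mu, sigma, dt, increments):
    """Euler-Maruyama approximation of the same SDE on the same grid."""
    path = [x0]
    for dw in increments:
        x = path[-1]
        path.append(x + mu * x * dt + sigma * x * dw)
    return path


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    n_steps, t_end = 1000, 1.0
    dt = t_end / n_steps
    dw = brownian_increments(n_steps, dt, rng)
    exact = gbm_exact(1.0, 0.05, 0.2, dt, dw)
    approx = gbm_euler_maruyama(1.0, 0.05, 0.2, dt, dw)
    # Pathwise error at the final time: the quantity that strong convergence controls.
    print("terminal error:", abs(exact[-1] - approx[-1]))
```

Reusing the same increments for both paths is what makes the pathwise comparison meaningful. Repeating the run with smaller `dt` gives a rough feel for the strong convergence rate.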

### M5MS05 Advanced Statistical Theory

(Prof A Young) This course aims to give an introduction to key developments in contemporary statistical theory, building on ideas developed in the core course Fundamentals of Statistical Inference. Reasons for wishing to extend the techniques discussed in that course are several. Optimal procedures of inference, as described, say, by Neyman-Pearson theory, may only be tractable in unrealistically simple statistical models. Distributional approximations, such as those provided by asymptotic likelihood theory, may be judged to be inadequate, especially when confronted with small data samples (as often arise in various fields, such as particle physics and in examination of operational loss in financial systems). It may be desirable to develop general purpose inference methods, such as those given by likelihood theory, to explicitly incorporate ideas of appropriate conditioning. In many settings, such as bioinformatics, we are confronted with the need to simultaneously test many hypotheses. More generally, we may be confronted with problems where the dimensionality of the parameter of the model increases with sample size, rather than remaining fixed.

We consider here a number of topics motivated by such considerations. The focus will be on developments in likelihood-based inference, but we will also consider problems of multiple testing, objective Bayes methods and bootstrap alternatives to analytic distributional approximation, and will introduce some of the more theoretical notions involved in high-dimensional inference.

### M5MS06 Bayesian Data Analysis

(Dr D Mortlock) Scientific inquiry is an iterative process of integrating and accumulating information. Investigators assess the current state of knowledge regarding the issue of interest, gather new data to address remaining questions, and then update and refine their understanding to incorporate both new and old data. Bayesian inference provides a logical, quantitative framework for this process. This framework is based fundamentally on the familiar theorem from basic probability theory known as Bayes' Theorem.

Although Bayesian statistical methods have long been of theoretical interest, the relatively recent advent of sophisticated computational methods such as Markov chain Monte Carlo has catapulted Bayesian methods to centre stage. We can now routinely fit Bayesian models that are custom made to describe the idiosyncratic complexities of particular data streams in a multitude of scientific, technological, and policy settings. The ability to easily develop customized statistical techniques has revolutionized our ability to handle complex data. Bayesian methods now play an important role in the analysis of data from marketing and sales, online activity, security camera streams, transportation, climate and weather, medical records, bioinformatics, and a myriad of other human activities and physical processes. In this course we will develop tools for designing, fitting, validating, and comparing the highly structured Bayesian models that are so quickly transforming how scientists, researchers, and statisticians approach their data.

### M5MS07 Non-parametric Smoothing and Wavelets

(Dr Cohen) Non-parametric methods, as opposed to parametric methods, are desirable when we cannot confidently assume parametric models for our observations. In such situations we need flexible, data-driven methods for estimating distributions or performing regression. This course looks at a number of non-parametric methods. These will include:

Non-parametric density estimation: histograms, kernel estimators, window width, adaptive kernel estimators.

Non-parametric regression: regressograms, kernel regression, local polynomial regression, cross-validation.

Regularisation and Spline Smoothing: roughness penalty, cubic splines, spline smoothing, Reinsch algorithm.

Basis function approach: B-splines; wavelets: discrete wavelet transform, wavelet variance, wavelet shrinkage, thresholding.
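To make the first topic above concrete, here is a minimal sketch of a kernel density estimator with a Gaussian kernel, where the `bandwidth` argument plays the role of the window width. The function names and the choice of kernel are assumptions made for this illustration, not the course's own code.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density_estimate(x, sample, bandwidth):
    """Evaluate f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth (window width) must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    n = len(sample)
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample)
    return total / (n * bandwidth)


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0]
    # A larger window width gives a smoother, flatter estimate.
    for h in (0.25, 0.5, 1.0):
        print(h, kernel_density_estimate(1.0, data, h))
```

The estimate is a proper density for any positive window width. Choosing that width well (for example by cross-validation) is the substantive problem.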

### M5MS08 Multivariate Analysis

(Dr Cohen) As the name indicates, multivariate analysis comprises a set of techniques dedicated to the analysis of data sets with more than one outcome variable, a situation that is ubiquitous across all areas of science. Repeated use of univariate statistical analysis is insufficient in settings where the interdependency between the multiple random variables is of influence and interest. In this course we look at some of the key ideas associated with multivariate analysis. Topics covered include a comprehensive introduction to the linear algebra used in multivariate analysis and the standard multivariate notations including the Kronecker product, a detailed treatment of the multivariate normal distribution, the Wishart distribution, Hotelling’s T² statistic, some key likelihood ratio tests and the ordinary, multiple and partial measures of correlation.

### M5MS09 Graphical Models

(Prof Walden) Probabilistic graphical models encode the relationships between a set of random variables, in a manner that relies on networks and graph-theoretic intuitions. Primarily, they encode conditional independence assumptions, whereby A is statistically independent of B conditional on the value of C. Just as conditional probability is one of the pillars of modern probability, conditional independence is critical in statistical modelling. It underlies model specification, and allows us to infer, elicit, and understand correlation structures between unobserved variables, given the values of variables we already know. This course will entail a variety of material, including discrete mathematics (graph theory), statistical modelling, algorithms and computational aspects, as well as applications involving real data. We will also touch upon abstract questions, such as the difference between causality and correlation.

### M5MS10 Machine Learning

(Dr Calderhead) The fields of machine learning and computational statistics are rapidly becoming important areas of general practice, research and development within the mathematical sciences. The associated methodology is finding application in areas as varied as biology, economics and geopetroleum engineering, and its growth can be partly explained by the increase in the quantity and diversity of measurements we are able to make in the world around us. Particularly fascinating examples arise in biology, where we are now able to measure changes in molecular concentrations within specific gene regulatory networks of an organism that would have been hard to imagine only a short time ago. Machine learning techniques are vital for the distillation of useful structure from this data while avoiding model overfitting; in particular they allow us to distinguish signal from noise and characterise the most plausible scientific hypotheses given the data and prior information available to us. Many other areas and application domains, from social network analysis to algorithmic trading, benefit from machine learning methods, which are routinely used for the detection of patterns and anomalies in large quantities of data.

### M5MS11 Statistics for Extreme Events

(Dr Noven) This course introduces extreme value theory. We focus on statistical methods for extreme events and study applications in insurance and finance. The main topics are as follows:

Extreme value theory: fluctuations of maxima; fluctuations of upper order statistics.

Statistical Methods: Probability and quantile plots; mean excess function; Gumbel’s method of exceedances; parameter estimation for the generalised extreme value distribution; estimating under maximum domain of attraction conditions; fitting excess over a threshold.

### M5MS12 Financial Econometrics

(Dr Pakkanen) Financial econometrics is an interdisciplinary area focusing on a wide range of quantitative problems arising from finance. This course gives an introduction to the field and presents some of the key statistical techniques needed to deal with both low and high frequency financial data. Main topics of the course are:

Discrete time framework: ARCH and GARCH models and their estimation.

Continuous time framework: Brownian motion, stochastic integration and stochastic differential equations, Itô’s formula, stochastic volatility, realised quadratic variation and its asymptotic properties, Lévy processes, testing for jumps, volatility estimation in the presence of market microstructure effects.

### M5MS13 Pricing and Hedging in Financial Markets

(Dr Pakkanen) The fundamentals of no-arbitrage theory and risk neutral valuation of contingent claims in the setting of the trinomial model will be explained. The most commonly traded contingent claims in the financial markets (vanilla and forward starting options, barrier and volatility derivatives, American options) will be described in detail and their pricing discussed in the context of trinomial models.

### M5MS14 Statistical Bioinformatics and Genetics

(Dr Evangelou) Advances in biotechnology are making routine use of DNA sequencing and microarray technology in biomedical research and clinical use a reality. Innovations in the field of Genomics are not only driving new investigations in the understanding of biology and disease but also fuelling rapid developments in computer science, statistics and engineering in order to support the massive information processing requirements.

In this course, students will be introduced to the world of Bioinformatics, which has become in the last 10-15 years one of the dominant areas of research and application for modern Statistics. Students will learn about fundamental biological processes, classical models that have enabled scientists to model and understand complex biological datasets, as well as cutting edge methodology currently being used in next generation sequencing technologies.

### M5MS17 Medical Statistics

(Dr Fitz-Simon) The objective of the course is to provide a broad range of statistical techniques to analyse biomedical data that are produced by pharmaceutical companies, research units and the NHS. Besides a general introduction to linear models, generalised linear models and survival analysis, the course will focus on clinical trials (study design, randomisation, sample size and power, covariate and subgroup adjustment) to examine the effect of treatments on the disease process over time, and on longitudinal data analysis from the perspective of clinical trials. The statistical theory and the derivation and estimation of model parameters will be illustrated, as well as the application of longitudinal models to real case studies drawn from biomedical and health sciences. The analysis of the real examples will be performed using standard statistical software. At the end of the course, students will be able to plan basic clinical trials, analyse longitudinal data and interpret the results.

The course will cover the following models and topics:

- Introduction to linear/generalised linear models and survival analysis

- Introduction to clinical trials

- Treatment allocation, monitoring and effect estimation

- Introduction to longitudinal data and repeat measures

- General and generalised linear model for longitudinal data

- Random and mixed-effects models

### M5MS18 Official Statistics

(Prof Allin) Every country has some form of official statistics system, making available statistics about the economy, society and the environment. Official statistics are used not only by governments but also by businesses, the media, researchers, civil society and the public. The course aims to provide insight into: why official statistics are needed; how they are produced; the fundamental principles that underpin them; and how the quality of official statistics is assessed. The course will also explore the main methodologies for the production of official statistics, including sample surveys, censuses and the use of administrative (‘big’) data.

### M5MS19 Further Topics in Statistics (Big Data)

(Dr Briers - QinetiQ) This course covers a current topic in Statistics, which varies from year to year. The topic for 2015/16 will be Big Data.

The emergence of Big Data as a recognised and sought-after technological capability is due to the following factors: the general recognition that data is omnipresent, an asset from which organisations can derive business value; the efficient interconnectivity of sensors, devices, networks, services and consumers, allowing data to be transported with relative ease; the emergence of middleware processing platforms, such as Hadoop, InfoSphere Streams, Accumulo, Storm, Spark, Elastic Search, …, which, in general terms, empower the developer with the ability to efficiently create distributed fault-tolerant applications that execute statistical analytics at scale.

To promote the use of advanced statistical methods within a Big Data environment - an essential requirement if correct conclusions are to be reached - it is necessary for statisticians to utilise Big Data tools when supporting or performing statistical analysis in the modern world. The objective of this course is to train statistically minded practitioners in the use of common Big Data tools, with an emphasis on the use of advanced statistical methods for analysis. The course will focus on the application of statistical methods in the processing platforms Hadoop and Spark. Assessment will be through coursework.

### M5MS20 Sequential Monte Carlo

(Dr Kantas) Non-linear non-Gaussian state-space models are ubiquitous in statistics, econometrics, information engineering and signal processing. Particle methods, also known as Sequential Monte Carlo (SMC) methods, provide reliable numerical approximations to the associated state inference problems, also known as non-linear filtering. The aim of the course is to provide an introduction to these algorithms and illustrate their fundamental strengths and weaknesses. The course will have an emphasis on the methodology and practical aspects of the method, but will also briefly touch on the theory behind SMC and its relevance in improving the methodology.
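The basic idea behind particle methods can be sketched with a bootstrap particle filter. The example below is an illustration only, under stated assumptions: the state-space model (a scalar autoregression observed in Gaussian noise), the function name and all parameter values are hypothetical. A linear Gaussian model is used purely to keep the code short, since the same propagate, weight and resample loop applies when the dynamics or likelihood are non-linear or non-Gaussian.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles, a, process_std, obs_std, rng):
    """Estimate filtering means for x_t = a * x_{t-1} + v_t, y_t = x_t + w_t.

    v_t ~ N(0, process_std^2) and w_t ~ N(0, obs_std^2); the state before
    the first transition is assumed to be N(0, 1).
    """
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    filtering_means = []
    for y in observations:
        # 1. Propagate each particle through the state dynamics.
        particles = [a * x + rng.gauss(0.0, process_std) for x in particles]
        # 2. Weight by the observation likelihood, in log space for stability.
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        max_log_w = max(log_w)
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        # 3. Estimate the filtering mean E[x_t | y_1..y_t] from the weighted particles.
        filtering_means.append(sum(w * x for w, x in zip(weights, particles)))
        # 4. Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return filtering_means


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed for reproducibility
    ys = [0.5, 0.2, -0.3]
    print(bootstrap_particle_filter(ys, 2000, 0.9, 1.0, 0.1, rng))
```

Resampling at every step is the simplest choice. It adds Monte Carlo variance of its own, which is one of the practical trade-offs that such methods have to manage.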

### M5S14 Survival Models and Actuarial Applications with Advanced Study

(Dr Ginzberg) Survival analysis, also known as reliability analysis and event history analysis, is a branch of statistical theory concerned with modelling the random times at which specific events will occur, utilising any relevant information available. Since survival data occur temporally, survival analysis data sets will typically be incomplete, with a proportion of the observations censored since the event time has not yet occurred at the time of analysis.

The discipline has a wide variety of applications, with examples including: medicine, when measuring the time until recovery/relapse of a patient following a medical intervention; engineering, measuring the time until failure of components in a machine; economics, measuring the time until failure of businesses.

This course introduces the fundamental ideas and statistical tools for performing survival analysis, which are applicable to the range of applications indicated above. Further, we consider some more tailored statistical models which are specifically appropriate to the discipline of actuarial science, in which there is interest in measuring mortality in a population for the purpose of providing life assurance and pensions.

### M5S17 Quantitative Methods in Retail Finance

(Dr Bellotti) Introduction. Topic overview: behavioural models, profitability, fraud detection.

Survival models for credit scoring to determine time to default and include time varying information.

Roll-rate and Markov transition models to determine patterns of missed payments.

Mover-Stayer models of behaviour.

Profitability models: concepts and use of survival models.

Setting optimal credit limits.

Fraud detection.

Neural networks for fraud detection. Back-propagation and gradient descent methods.

Population drift (problem and change detection). Adaptive models.

Score cards in other application areas (case studies: physiology data, Altman Z-score).

Cost analysis of AUC and the H measure.

Expected Loss, PD, EAD and LGD models (using classification tree structures).

Regulation and portfolio-level analysis. Capital requirements. One-factor Merton-type model.

Asset correlation and dynamic random effects models.

Stress testing: concepts and Monte Carlo simulation approaches.

### M5S8 Time Series with Advanced Study

(Prof Walden) This course gives an introduction to the analysis of time series (series of observations, usually evolving in time), giving weight to both the time domain and frequency domain viewpoints. Important structural features (e.g. reversibility) are discussed, and useful computational algorithms and approaches are introduced. The course is self-contained.

Discrete time stochastic processes and examples. ARMA processes. Trend removal and seasonal adjustment. General linear process. Invertibility. Directionality and reversibility in time series. Spectral representation. Aliasing. Generating functions. Estimation of mean and autocovariance sequence. The periodogram. Tapering for bias reduction. Parametric model fitting. Forecasting.

Additional material: From long-memory processes, autoregressive parametric spectrum estimation, harmonic analysis, multichannel time series modelling and analysis.