The seminars are currently held on Fridays at 2pm at the South Kensington Campus (see specific seminar pages for location information).

The seminars are organised by Dr Deniz Akyildiz.

Announcements concerning the seminar are sent via a mailing list (to which you can subscribe here).

Statistics Seminars Pre-2011 Archive

Seminars 2010 - 2011

Friday 17th June, Huxley 340

14:30 - Anthony Lee (University of Oxford)
Bayesian Sparsity Path Analysis using Hierarchical Shrinkage Priors
Variable selection techniques have become increasingly popular amongst statisticians due to an increased number of regression and classification applications involving high-dimensional data where we expect some predictors to be unimportant. In this context, shrinkage priors and maximum a posteriori (MAP) estimates are often used to identify important variables. A hierarchical framework for specifying shrinkage priors is introduced and we motivate the use of these priors in exploratory data analysis using both Bayesian inference and MAP estimation. Generating posterior distributions over a range of prior specifications is computationally challenging but naturally amenable to sequential Monte Carlo (SMC) algorithms indexed on the scale parameter of the prior. We show how SMC simulation on graphics processing units (GPUs) provides the computational power required for timely inference. An example from case-control genome-wide association studies is presented.

16:00 - Prof Stephen Walker (University of Kent)
Sampling the mixture of Dirichlet process model with a power likelihood
If a likelihood is raised to a power in (0,1) then the sequence of Bayesian posterior distributions is known to be strongly consistent. However, for the popular mixture of Dirichlet process model it is not clear how to undertake posterior inference via MCMC with the likelihood raised to a power in (0,1). Taking a power just less than one, for example, would guarantee a consistent sequence of posteriors which should be extremely similar to those obtained with the standard likelihood. The talk will show how to do posterior inference for the mixture of Dirichlet process model with the likelihood raised to a power in (0,1).

Friday, 10 December 2010

14:30 - Prof Sofia Olhede (University College London)
Statistical Methods for Analysis of Diffusion Weighted Magnetic Resonance Imaging
High angular resolution diffusion imaging data is the observed characteristic function for the local diffusion of water molecules in tissue. This data is used to infer structural information in brain imaging. Non-parametric scalar measures are proposed to summarize such data, and to locally characterize spatial features of the diffusion probability density function (PDF), relying on the geometry of the characteristic function. Summary statistics are defined so that their distributions are, to first order, both independent of nuisance parameters and analytically tractable. The dominant direction of the diffusion at a spatial location (voxel) is determined, and a new set of axes is introduced in Fourier space. Variation quantified in these axes determines the local spatial properties of the diffusion density. Non-parametric hypothesis tests for determining whether the diffusion is unimodal, isotropic or multi-modal are proposed. More subtle characteristics of white-matter microstructure, such as the degree of anisotropy of the PDF and symmetry compared with a variety of asymmetric PDF alternatives, may be ascertained directly in the Fourier domain without parametric assumptions on the form of the diffusion PDF. We simulate a set of diffusion processes and characterize their local properties using the newly introduced summaries. We show how complex white-matter structures across multiple voxels exhibit clear ellipsoidal and asymmetric structure in simulation, and assess the performance of the statistics in clinically-acquired magnetic resonance imaging data. Joint work with Brandon Whitcher, GSK.

16:00 - Dr Christoforos Anagnostopoulos (University of Cambridge)
Handling temporal variation of unknown characteristics in streaming data analysis
Data collection technology is undergoing a revolution that is enabling streaming acquisition of real-time information in a wide variety of settings. Faced with indefinitely long, high frequency and possibly high dimensional data sequences, learning algorithms must rely on summary statistics and computationally efficient online inference without the need to store and revisit the data history. Moreover, learning must be temporally adaptive in order to remain up-to-date against unforeseen changes, smooth or abrupt, in the underlying data generation mechanism. In cases where explicit dynamic modelling is either impossible or impractical, temporally adaptive behaviour may still be induced by controlling the responsiveness of the estimator to novel information. We discuss ways in which this can be accomplished in data-dependent manners for popular classes of online algorithms. We focus on the Robbins-Monro family of algorithms that naturally features a sequence of user-specified learning rates, and discuss available methodology for automatic self-tuning in this context. On the basis of novel theoretical insights, as well as real-data experiments, we demonstrate the advantages and shortcomings of such approaches in handling temporal variation of unknown characteristics.

Friday, 3 December 2010 

Transfer Talks

Paul Ginzberg
Processing of proper and improper quaternion signals
Complex signal processing uses the algebraic structure of complex numbers to account for relationships between the real and imaginary components of a signal and processes them jointly. Using the 4-dimensional algebraic structure of quaternions can provide similar insight when dealing with 4-component signals. Algebraic structure is directly related to patterned covariance matrices. We can test for this pattern, and its presence (propriety) or absence (impropriety) has implications for the choice of techniques used to process the signal.

Swati Chandna
Simulating Improper Complex-Valued Processes via Circulant Embedding
The technique of circulant embedding has been used for simulating realizations from certain real-valued Gaussian stationary processes. We show how this technique can be adapted to handle complex-valued processes to generate realizations from an improper complex-valued Gaussian stationary process with a priori prescribed second-order statistics. This technique has two potential advantages over other competing methods for simulating time series. First, this method is based on a discrete Fourier transform which makes it computationally attractive. Secondly, it generates exact realizations as opposed to approximate realizations. In practice, especially when dealing with applications in engineering and physical sciences, it is more likely to be provided with a set of time series data rather than a model with second-order statistics. In such situations it is useful to generate simulated time series whose statistical properties closely resemble the time series under study. We show how certain nonparametric spectral estimators from the given time series can be used together with the circulant embedding approach to generate the required realizations. We also provide results which show that this methodology provides a good nonparametric procedure for bootstrapping time series to assess the sampling variability in certain statistics of interest.

Matt Silver
Simultaneous pathway and SNP selection using the grouped lasso
I will present a method for the ranking of significant biological pathways associated with a quantitative phenotype, using group lasso penalized regression in which SNPs are grouped into functionally related gene sets. In addition, the method simultaneously ranks significant SNPs within selected pathways. An important distinguishing feature of the method is its ability to account for the presence of overlapping gene sets, arising from the typically large number of genes that are assigned to multiple pathways. The use of pointwise stability selection combined with sparse regression across multiple pathways makes the method highly computationally efficient when compared with other methods that use permutations to rank significance. Using simulated quantitative phenotypes generated using real genotype and pathway data, we find that our method performs well when compared with other widely-used methods for pathway and SNP selection.

Elena Ehrlich
Adaptive Filtering and Time-varying Systems
For fully specified Linear Gaussian State-Space models, it is well known that the Kalman Filter (KF) provides optimal update equations. In many real problems the state-space model cannot be fully specified, and modifications to standard filtering methodology or alternative estimation methods are required to estimate the states and parameters simultaneously. The problem is significantly harder if model parameters are varying in time, particularly in unpredicted ways. Our objective is to explore the utility of Adaptive Filtering in such challenging problems. In particular, we develop a formal relationship between KF and Recursive Least Squares with Adaptive Forgetting (RLS-AF), and describe how this relationship can be exploited to identify a time-varying model which is not fully specified.

19 November 2010

14:30 - Dr Thomas Nichols (Warwick University)
Point Process Modelling of Brain Image Data
Abstract: The standard approach to brain imaging data analysis is a mass-univariate one. At each voxel (volume element) a linear model is fit, ignoring all other voxels. While this has obvious computational advantages, it cannot capture the explicitly spatial nature of brain activations that are of interest. We propose a hierarchical point process approach, modelling brain activation as a mixture of latent activation centers, and these activation centers are in turn modelled as off-spring from latent population centers. This approach allows inference on the location of population centers, separately estimating the uncertainty of the population location and the uncertainty of individual activation center's location about that population center (akin to the distinction between standard error and standard deviation). We also show how this approach naturally encompasses coordinate-valued data, such as generated by neuroimaging meta-analyses and Multiple Sclerosis lesion data.

16:00 - Dr Enrico Petretto (Imperial College London)
Integrated systems-genetics approaches: deciphering the biological function of genes and gene networks that drive disease
Abstract: Combined analyses of gene networks and DNA sequence variation can provide new insights into the aetiology of common diseases that may not be apparent from genome wide association studies (GWASs) alone. Recent advances in rat genomics now make systems-genetics approaches possible, which we developed to identify and functionally characterise biological processes associated with disease. We used integrated genome-wide approaches across seven rat tissues to identify gene networks and the loci underlying their regulation. We defined an interferon regulatory factor 7 (IRF7)-driven inflammatory network (IDIN) enriched for viral response genes, which represents a molecular biomarker for macrophages and was regulated in multiple tissues by a locus on rat chromosome 15q25. We show that Epstein–Barr virus induced gene 2 (Ebi2, also known as Gpr183), which lies at this locus and controls B lymphocyte migration, is expressed in macrophages and regulates the IDIN. The human orthologous locus on chromosome 13q32 controlled the human equivalent of the IDIN, which was conserved in monocytes. IDIN genes were more likely to associate with susceptibility to type 1 diabetes (T1D), a macrophage-associated autoimmune disease, than randomly selected immune response genes (P = 8.85 × 10^−6). The human locus controlling the IDIN was associated with the risk of T1D at single nucleotide polymorphism rs9585056 (P = 7.0 × 10^−10; odds ratio 1.15), which was one of five single nucleotide polymorphisms in this region associated with EBI2 (GPR183) expression. These data implicate IRF7 network genes and their trans-acting regulatory locus in the pathogenesis of T1D and show that co-expression networks across species provide functional annotation of genes in biological processes that can be used to reveal the signal of common genetic variation of small effect that is not detected by GWAS.

Friday, 15 October 2010

14:30 - Dr Jim Griffin (University of Kent)
Title: Structuring Shrinkage
The effectiveness (and shortcomings) of penalized maximum likelihood methods, such as the Lasso, in regression has led to a renewed interest in the choice of prior distribution for regression coefficients in a Bayesian analysis. This talk will consider independent Normal-Gamma distributions as a prior and illustrate how they can effectively control model complexity. The prior acts as a starting point for defining priors with dependence between regression coefficients and this will also be discussed.

16:00 - Dr Nikolas Kantas (Imperial College London)
Parameter Inference for rare events using Particle Markov chain Monte Carlo
Abstract: In this talk we consider parameter inference associated with stochastic processes that are stopped at the first hitting time of some rare set. Our approach is to use a recently introduced simulation methodology, Particle Markov Chain Monte Carlo (PMCMC), which is an MCMC algorithm that uses proposals generated using Sequential Monte Carlo (SMC) methods. However, standard SMC algorithms are not always appropriate for many stopped processes and we introduce new ideas based upon particle approximations of multi-level Feynman-Kac formulae. The methodology is applied to the coalescent and a queuing model. This is joint work with Ajay Jasra and Arnaud Doucet.

Seminars 2009 - 2010

Friday, 15 January 2010

14:30 - Alberto Cozzini (transfer talk)

Title:  Penalised Robust Mixture Modelling using t-Student Distributions and Applications

Abstract:  It is standard practice to divide financial markets among macro sectors. The criteria for the classification are usually based on the nature and fundamental characteristics of the goods exchanged. In our work we take a data-driven approach and cluster the markets on the basis of several indicators of price dynamics extracted from historical time series of the returns. Standard clustering algorithms, such as model-based algorithms based on mixture distributions, assume that all the variables are informative for clustering and that each variable is normally distributed. However, we believe that certain financial indicators may not be informative for clustering and should not be used. Moreover, most indicators follow distributions with heavier tails compared to the normal distribution. In order to address these two issues, we propose a penalized model-based clustering algorithm based on a mixture of t-distributions. The clustering algorithm is able to differentiate between variables that are important for clustering and irrelevant variables, and is robust against outliers. Statistical inference is carried out using an EM algorithm. Several alternative penalty functions will be discussed and experimental results based on both simulated and real data sets will be presented.

15:00 - Christofer Minas (transfer talk)

Title:  Differential Analysis in Gene Expression Time Courses using Distance-Based Functional Data Analysis

Abstract:  In many biological settings, gene expression data are collected over time from subjects or replicates of different groups, classified as such by attributes such as treatment. Typically, the responses are comprised of only a few observed values and many missing values, thus making it harder to identify or rank the genes in which the responses of the subjects differ the most between the groups. We approach this problem from a novel functional perspective, considering the responses as noisy realizations of some smooth function, creating gene-specific dissimilarity matrices from a functional principal component analysis, and using distance-based permutation testing to rank the genes in order of significance. In particular, we consider the pseudo F-test, the Multi-Response Permutation Procedure (MRPP) and the Mantel test. Equivalences between these methods are discussed, before comparing their performance with a multivariate empirical Bayes method via a simulation study. The distance-based methods are then applied to a real dataset where responses from 9 dendritic and 9 macrophage cells are observed for 22282 genes, and genes in which responses are significantly different are highlighted.

16:00 - Todd Kuffner (transfer talk)

Title:  Matching Objective Bayesian and Conditional Frequentist Coverage Probabilities in Ancillary Statistic Models.

Abstract:  Probability matching priors are a standard tool used to achieve higher-order matching of coverage probabilities between Bayesian and unconditional frequentist confidence regions. We extend the analysis to ancillary statistic models, where the correct inference to be performed is a conditional one. Our results suggest that the method developed by Barndorff-Nielsen (1986) and extended by DiCiccio and Young (2009) is particularly useful in identifying priors which ensure higher-order matching. In some transformation models, the matching is exact in the sense that, if we choose a prior identified by our matching method, then the Bayesian confidence intervals have exactly the correct conditional frequentist interpretation.

16:30 - Orlando Doehring (transfer talk)

Title: Curve Selection in Multi-Group Functional Principal Component Analysis

Abstract:  We model variability of multi-group time series within a functional data analysis framework. This type of problem may arise in human motion data where each sample has three different groups of time series, one group for each spatial coordinate. Functional Principal Component Analysis (fPCA) is an unsupervised approach to data processing and dimensionality reduction where we have one group of functions or curves. The given discrete observations, which may have been recorded at different time points and may not be equally spaced, will be modelled functionally by projecting them onto a basis expansion. In multi-group fPCA each sample is paired with multiple groups of curves. For each sample those multiple groups are concatenated and uni-group fPCA is performed. The aim is to study variability of each of these groups and to compute a projection into the direction of maximum variability. But in multi-group fPCA the projected coordinates will be based on all the original groups. To improve interpretation we develop a sparse projection that does not depend on all original components. To reduce the number of groups explicitly used in the projection, a new multi-group sparse PCA approach is proposed. Hence groups of curves are regularized using ideas related to those recently developed for the grouped Lasso.

Friday, 18 December 2009

13:30 - 16:00  Astrostatistics Seminars

There is a growing interest in the use of modern statistical methods to tackle astronomical problems, driven partly by the huge astronomical data sets which are being collected.  Statisticians and astronomers at Imperial already have at least one joint project underway, and other independent discussions have also taken place.  We therefore thought it timely to organise a meeting to gauge the level of interest and bring together relevant parties, with the aim of stimulating cross-disciplinary work and new collaborations.


Statistical and data mining tools for astronomical problems
David J. Hand, Maths

Statistical challenges in cosmology and astroparticle physics
Roberto Trotta, Physics

Two examples of astrostatistics: star/galaxy separation and anomaly detection
Marc Henrion, Maths

Inferring the properties of astronomical objects: triumph and disaster
Daniel Mortlock, Physics

Friday, 11 December 2009

14:30 - Dr Sumeetpal S. Singh (Dept Engineering, Univ Cambridge)

Title:  Recursive smoothing using Particle Filters

Abstract:  Sequential Monte Carlo (SMC) methods are a widely used set of computational tools for inference in non-linear non-Gaussian state-space models. We propose a new SMC algorithm to perform smoothing recursively in time. Essentially, it is an on-line implementation of the forward filtering backward smoothing SMC algorithm proposed by Doucet, Godsill and Andrieu (2000). We show that the asymptotic variance of the path space SMC estimator increases quadratically in time whereas the increase is linear in time for our new SMC estimator. We then use the new SMC estimator to perform recursive parameter estimation using an SMC implementation of an on-line version of the Expectation-Maximization algorithm which does not suffer from the particle path degeneracy problem. This is joint work with Pierre Del Moral and Arnaud Doucet.

16:00 - Prof Adrian Bowman (Department of Statistics, University of Glasgow)

Title:  Modelling surface shape: from faces to brains

Abstract:  Stereo-photogrammetry provides high-resolution data defining the shape of three-dimensional objects. One example of its application is in facial surgery where images can be used to quantify the success of an operation and to quantify residual differences from control shape. Information can be extracted in a variety of forms. Methods of analysing landmark shape data are well developed but landmarks alone clearly do not adequately represent the very much richer information present in each digitised face. Facial curves with clear anatomical meaning can also be extracted. In order to exploit the full extent of the information present in the images, standardised meshes, whose nodes correspond across individuals, can also be fitted. Some of the issues involved in analysing data of these types will be discussed and illustrated on surgical data. The measurement of asymmetry and the construction of longitudinal models are of particular interest. A second form of surface data arises in the analysis of MEG data which is collected from the head surface of patients and gives information on underlying brain activity. In this case, spatiotemporal smoothing offers a route to a flexible model for the spatial and temporal locations of stimulated brain activity.

Friday, 6 November 2009

14:30 - Prof. Robin Henderson (School of Mathematics and Statistics, University of Newcastle)

Title:  Regret regression for optimal treatment allocation, with application to warfarin anticoagulation.

Abstract:  TBA

16:00 - Prof. Chris Holmes (Oxford Centre for Gene Function)

Title: Hierarchical Bayesian mixture models.

Abstract:  We will discuss Bayesian approaches to clustering of data using mixture models. Clustering is often used to uncover hidden structure in data or to recover suspected structure. We show that by building hierarchical dependencies (via priors) on the mixing weights, mixture locations and mixture variances we can induce a rich set of generalisations of the original model which are well suited to features of many applications. The investigation is motivated by various studies in statistical genomics.

Friday, 16 October 2009

14:30 - Dr Jenny Barrett (Section of Epidemiology and Biostatistics, Leeds Institute of Molecular Medicine)

Title: Applications of the Random Forest algorithm in proteomics and genetics

Abstract:  The Random Forest (RF) algorithm was developed by Breiman as a classification tool and has been used successfully as a classifier in many contexts. We applied the RF algorithm to peaks extracted from mass spectrometry proteomic profiles as part of a "competition", designed to compare various approaches to classifying samples from breast cancer patients and controls. The algorithm performed well as a classifier and, unlike most other methods, provided good estimates of future performance from the training data set. An additional feature of the RF is a measure of the contribution of each variable to the success of the classification: variable importance measures. Using the same data set we have shown that these measures are quite stable over repeated runs of the algorithm and show good consistency among highly ranked variables between training and validation data sets. The measures have better properties when the RF is applied to a reduced data set, with low correlation between variables. The ranking of variables obtained from RF was compared with the results of simple univariate tests. We have further investigated the properties of variable importance measures in the context of genome-wide association (GWA) studies. In a simulation study of GWA data, we have investigated the power to identify loci that truly influence disease risk either singly or in combination, compared with standard approaches. Despite the good properties of RF as a classifier, in most contexts studied to date we have found that the RF does not out-perform much simpler methods of identifying important variables.

16:00 - Dr. Tim Ebbels (Biological Chemistry Section of the Division of Biomedical Sciences, Imperial College)

Title: Methods and challenges in analysis of metabolomics data

Abstract:  Metabolomics is the study of the levels of thousands of small molecular weight molecules found in cells, biofluids and tissues. It is of great interest because most biological changes (e.g. aging, disease etc.) affect metabolism and thus leave a characteristic fingerprint in the metabolic profile. Such profiles are information rich and complex and require statistical tools which are adapted specifically to their characteristics. In this talk I will review some of the challenges presented by this data such as multivariate classification, feature selection and metabolite identification. I will show examples of methods developed at Imperial to overcome these challenges including multivariate kernel density estimators, genetic algorithms and statistical correlation spectroscopy. Overall, it is clear that only a relatively small amount of the total information latent in metabolic profiles is currently being usefully extracted, thus posing important questions to be answered in the years ahead.

Seminars 2008 - 2009

12 June 09


Title: Tobit models for multivariate, spatio-temporal and compositional data 
Chris Glasbey (Biomathematics & Statistics Scotland)  
Variables whose distributions have a singularity at zero pose a challenge for statistical modelling, particularly when data are also multivariate, spatial, temporal or compositional in structure. Tobin (Econometrica, 1958) proposed a form of latent Gaussian model for such data: if a Gaussian variable is positive then it is observed, but otherwise it is censored and a zero is observed. Motivated by applications involving crop lodging, food intake, rainfall disaggregation and food composition, we demonstrate the effectiveness of Tobit models, and consider issues of estimation and validation. 
This is joint work with Dave Allcroft and Adam Butler.

22 May 09


Title: Intelligent Optimisation and Learning Using Tsallis Statistics 
Dr. George Magoulas (School of Computer Science and Information Systems, Birkbeck College)

This talk explores the use of the nonextensive q-distribution, which arises from the optimisation of the Tsallis entropy, in the context of intelligent optimisation. The continuous real parameter q that represents the degree of non-extensivity is used to generate various distributions, which have properties intermediate to that of Gaussian and Levy statistics. Adaptive mechanisms based on the q-distribution are used to enhance intelligent optimisation methods which employ diffusion and particle swarms. Application examples in high-dimensional nonlinear optimisation problems and neural networks are discussed.


Title: Particle MCMC
Prof Christophe Andrieu (Department of Mathematics, University of Bristol)
Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) methods have emerged as the two main tools to sample from high-dimensional probability distributions. Although asymptotic convergence of MCMC algorithms is ensured under weak assumptions, their performance is unreliable when the proposal distributions used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high-dimensional proposal distributions using SMC methods. This allows us not only to improve over standard MCMC schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously the case. We demonstrate these algorithms on various non-linear non-Gaussian state-space models, a stochastic kinetic model and Dirichlet process mixtures.

11 May 09


Title: Life Distributions in Survival Analysis and Reliability: Structure of Semiparametric Families
Ingram Olkin  (Stanford University)
This talk will take place in HUXLEY 130.
Semiparametric families are families that have both a real parameter and a parameter that is itself a distribution. A number of semiparametric families suitable for lifetime data in survival or reliability are introduced: scale, power, frailty (proportional hazards), age, moment, and others. Interesting results on stochastic orderings are obtained for these families. The coincidence of two families provides a characterization of the underlying distribution. Some of the characterization results provide a rationale for the use of certain families. In this talk we provide an overview of these semiparametric families, and present several characterizations.
This work is joint with Albert W. Marshall.

1 May 09


Title: Quasi-variances
Prof. David Firth (Department of Statistics, The University of Warwick)
The notion of quasi-variances, as a device for both simplifying and enhancing the presentation of additive categorical-predictor effects in statistical models, was developed in Firth and de Menezes (Biometrika, 2004, 65-80). The approach generalizes the earlier idea of "floating absolute risk" (Easton et al., Statistics in Medicine, 1991), which has become rather controversial in epidemiology. In this talk I will outline and exemplify the method, and discuss its extension to some other contexts such as parameters that may be arbitrarily scaled and/or rotated.


Title: Some statistical aspects of Spatio-Temporal Brain Image Analysis 
Dr John Aston (Department of Statistics, The University of Warwick)
Neuroimaging is a rapidly growing area of research. The ability to measure the function of the brain “in-vivo” provides both neuroscience and psychology with tools for investigating hypotheses in a quantifiable manner that has not previously been available. However, these methods have also raised new questions for the analysis of the data. The data is rich in four dimensions and, therefore, spatio-temporal statistical techniques are required.
In this talk, we will explore methods for examining time courses in brain image analysis that borrow spatial information to improve estimation. Particular emphasis will be placed on non-parametric techniques such as wavelets and functional principal component analysis. It will be shown that by using these methods, estimation of temporal parameters from traditional parametric models can be dramatically improved in terms of mean squared error, allowing more sensitive analyses to be undertaken.

20 Mar 09


Title: On the incoherence of the area under the ROC curve, and what to do about it
Prof. David Hand (Department of Mathematics, Imperial College)
The area under the ROC curve (AUC) is a very widely used measure of performance for classification and diagnostic rules. It has the appealing property of being objective, requiring no subjective input from the user. On the other hand, the AUC has disadvantages, some of which are well known. For example, the AUC can give potentially misleading results if ROC curves cross. However, the AUC also has a much more serious deficiency, and one which appears not to have been previously recognised. This is that it is fundamentally incoherent in terms of misclassification costs. This means that using the AUC is equivalent to using different metrics to evaluate different classification rules. This property is explored in detail, and a simple valid alternative to the AUC is proposed.


Title: Why Income Comparison Is Rational
David H. Wolpert (NASA Ames Research Center)
In many cultures a major factor affecting a person's happiness is the difference between their income and that of their neighbors, independent of their own income. This effect is strongest when the neighbor has higher income. In addition a person's lifetime happiness tends to follow a "U" shape, with a minimum in the 40's. Previous models have separately explained some of these phenomena, typically by assuming the person has cognitive limitations, e.g., their happiness has a finite number of possible values. Here I present a model which explains all of the phenomena, and does not assume any cognitive limitations. In this model moderately greater income of your neighbor is statistical data that, if carefully analyzed, would recommend that you explore for a new income-generating strategy. This explains unhappiness that your neighbor has moderately greater income, as an emotional "prod" that induces you to explore, exactly as a detailed statistical analysis of the income difference would recommend. It explains the "U" shape of happiness in a similar manner.

6 Mar 09


Transfer Talks
Adam Sykulski
Maria Vounou
Mohammed Asif Johar

20 Feb 09


Title: Statistics for Human Motion Modelling
Prof. Julian Faraway
(Department of Mathematical Sciences, University of Bath)
Human movement can be recorded using motion capture devices that track the position of markers attached to the body. The data objects produced consist of curves, trajectories, orientations and shapes changing in time. We describe how to model such data and form composite models for human motion. We demonstrate the use of these methods with applications in Biomechanics for virtual manufacturing and in Orthodontics for cleft lip and palate.


Title: Genome Wide Association analysis of WTCCC coronary artery disease (CAD) phenotype
Dimitris Kalaitzopoulos (InforSense)
With the throughput that is possible from new genotyping platforms, the promise of delivering novel biomarkers is now within reach. As genome-wide association (GWA) studies become more commonplace, analytic software must be highly flexible to enable ad hoc integration of many different types of data and algorithms and compare outputs from different approaches. InforSense GenSense enables the analysis of data from the latest generation of genotyping platforms. It has been specifically designed to assist researchers to understand complex analyses, quickly identify interesting SNPs and produce reports with graphical summaries and interactive visualizations of large datasets.
In collaboration with Erasmus MC, InforSense replicated the Wellcome Trust Case Control Consortium (WTCCC) study for the coronary artery disease (CAD) phenotype using GenSense. This replication study was performed in order to demonstrate the robustness and accuracy of GenSense in analysing GWA datasets. The genotype called data were imported to GenSense and the WTCCC quality control pipeline was applied. Then single-SNP tests of association were performed on the filtered data. GenSense successfully replicated the associations on chromosome 9 found by WTCCC. Finally, the flexibility and extensibility of the platform in response to the demanding needs of GWA studies will be demonstrated.

30 Jan 09


Title: 'What role should formal risk-benefit decision-making play in the regulation of medicines?'
Prof. Deborah Ashby (Department of Epidemiology and Public Health, Imperial College London)
The regulation of medicine requires evidence of the efficacy and safety of medicines, and methods are well-developed to deal with the latter and to a lesser extent the former. However, until recently, assessment of risk-benefit, especially in relation to alternatives, has been entirely informal. There is now growing interest in the possibilities of more formal approaches to risk-benefit decision-making. In this talk, we review the basis of drug regulation, the statistical basis for decision-making under uncertainty, current initiatives in the area, and discuss possible approaches that could enhance the quality of regulatory decision-making.

18 Dec 08


Title: A Bayesian View of Some Causal Inference Procedures 
David A. Stephens (Department of Mathematics and Statistics, McGill University)
Causal inference methods have been demonstrated to improve estimation of treatment effects in studies where confounding may be present and the confounding covariate vector is high dimensional. In this talk, I will briefly review some of the most common causal inference procedures, and then outline Bayesian versions that are implementable, albeit at the cost of making model assumptions which are not necessary in the frequentist version. I will demonstrate the similarities and differences between frequentist and Bayesian causal procedures.
In the talk, the main application of the work is in a longitudinal dose-response study into the treatment of the vision deficit condition Amblyopia. A common treatment of amblyopia is occlusion (patching) but until recently the efficacy of the treatment remained unquantified. I will report on two studies - one observational, one experimental - that have attempted to make this quantification. Even in the randomized study, the effect of occlusion dose is potentially confounded as the amount of dose received is controlled by the subject, and thus causal methods must be used.

12 Dec 08


Transfer Talks
Maurice Berk
Yiannis Phinikettos
Marc Henrion
Gordon Ross

5 Dec 08


Title: Simple models for longitudinal data with informative observation
Daniel Farewell (Cardiff)
Models for longitudinal measurements truncated by possibly informative dropout have tended to be either mathematically complex or computationally demanding. I will review a recently proposed alternative, using simple ideas from event-history analysis (where censoring is commonplace) to yield moment-based estimators for balanced, continuous longitudinal data. I shall then discuss some work in progress: extending these ideas to more general longitudinal data, while maintaining simplicity of understanding and implementation.

21 Nov 08


Transfer Talks
Edward Cohen
Chen Chen
Brian McWilliams

7 Nov 08


Title: The Dantzig selector in Cox's proportional hazards model
Piotr Fryzlewicz (Department of Mathematics, University of Bristol)
The Dantzig selector is a recent approach to variable selection in sparse linear models where the number of covariates possibly exceeds the number of observations. The main idea is to choose the set of variables with the smallest l_1 norm, but subject to the likelihood being, in a certain sense, near its maximum at that point of the parameter space. The Dantzig selector is rapidly computable via linear programming, acts as a variable selector (i.e. sets some of the variables exactly to zero) due to the use of the l_1 norm, and enjoys the oracle property, i.e. estimates the parameter vector almost as accurately as if the true model were known. In this work, we formulate an extension of the Dantzig selector to Cox's proportional hazards model for right-censored survival data. We propose a fast algorithm for computing the estimator, show its theoretical consistency, and demonstrate its practical performance.


Title: Dissatisfied with schools of inference (Bayesian, likelihood, frequentist etc.)? How to build your own inference engine.
John Nelder (Department of Mathematics, Imperial College) 
The construction of an inference engine for the model class Double Hierarchical Generalised Linear Models will be developed. It contains elements from existing schools plus the new idea of extended likelihood. The framework may be useful elsewhere.

17 Oct 08


Title: Stochastic Boosting 
Ajay Jasra (Imperial)
In this talk, I discuss a class of stochastic boosting algorithms, which builds upon the work of Holmes & Pintore (2006). In particular, some problems with the importance sampling method are highlighted; it is shown how to perform statistical inference in a computationally efficient manner. Sequential Monte Carlo (SMC) methods are used to illustrate that the stochastic boosting methods can provide better predictions, for a higher computational cost, than the corresponding boosting algorithm. A theoretical result is also given, which expresses an upper-bound of the posterior-predictive test error, in terms of that of boosting. The result shows that the averaged predictions used are relatively stable with respect to boosting, when the latter provides the single best prediction. We also investigate the method on a real case study from machine learning and in a regression context, showing that it can be a useful tool for data exploration.


Title: Robust Approach to Graphical Modelling for Brain Connectivity via Partial Coherence
Tarek Medkour (Imperial)
In modern neuroscience, the analysis of brain connectivity has seen a huge current interest due to the fact that the computational properties of the brain are considered a direct consequence of its circuitry. As a measure of cortical activity, we use electroencephalograms (EEG) which represent the potential of the electrical activity of the brain. Our data consists of a set of 8 schizophrenic patients and a second set of 23 controls. In each set we recorded the activity at 10 locations on the human scalp. We study the interactivity between these different areas of the brain and aim at establishing a model that enables practitioners to distinguish between schizophrenic sufferers and non-sufferers. For the quantitative analysis, we use multivariate time series techniques, specifically partial coherence graphs, to determine the inter-connectivity patterns of the brain. This approach identifies the graphical models based on performing a multiple hypothesis testing procedure on the partial coherences. The spectral matrices required for the estimation of the partial coherences are evaluated using multitaper spectral estimation methods. Evaluating the partial coherences requires the inversion of the spectral matrices; however, our analysis of the stability of these matrices, through the study of the condition number, shows that they are in fact ill-conditioned; in other words, their inversion is not reliable for statistical analysis. Hence, we stabilise the matrices using matrix diagonal upweighting (a method analogous to ridge regression), which consists of adding a specific proportion of white noise to the time series. We consider a graphical approach based on the analysis of the partial mutual information to find the right proportion of noise to add. Determining the existence of an interaction between two locations of the brain is achieved by testing for a zero valued partial coherence. Because the partial coherence is defined in the frequency domain, one needs to consider the whole spectrum.
Thus, we apply the maximin multiple hypothesis testing procedure with different options to establish the most reliable approach. We compare the use of a dependent against an independent sample of partial coherences. We also look at the effect of different spectral matrix averaging procedures, as well as the effect of the multitapering parameters and some bias-reducing estimation solutions.

Seminars 2007 - 2008

Friday 13/06/2008

Flexible Covariance Estimation in Gaussian Graphical models
Bala Rajaratnam (Department of Statistics, Stanford)
Covariance estimation is known to be a challenging problem, especially for high-dimensional data. In this context, graphical models can act as a tool for regularization and have proven to be excellent tools for the analysis of high dimensional data. Graphical models are statistical models where dependencies between variables are represented by means of a graph. Both frequentist and Bayesian inferential procedures for graphical models have recently received much attention in the statistics literature. The hyper-inverse Wishart distribution is a commonly used prior for Bayesian inference on covariance matrices in Gaussian Graphical models. This prior has the distinct advantage that it is a conjugate prior for this model but it suffers from lack of flexibility in high dimensional problems due to its single shape parameter.

In this talk, for posterior inference on covariance matrices in decomposable Gaussian graphical models, we use a flexible class of conjugate prior distributions defined on the cone of positive-definite matrices with fixed zeros according to a graph G. This class includes the hyper inverse Wishart distribution and allows for up to k+1 shape parameters where k denotes the number of cliques in the graph. We first add to this class of priors, a reference prior, which can be viewed as an improper member of this class. We then derive the general form of the Bayes estimators under traditional loss functions adapted to graphical models and exploit the conjugacy relationship in these models to express these estimators in closed form. The closed form solutions allow us to avoid heavy computational costs that are usually incurred in these high-dimensional problems. We also investigate decision-theoretic properties of the standard frequentist estimator, which is the maximum likelihood estimator, in these problems. Furthermore, we illustrate the performance of our estimators through numerical examples and comparisons with previous work where we explore frequentist risk properties and the efficacy of graphs in the estimation of high-dimensional covariance structures. We demonstrate that our estimators yield substantial risk reductions over the maximum likelihood estimator in the graphical model.

Low effective dimension in models or data: a key to high dimensional inference?
Peter Bickel (Department of Statistics, Berkeley)
Theoretical analysis seems to suggest that standard problems such as estimating a function of high dimensional variables with noisy data (regression or classification) should be impossible without detailed knowledge or absurdly large amounts of data. Yet, algorithms to perform classification of high dimensional images or other high dimensional objects are remarkably successful. The generally held explanation is the presence of sparsity or low dimensional structure. I'll discuss, with examples, why this may be right.

Friday 23/05/2008

Statistics and politics: incendiary combination or democratic necessity
John Pullinger 
John Pullinger, Librarian and Director General of Information Services at the House of Commons and member of the Council of the Royal Statistical Society, will explore how the word statistics came into use as a branch of politics. His talk will outline the history of the interaction between politics and statistics. This will include the roles played by a number of Prime Ministers and others such as Florence Nightingale. The talk will go on to tell some stories from John's own experience as an Executive Director at the Office for National Statistics until 2004 before discussing the dangers and rewards of putting statistics at the heart of the relationship between the citizen and the state.

Estimating parental ancestry from genotype data
Clive Hoggart (Division of Epidemiology, Imperial)
We describe how the proportion of an individual's genome from each of the continental populations can be estimated from the analysis of genotype data.
The method can be used to predict the bio-geographical ancestry and physical appearance of a culprit from DNA recovered from the scene of a crime. The method exploits ancestry informative markers (AIMs), genetic loci which exhibit allele frequency differences between populations. We use a panel of AIMs informative for the four main continental populations, sub-Saharan African, European, East Asian and Native American. With such a dense panel of markers we can estimate the ancestry proportions of the two parental gametes separately. Where an individual has ancestry from more than one continental population we can make inference on the number of generations over which admixture has occurred by modelling the stochastic variation in ancestry along the chromosome. We also make inference on the number of populations contributing to admixture on each gamete by comparison of marginal likelihoods. We demonstrate the method on individuals whose family background is known. Throughout a fully Bayesian perspective is taken. The analyses use the program ADMIXMAP which was originally developed for genetic association studies and admixture mapping.
This is joint work with Paul M. McKeigue (University of Edinburgh).

Friday 02/05/2008

Underground explosion or earthquake: Multivariate discrimination has the answer 
Dale N Anderson (Pacific Northwest National Laboratory) 
Seismic monitoring for underground nuclear explosions answers three questions for all global seismic activity: Where is the seismic event located? What is the event source type (event identification)? If the event is an explosion, what is the yield? The answers to these questions involve processing seismometer waveforms with propagation paths predominately in the mantle. Four discriminants commonly used to identify teleseismic events are depth from travel time, presence of long-period surface energy (mb vs. MS), depth from reflective phases, and polarity of first motion. The seismic theory for these discriminants is well established in the literature. However, the physical basis of each has not been formally integrated into probability models to account for statistical error and provide discriminant calculations appropriate, in general, for multidimensional event identification. This article develops a mathematical statistics formulation of these discriminants and offers a novel approach to multidimensional discrimination that is readily extensible to other discriminants. For each discriminant a probability model is formulated under a general null hypothesis of H0: Explosion Characteristics. The veracity of the hypothesized model is measured with a p-value calculation that can be filtered to be approximately normally distributed and is in the range [0, 1]. The hypothesis test formulation ensures that seismic phenomenology is tied to the interpretation of the p-value. These p-values are then embedded into a multidiscriminant algorithm that is developed from regularized discrimination methods proposed by DiPillo (1976), Smidt and McDonald (1976), and Friedman (1989). Performance of the methods is demonstrated with 102 teleseismic events with magnitudes (mb) ranging from 5 to 6.5.

Learning Curves: Lessons from Statistical Machine Translation 
Professor Nello Cristianini (Departments of Engineering Mathematics and Computer Science, University of Bristol)
We will present an overview of Statistical Machine Translation methods, and a discussion of learning curves in this context.

Friday 14/03/2008

Transfer Talks
James Bentham
Fanyin Zhou
Inmaculada Vidana-Marquez

Friday 07/03/2008

Imperial Joint Statistics Seminar - Short Talks
(follow the link for the full program)

Friday 22/02/2008

Transfer Talks
Hung Lu
Theodoros Tsagaris
Zi Yang

Friday 15/02/2008

Classifier ensembles for changing environments 
Dr. Ludmila I. Kuncheva (School of Informatics, University of Wales, Bangor) 
Classification problems coming from real life are hardly ever static. Class description changes, probability distributions float, new classes appear and old classes disappear, novel technologies enable new complex and more indicative features to be measured. The talk will outline the existing work in this direction. Individual classifier models as well as classifier ensembles will be presented. We will touch upon some of the critical issues of the area including change detection techniques, forgetting strategies and the notorious lack of benchmark data.

Meta analysis on the normal calibration scale
Dr Elena Kulinskaya (Statistical Advisory Service, Imperial) 
This talk is about an approach to meta analysis and to statistical evidence developed jointly with Stephan Morgenthaler and Robert Staudte, and now written up in our book 'Meta Analysis: a guide to calibrating and combining statistical evidence', Wiley, February 2008.
The traditional ways of measuring evidence, in particular with p-values, are neither intuitive nor useful when it comes to making comparisons between experimental results, or when combining them. We measure evidence for an alternative hypothesis, not evidence against a null. To do this, we have in a sense adopted standardized scores for the calibration scale. Evidence for us is simply a transformation of a test statistic S to another one (called evidence T=T(S)) whose distribution is close to normal with variance 1, and whose mean grows from 0 with the parameter as it moves away from the null. Variance stabilization is used to arrive on this scale. For meta analysis the results from different studies are transformed to a common calibration scale, where it is simpler to combine and interpret them.
I'll provide an introduction and an overview, including some open problems.

Friday 25/01/2008

Dimension Reduction Paradigms for Regression
Professor R. Dennis Cook (School of Statistics, University of Minnesota) 
Dimension reduction for regression, represented primarily by principal components, is ubiquitous in the applied sciences. This is an old idea that has moved to a position of prominence in recent years because technological advances now allow scientists to routinely formulate regressions in which the number p of predictors is considerably larger than in the past. Although "large" p regressions are perhaps mainly responsible for renewed interest, dimension reduction methodology can be useful regardless of the size of p. Starting with a little history and a definition of "sufficient reductions", we will consider a variety of models for dimension reduction in regression. The models start from one in which maximum likelihood estimation produces principal components, step along a few incremental expansions, and end with forms that have the potential to improve on some standard methodology. This development provides remedies for two concerns that have dogged principal components in regression: principal components are typically computed from the predictors alone and then do not make apparent use of the response, and they are not equivariant under full rank linear transformation of the predictors.

Friday 30/11/2007

Modelling non-stationary extreme values with application to surface-level ozone
Professor Jonathan Tawn (Department of Mathematics and Statistics, Lancaster University) 
Statistical methods for modelling extremes of stationary sequences have received much attention. The most common method is to model the rate and size of exceedances of some high constant threshold; the size of exceedances is modelled using a generalised Pareto distribution (GPD). Frequently, data sets display non-stationarity; this is especially common in environmental applications. The ozone data set presented here is an example of such a data set. Surface-level ozone levels display complex seasonal patterns and trends due to the mechanisms involved in ozone formation. The standard methods of modelling the extremes of a non-stationary process focus on retaining a constant threshold but using covariate models in the rate and GPD parameters. In this talk an alternative approach will be proposed that uses pre-processing methods to model the non-stationarity in the body of the process and then uses standard methods to model the extremes of the pre-processed data. I will try to justify a claim that the pre-processing method gives a model that better incorporates the underlying mechanisms that generate the process, produces a simpler and more efficient fit and allows easier computation.

Determining parameter redundancy using symbolic algebra
Professor Byron Morgan (Institute of Mathematics, Statistics and Actuarial Science, University of Kent) 
A model is parameter redundant if it can be rewritten in terms of a smaller set of parameters. Computer packages for symbolic algebra provide a modern approach to the problem of parameter redundancy, allowing one to determine how many parameters may be estimated using classical inference, and also to identify which combinations of the original parameters are involved. Full rank models, in which all parameters can in principle be estimated, may be classified as essentially or conditionally full rank, by means of an extended PLUR matrix decomposition. The parameter redundancy status of models for a given structure and size may be extended to models of the same structure and any size by means of expansion theorems, and difficulties with the memory limitations of symbolic algebra packages may be overcome by means of the imaginative use of exhaustive summaries. The link with weak identifiability in Bayesian inference is also mentioned. The approach may be applied in many different areas, and in this talk it is illustrated by a range of examples involving mark recapture recovery data on a number of wild animal species. This talk describes joint research with Ted Catchpole and Diana Cole, who has been supported by the EPSRC.

Friday 16/11/2007

Transfer Talks
Richard Russell
Christoforos Anagnostopoulos
Irfan Sheikh

Friday 09/11/2007

Advances in Consistent Estimation for Tracking 
Dr Steven Reece (Pattern Analysis Research Group, Dept. Engineering Science, Oxford University) 
The term "consistent estimation" has been used unconventionally by Julier and Uhlmann to describe a highly flexible approach to estimation which uses conservative covariance matrices to capture impurities and missing components in tracker models. The approach uses the Kalman filter as the basic inference engine but is supplemented by techniques such as Covariance Intersection (CI) and Covariance Union (CU). The approach offers solutions to rumour propagation when performing inference in cyclic graphs, inference with incomplete covariance models and multiple hypothesis tracking with unknown assignment probabilities. This talk will review consistent estimation methods and then present recent extensions including Bounded Covariance Inflation and Generalised Covariance Union. The new methods offer more flexible, information efficient generalisations of CI and CU and go some way towards defining a unified theory for consistent estimation.
The approach will be demonstrated on inference problems in cyclic graphs, Decentralised Simultaneous Localisation and Mapping (DSLAM) problems from the robotics domain and decentralised urban fire monitoring.

Modelling Multiple Time Series via Common Factors 
Professor Qiwei Yao (Department of Statistics, LSE) 
We propose a new method for estimating common factors of multiple time series. One distinctive feature of the new approach is that it is applicable to nonstationary time series. The unobservable (nonstationary) factors are identified via expanding the orthogonal complement of the factor loading space step by step, thereby solving a high-dimensional optimization problem via many low-dimensional sub-problems. Asymptotic properties of the estimation were investigated. The proposed methodology was illustrated with both simulated and real data sets.

Friday 19/10/2007

Analysis of Stationary, Quasi-Seasonal Processes 
Dr Emma McCoy (Imperial College London)
Time series which exhibit seasonality with a period that is not fixed, but varies through time, are common in a wide range of physical phenomena. This quasi-seasonal dependence can be modelled using extensions of the standard ARIMA model that incorporate seasonal persistence. For such data, standard frequency domain based procedures produce biased estimators, even for large sample sizes. This bias can be addressed by considering an alternative frequency-domain likelihood approximation; the form of the likelihood means that its asymptotic and large sample properties and associated maximum likelihood estimators for both the seasonality and degree of persistence can be developed. After outlining the procedure and its properties I will finish by comparing it to more standard procedures in the analysis of simulated data and data from econometric and meteorological applications.

Computing the maximum likelihood estimator of a multidimensional log-concave density 
Dr Richard Samworth (Statistical Laboratory, Cambridge) 
Abstract: We show that if $X_1,...,X_n$ are a random sample from a log-concave density $f$ in $\mathbb{R}^d$, then with probability one there exists a unique maximum likelihood estimator $\hat{f}_n$ of $f$. The use of this estimator is attractive because, unlike kernel density estimation, the estimator is fully automatic, with no smoothing parameters to choose. The existence proof is non-constructive, however, and in practice we require an iterative algorithm that converges to the estimator. By reformulating the problem as one of non-differentiable convex optimisation, we are able to exhibit such an algorithm. We are also able to extend the methodology to fit finite mixtures of log-concave densities, yielding a promising technique for clustering and/or classification. The talk will be illustrated with pictures from the R package LogConcDEAD. This is joint work with Madeleine Cule (Cambridge) and Michael Stewart (University of Sydney).

Diversity Policy of the Statistics Seminar

Department of Mathematics, Imperial College London 

Imperial College London, its Department of Mathematics, and its Statistics Section are committed to fostering a welcoming, open, and diverse research community. We recognize that there are groups which remain under-represented in the discipline, particularly among senior academics and other professionals. One of the ways we can try to address this is to take active steps to promote equality, diversity and inclusion among the Statistics Seminar speakers. To this end:

  1. We will seek a diverse set of speakers that reflects the diversity of the UK’s statistics community in terms of gender as well as ethnicity and race, for example.
  2. We will aim to invite speakers from around the UK, and, insofar as possible, around the rest of Europe and the world (e.g. international visitors coming to the UK for conferences, workshops, or longer-term visits such as sabbaticals).
  3. We will monitor our speaker list and make it public on a dedicated website and also announce all seminars as widely as possible (the organisers have a subscribable mailing list that is used in addition to posted announcements and the website).

When soliciting names of potential speakers the organisers emphasize our aim to present a slate of speakers that is diverse in terms of gender, ethnicity/race, and geography. We will also take advantage of resources such as R Ladies, the UK Women in Mathematics LinkedIn Group, the European Women in Mathematics Members List, and Wikipedia’s Women in Statistics Page.