Publications
Newcombe PJ, Ali HR, Blows FM, et al., 2017, Weibull regression with Bayesian variable selection to identify prognostic tumour markers of breast cancer survival, STATISTICAL METHODS IN MEDICAL RESEARCH, Vol: 26, Pages: 414-436, ISSN: 0962-2802
- Citations: 16
Papathomas M, Richardson S, 2016, Exploring dependence between categorical variables: Benefits and limitations of using variable selection within Bayesian clustering in relation to log-linear modelling with interaction terms, JOURNAL OF STATISTICAL PLANNING AND INFERENCE, Vol: 173, Pages: 47-63, ISSN: 0378-3758
- Citations: 3
Greene D, NIHR BioResource, Richardson S, et al., 2016, Phenotype Similarity Regression for Identifying the Genetic Determinants of Rare Diseases, American Journal of Human Genetics, Vol: 98, Pages: 490-499, ISSN: 1537-6605
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.
Mattei F, Liverani S, Guida F, et al., 2016, Multidimensional analysis of the effect of occupational exposure to organic solvents on lung cancer risk: the ICARE study, Occupational and Environmental Medicine, Vol: 73, Pages: 368-377, ISSN: 1470-7926
Background: The association between lung cancer and occupational exposure to organic solvents is discussed. Since different solvents are often used simultaneously, it is difficult to assess the role of individual substances. Objectives: The present study is focused on an in-depth investigation of the potential association between lung cancer risk and occupational exposure to a large group of organic solvents, taking into account the well-known risk factors for lung cancer, tobacco smoking and occupational exposure to asbestos. Methods: We analysed data from the Investigation of occupational and environmental causes of respiratory cancers (ICARE) study, a large French population-based case–control study, set up between 2001 and 2007. A total of 2276 male cases and 2780 male controls were interviewed, and long-life occupational history was collected. In order to overcome the analytical difficulties created by multiple correlated exposures, we carried out a novel type of analysis based on Bayesian profile regression. Results: After analysis with conventional logistic regression methods, none of the 11 solvents examined were associated with lung cancer risk. Through a profile regression approach, we did not observe any significant association between solvent exposure and lung cancer. However, we identified clusters at high risk that are related to occupations known to be at risk of developing lung cancer, such as painters. Conclusions: Organic solvents do not appear to be substantial contributors to the occupational risk of lung cancer for the occupations known to be at risk.
Lewin A, Saadi H, Peters JE, et al., 2015, MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues, Bioinformatics, Vol: 32, Pages: 523-532, ISSN: 1367-4803
Motivation: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called ‘hotspots’, important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. Results: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond ‘one-at-a-time’ association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered.
Hastie DI, Liverani S, Richardson S, 2015, Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations, STATISTICS AND COMPUTING, Vol: 25, Pages: 1023-1037, ISSN: 0960-3174
- Citations: 26
Geneletti S, O'Keeffe AG, Sharples LD, et al., 2015, Bayesian regression discontinuity designs: incorporating clinical knowledge in the causal analysis of primary care data, STATISTICS IN MEDICINE, Vol: 34, Pages: 2334-2352, ISSN: 0277-6715
- Citations: 19
Vallejos CA, Marioni JC, Richardson S, 2015, BASiCS: Bayesian Analysis of Single-Cell Sequencing Data, PLOS Computational Biology, Vol: 11, ISSN: 1553-734X
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.
Wallace C, Cutler AJ, Pontikos N, et al., 2015, Dissection of a Complex Disease Susceptibility Region Using a Bayesian Stochastic Search Approach to Fine Mapping, PLOS Genetics, Vol: 11, ISSN: 1553-7390
Identification of candidate causal variants in regions associated with risk of common diseases is complicated by linkage disequilibrium (LD) and multiple association signals. Nonetheless, accurate maps of these variants are needed, both to fully exploit detailed cell specific chromatin annotation data to highlight disease causal mechanisms and cells, and for design of the functional studies that will ultimately be required to confirm causal mechanisms. We adapted a Bayesian evolutionary stochastic search algorithm to the fine mapping problem, and demonstrated its improved performance over conventional stepwise and regularised regression through simulation studies. We then applied it to fine map the established multiple sclerosis (MS) and type 1 diabetes (T1D) associations in the IL-2RA (CD25) gene region. For T1D, both stepwise and stochastic search approaches identified four T1D association signals, with the major effect tagged by the single nucleotide polymorphism, rs12722496. In contrast, for MS, the stochastic search found two distinct competing models: a single candidate causal variant, tagged by rs2104286 and reported previously using stepwise analysis; and a more complex model with two association signals, one of which was tagged by the major T1D associated rs12722496 and the other by rs56382813. There is low to moderate LD between rs2104286 and both rs12722496 and rs56382813 (r² ≃ 0.3) and our two SNP model could not be recovered through a forward stepwise search after conditioning on rs2104286. Both signals in the two variant model for MS affect CD25 expression on distinct subpopulations of CD4+ T cells, which are key cells in the autoimmune process. The results support a shared causal variant for T1D and MS. Our study illustrates the benefit of using a purposely designed model search strategy for fine mapping and the advantage of combining disease and protein expression data.
Liverani S, Hastie DI, Azizi L, et al., 2015, PReMiuM: An R Package for Profile Regression Mixture Models Using Dirichlet Processes, JOURNAL OF STATISTICAL SOFTWARE, Vol: 64, Pages: 1-30, ISSN: 1548-7660
- Citations: 60
Molitor J, Brown IJ, Chan Q, et al., 2014, Blood Pressure Differences Associated With Optimal Macronutrient Intake Trial for Heart Health (OMNIHEART)-Like Diet Compared With a Typical American Diet, HYPERTENSION, Vol: 64, Pages: 1198-U86, ISSN: 0194-911X
- Citations: 19
Chen L, Kostadima M, Martens JHA, et al., 2014, Transcriptional diversity during lineage commitment of human blood progenitors, SCIENCE, Vol: 345, Pages: 1580-+, ISSN: 0036-8075
- Citations: 175
Pettit J-B, Tomer R, Achim K, et al., 2014, Identifying Cell Types from Spatially Referenced Single-Cell Expression Datasets, PLOS COMPUTATIONAL BIOLOGY, Vol: 10
- Citations: 18
Li G, Haining R, Richardson S, et al., 2014, Space-time variability in burglary risk: A Bayesian spatio-temporal modelling approach, SPATIAL STATISTICS, Vol: 9, Pages: 180-191, ISSN: 2211-6753
- Citations: 65
Kirk P, Witkover A, Bangham CRM, et al., 2013, Balancing the Robustness and Predictive Performance of Biomarkers, JOURNAL OF COMPUTATIONAL BIOLOGY, Vol: 20, Pages: 979-989, ISSN: 1066-5277
- Citations: 8
Chadeau-Hyam M, Tubert-Bitter P, Guihenneuc-Jouyaux C, et al., 2013, Dynamics of the Risk of Smoking-Induced Lung Cancer: A Compartmental Hidden Markov Model for Longitudinal Analysis, Epidemiology, Vol: n/a, ISSN: 1044-3983
Hastie DI, Liverani S, Azizi L, et al., 2013, A semi-parametric approach to estimate risk functions associated with multi-dimensional exposure profiles: application to smoking and lung cancer, BMC MEDICAL RESEARCH METHODOLOGY, Vol: 13, ISSN: 1471-2288
- Citations: 20
Bottolo L, Chadeau-Hyam M, Hastie DI, et al., 2013, GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm, PLoS Genetics, Vol: 9, ISSN: 1553-7390
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This pr
Geneletti S, Best N, Toledano MB, et al., 2013, Uncovering selection bias in case-control studies using Bayesian post-stratification, STATISTICS IN MEDICINE, Vol: 32, Pages: 2555-2570, ISSN: 0277-6715
- Citations: 9
Hansell AL, Blangiardo M, Fortunato L, et al., 2013, Aircraft noise and cardiovascular disease near Heathrow airport in London: small area study, BMJ, Vol: 347, ISSN: 0959-535X
To investigate the association of aircraft noise with risk of stroke, coronary heart disease, and cardiovascular disease in the general population.
Li G, Haining R, Richardson S, et al., 2013, Evaluating the No Cold Calling zones in Peterborough, England: application of a novel statistical method for evaluating neighbourhood policing policies, Environment and Planning A
Some police forces in the UK institute “No Cold Calling” (NCC) zones to reduce cold calling (unsolicited visits to sell products or services), which is often associated with rogue trading and distraction burglary. This paper evaluates the NCC targeted areas chosen in 2005-6 in Peterborough and reports whether they experienced a measurable impact on their burglary rates in the period up to 2008. Time series data for burglary at the Census Output Area level are analyzed using a Bayesian hierarchical modelling approach to address issues of data sparsity and lack of randomized allocation of areas to treatment groups that are often encountered in small area quantitative policy evaluation. To ensure internal validity, we employ the interrupted time series quasi-experimental design embedded within a matched case-control framework. Results reveal a positive impact of NCC zones on reducing burglary rates in the targeted areas compared to the control areas.
Astle W, De Iorio M, Richardson S, et al., 2012, A Bayesian Model of NMR Spectra for the Deconvolution and Quantification of Metabolites in Complex Biological Mixtures, JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, Vol: 107, Pages: 1259-1271, ISSN: 0162-1459
- Citations: 31
Papathomas M, Molitor J, Hoggart C, et al., 2012, Exploring Data From Genetic Association Studies Using Bayesian Variable Selection and the Dirichlet Process: Application to Searching for Gene Gene Patterns, GENETIC EPIDEMIOLOGY, Vol: 36, Pages: 663-674, ISSN: 0741-0395
- Citations: 30
Petit C, Blangiardo M, Richardson S, et al., 2012, Association of Environmental Insecticide Exposure and Fetal Growth With a Bayesian Model Including Multiple Exposure Sources, AMERICAN JOURNAL OF EPIDEMIOLOGY, Vol: 175, Pages: 1182-1190, ISSN: 0002-9262
- Citations: 26
Ancelet S, Abellan JJ, Vilas VJDR, et al., 2012, Bayesian shared spatial-component models to combine and borrow strength across sparse disease surveillance sources, BIOMETRICAL JOURNAL, Vol: 54, Pages: 385-404, ISSN: 0323-3847
- Citations: 12
Li G, Best N, Hansell AL, et al., 2012, BaySTDetect: detecting unusual temporal patterns in small area data via Bayesian model choice, Biostatistics
Space–time modeling of small area data is often used in epidemiology for mapping chronic disease rates and by government statistical agencies for producing local estimates of, for example, unemployment or crime rates. Although there is typically a general temporal trend, which affects all areas similarly, abrupt changes may occur in a particular area, e.g. due to emergence of localized predictors/risk factor(s) or impact of a new policy. Detection of areas with “unusual” temporal patterns is therefore important as a screening tool for further investigations. In this paper, we propose BaySTDetect, a novel detection method for short-time series of small area data using Bayesian model choice between two competing space–time models. The first model is a multiplicative decomposition of the area effect and the temporal effect, assuming one common temporal pattern across the whole study region. The second model estimates the time trends independently for each area. For each area, the posterior probability of belonging to the common trend model is calculated, which is then used to classify the local time trend as unusual or not. Crucial to any detection method, we provide a Bayesian estimate of the false discovery rate (FDR). A comprehensive simulation study has demonstrated the consistent good performance of BaySTDetect in detecting various realistic departure patterns in addition to estimating well the FDR. The proposed method is applied retrospectively to mortality data on chronic obstructive pulmonary disease (COPD) in England and Wales between 1990 and 1997 (a) to test a hypothesis that a government policy increased the diagnosis of COPD and (b) to perform surveillance. While results showed no evidence supporting the hypothesis regarding the policy, an identified unusual district (Tower Hamlets in inner London) was later recognized to have higher than national rates of hospital readmission and mortality due to COPD by the National Health Service
McCandless LC, Gustafson P, Levy AR, et al., 2012, Hierarchical priors for bias parameters in Bayesian sensitivity analysis for unmeasured confounding, STATISTICS IN MEDICINE, Vol: 31, Pages: 383-396, ISSN: 0277-6715
- Citations: 13
McCandless L, Richardson S, Best N, 2012, Adjustment for Missing Confounders Using External Validation Data and Propensity Scores, Journal of the American Statistical Association
Reducing bias from missing confounders is a challenging problem in the analysis of observational data. Information about missing variables is sometimes available from external validation data, such as surveys or secondary samples drawn from the same source population. In principle, the validation data permits us to recover information about the missing data, but the difficulty is in eliciting a valid model for the nuisance distribution of the missing confounders. Motivated by a British study of the effects of trihalomethane exposure on risk of full-term low birthweight, we describe a flexible Bayesian procedure for adjusting for a vector of missing confounders using external validation data. We summarize the missing confounders with a scalar summary score using the propensity score methodology of Rosenbaum and Rubin. The score has the property that it induces conditional independence between the exposure and the missing confounders given the measured confounders. It balances the unmeasured confounders across exposure groups, within levels of measured covariates. To adjust for bias, we need only model and adjust for the summary score during Markov chain Monte Carlo computation. Simulation results illustrate that the proposed method reduces bias from several missing confounders over a range of different sample sizes for the validation data.
Clark SJ, Falchi M, Olsson B, et al., 2012, Association of Sirtuin 1 (SIRT1) Gene SNPs and Transcript Expression Levels With Severe Obesity, OBESITY, Vol: 20, Pages: 178-185, ISSN: 1930-7381
- Citations: 55