Publications
Odgers J, Kappatou C, Misener R, et al., 2023, Probabilistic predictions for partial least squares using bootstrap, AIChE Journal, Vol: 69, Pages: 1-16, ISSN: 0001-1541
Modeling the uncertainty in partial least squares (PLS) is made difficult because of the nonlinear effect of the observed data on the latent space that the method finds. We present an approach, based on bootstrapping, that automatically accounts for these nonlinearities in the parameter uncertainty, allowing us to equally well represent confidence intervals for points lying close to or far away from the latent space. To show the opportunities of this approach, we develop applications in determining the Design Space for industrial processes and model the uncertainty of spectroscopy data. Our results show the benefits of our method for accounting for uncertainty far from the latent space for the purposes of Design Space identification, and match the performance of well established methods for spectroscopy data.
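As a rough illustration of the bootstrapping idea described in this abstract, the sketch below resamples the training data with replacement, refits a model on each resample, and reads a percentile interval off the resulting predictions. It is not the authors' procedure: an ordinary least-squares line stands in for a PLS fit so that the example stays self-contained, and the names `fit_line` and `bootstrap_interval`, the number of resamples and the percentile rule are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for a PLS fit)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_interval(xs, ys, x_new, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fitted mean at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample (x, y) pairs with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if max(bx) == min(bx):  # degenerate resample: slope undefined
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    lo = preds[int((alpha / 2) * n_boot)]
    hi = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(20)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    print(bootstrap_interval(xs, ys, x_new=10.0))
```

Because the whole fit is repeated on every resample, any nonlinear dependence of the fitted model on the data is carried through to the interval automatically, which is the property the abstract emphasises for PLS.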
Howson B, Pike-Burke C, Filippi S, 2023, Delayed feedback in generalised linear bandits revisited, Artificial Intelligence and Statistics (AISTATS 2023), Publisher: PMLR, Pages: 1-25, ISSN: 2640-3498
The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback setting can achieve regret of $\tilde{O}(d\sqrt{T} + d^{3/2}\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ is the time horizon. This significantly improves upon existing approaches for this setting where the best known regret bound was $\tilde{O}(\sqrt{dT}\sqrt{d + \mathbb{E}[\tau]})$. We verify our theoretical results through experiments on simulated data.
Howson B, Pike-Burke C, Filippi S, 2023, Optimism and delays in episodic reinforcement learning, Artificial Intelligence and Statistics (AISTATS 2023), Publisher: PMLR, Pages: 1-34
There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, provided that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practice. In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays. The first involves updating as soon as new information becomes available, whereas the second waits before using newly observed information to update the policy. For the class of optimistic algorithms and either approach, we show that the regret increases by an additive term involving the number of states, actions, episode length, the expected delay and an algorithm-dependent constant. We empirically investigate the impact of various delay distributions on the regret of optimistic algorithms to validate our theoretical results.
Lamprinakou S, Barahona M, Flaxman S, et al., 2023, BART-based inference for Poisson processes, Computational Statistics and Data Analysis, Vol: 180, Pages: 1-25, ISSN: 0167-9473
The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including non-parametric regression and classification. A BART scheme for estimating the intensity of inhomogeneous Poisson processes is introduced. Poisson intensity estimation is a vital task in various applications including medical imaging, astrophysics and network traffic analysis. The new approach enables full posterior inference of the intensity in a non-parametric regression setting. The performance of the novel scheme is demonstrated through simulation studies on synthetic and real datasets up to five dimensions, and the new scheme is compared with alternative approaches.
Zhang Q, Wild V, Filippi S, et al., 2022, Bayesian kernel two-sample testing, Journal of Computational and Graphical Statistics, Vol: 31, Pages: 1164-1176, ISSN: 1061-8600
In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where applications are often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modeling the difference between kernel mean embeddings in the reproducing kernel Hilbert space using the framework established by Flaxman et al. The use of kernel methods enables its application to random variables in generic domains beyond the multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows an automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real data experiments (i.e., testing network heterogeneity from high-dimensional data and six-membered monocyclic ring conformation comparison), we illustrate the advantages of our approach. Supplementary materials for this article are available online.
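For readers unfamiliar with kernel mean embeddings, the quantity at the centre of this abstract can be made concrete with a short sketch. The squared RKHS norm of the difference between two empirical mean embeddings is the (biased) squared maximum mean discrepancy, which can be computed from kernel evaluations alone. The code below computes only this frequentist statistic; it does not implement the Bayesian procedure of the paper, and the Gaussian kernel, the fixed lengthscale and the function names are assumptions chosen for illustration.

```python
import math


def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def mean_kernel(xs, ys, lengthscale):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, lengthscale) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd2_biased(xs, ys, lengthscale=1.0):
    """Squared RKHS distance between the empirical mean embeddings of xs and ys."""
    kxx = mean_kernel(xs, xs, lengthscale)
    kyy = mean_kernel(ys, ys, lengthscale)
    kxy = mean_kernel(xs, ys, lengthscale)
    return kxx + kyy - 2.0 * kxy


if __name__ == "__main__":
    sample_a = [(0.0,), (1.0,)]
    sample_b = [(5.0,), (6.0,)]
    print(mmd2_biased(sample_a, sample_a))  # identical samples
    print(mmd2_biased(sample_a, sample_b))  # well-separated samples
```

The value depends strongly on the lengthscale, which is why a procedure that selects kernel parameters automatically, as the abstract describes, is attractive.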
Komodromos M, Aboagye EO, Evangelou M, et al., 2022, Variational Bayes for high-dimensional proportional hazards models with applications within gene expression, Bioinformatics, Vol: 38, Pages: 3918-3926, ISSN: 1367-4803
Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense. Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk. Availability and implementation: Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).
Reiker T, Golumbeanu M, Shattock A, et al., 2021, Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria, Nature Communications, Vol: 12, ISSN: 2041-1723
Individual-based models have become important tools in the global battle against infectious diseases, yet model complexity can make calibration to biological and epidemiological data challenging. We propose using a Bayesian optimization framework employing Gaussian process or machine learning emulator functions to calibrate a complex malaria transmission simulator. We demonstrate our approach by optimizing over a high-dimensional parameter space with respect to a portfolio of multiple fitting objectives built from datasets capturing the natural history of malaria transmission and disease progression. Our approach quickly outperforms previous calibrations, yielding an improved final goodness of fit. Per-objective parameter importance and sensitivity diagnostics provided by our approach offer epidemiological insights and enhance trust in predictions through greater interpretability.
Frainay C, Pitarch Y, Filippi S, et al., 2021, Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining, Clinical and Experimental Allergy, Vol: 51, Pages: 1185-1194, ISSN: 0954-7894
Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of terms “Eczema” and “Atopic Dermatitis” (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying PubMed using the terms “eczema” (D003876) and “dermatitis, atopic” (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict if an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, cellular and molecular biology; the eczema query linked to public health, infectious disease and respiratory system. Medical Subject Headings terms associated with “AD” or “Eczema” differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using eczema compared to AD query. Conclusions and Clinical Relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning
NCD Risk Factor Collaboration (NCD-RisC), Iurilli N, 2021, Heterogeneous contributions of change in population distribution of body-mass index to change in obesity and underweight, eLife, Vol: 10, ISSN: 2050-084X
From 1985 to 2016, the prevalence of underweight decreased, and that of obesity and severe obesity increased, in most regions, with significant variation in the magnitude of these changes across regions. We investigated how much change in mean body mass index (BMI) explains changes in the prevalence of underweight, obesity, and severe obesity in different regions using data from 2896 population-based studies with 187 million participants. Changes in the prevalence of underweight and total obesity, and to a lesser extent severe obesity, are largely driven by shifts in the distribution of BMI, with smaller contributions from changes in the shape of the distribution. In East and Southeast Asia and sub-Saharan Africa, the underweight tail of the BMI distribution was left behind as the distribution shifted. There is a need for policies that address all forms of malnutrition by making healthy foods accessible and affordable, while restricting unhealthy foods through fiscal and regulatory restrictions.
Unwin H, Mishra S, Bradley V, et al., 2020, State-level tracking of COVID-19 in the United States, Nature Communications, Vol: 11, Pages: 1-9, ISSN: 2041-1723
As of 1st June 2020, the US Centers for Disease Control and Prevention reported 104,232 confirmed or probable COVID-19-related deaths in the US. This was more than twice the number of deaths reported in the next most severely impacted country. We jointly model the US epidemic at the state-level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the number of individuals that have been infected, the number of individuals that are currently infectious and the time-varying reproduction number (the average number of secondary infections caused by an infected person). We use changes in mobility to capture the impact that non-pharmaceutical interventions and other behaviour changes have on the rate of transmission of SARS-CoV-2. We estimate that R_t was only below one in 23 states on 1st June. We also estimate that 3.7% [3.4%-4.0%] of the total population of the US had been infected, with wide variation between states, and approximately 0.01% of the population was infectious. We demonstrate good 3-week model forecasts of deaths with low error and good coverage of our credible intervals.
Kolbeinsson A, Filippi S, Panagakis I, et al., 2020, Accelerated MRI-predicted brain ageing and its associations with cardiometabolic and brain disorders, Scientific Reports, Vol: 10, ISSN: 2045-2322
Brain structure in later life reflects both influences of intrinsic aging and those of lifestyle, environment and disease. We developed a deep neural network model trained on brain MRI scans of healthy people to predict “healthy” brain age. Brain regions most informative for the prediction included the cerebellum, hippocampus, amygdala and insular cortex. We then applied this model to data from an independent group of people not stratified for health. A phenome-wide association analysis of over 1,410 traits in the UK Biobank with differences between the predicted and chronological ages for the second group identified significant associations with over 40 traits including diseases (e.g., type I and type II diabetes), disease risk factors (e.g., increased diastolic blood pressure and body mass index), and poorer cognitive function. These observations highlight relationships between brain and systemic health and have implications for understanding contributions of the latter to late life dementia risk.
Roberts G, Fontanella S, Selby A, et al., 2020, Connectivity patterns between multiple allergen specific IgE antibodies and their association with severe asthma, Journal of Allergy and Clinical Immunology, Vol: 146, Pages: 821-830, ISSN: 0091-6749
BACKGROUND: Allergic sensitization is associated with severe asthma, but assessment of sensitization is not recommended by most guidelines. OBJECTIVE: We hypothesized that patterns of IgE responses to multiple allergenic proteins differ between sensitized participants with mild/moderate and severe asthma. METHODS: IgE to 112 allergenic molecules (components, c-sIgE) was measured using multiplex array among 509 adults and 140 school-age and 131 preschool children with asthma/wheeze from the Unbiased BIOmarkers for the PREDiction of respiratory diseases outcomes cohort, of whom 595 had severe disease. We applied clustering methods to identify co-occurrence patterns of components (component clusters) and patterns of sensitization among participants (sensitization clusters). Network analysis techniques explored the connectivity structure of c-sIgE, and differential network analysis looked for differences in c-sIgE interactions between severe and mild/moderate asthma. RESULTS: Four sensitization clusters were identified, but with no difference between disease severity groups. Similarly, component clusters were not associated with asthma severity. None of the c-sIgE were identified as associates of severe asthma. The key difference between school children and adults with mild/moderate compared with those with severe asthma was in the network of connections between c-sIgE. Participants with severe asthma had higher connectivity among components, but these connections were weaker. The mild/moderate network had fewer connections, but the connections were stronger. Connectivity between components with no structural homology tended to co-occur among participants with severe asthma. Results were independent from the different sample sizes of mild/moderate and severe groups. CONCLUSIONS: The patterns of interactions between IgE to multiple allergenic proteins are predictors of asthma severity among school children and adults with allergic asthma.
Monod M, Blenkinsop A, Xi X, et al., 2020, Report 32: Targeting interventions to age groups that sustain COVID-19 transmission in the United States, Pages: 1-32
Following initial declines, in mid 2020, a resurgence in transmission of novel coronavirus disease (COVID-19) has occurred in the United States and parts of Europe. Despite the wide implementation of non-pharmaceutical interventions, it is still not known how they are impacted by changing contact patterns, age and other demographics. As COVID-19 disease control becomes more localised, understanding the age demographics driving transmission and how these impact the loosening of interventions such as school reopening is crucial. Considering dynamics for the United States, we analyse aggregated, age-specific mobility trends from more than 10 million individuals and link these mechanistically to age-specific COVID-19 mortality data. In contrast to previous approaches, we link mobility to mortality via age-specific contact patterns and use this rich relationship to reconstruct accurate transmission dynamics. Contrary to anecdotal evidence, we find little support for age-shifts in contact and transmission dynamics over time. We estimate that, until August, 63.4% [60.9%-65.5%] of SARS-CoV-2 infections in the United States originated from adults aged 20-49, while 1.2% [0.8%-1.8%] originated from children aged 0-9. In areas with continued, community-wide transmission, our transmission model predicts that re-opening kindergartens and elementary schools could facilitate spread and lead to considerable excess COVID-19 attributable deaths over a 90-day period. These findings indicate that targeting interventions to adults aged 20-49 is an important consideration in halting resurgent epidemics, and preventing COVID-19-attributable deaths when kindergartens and elementary schools reopen.
Teymur O, Filippi S, 2020, A Bayesian nonparametric test for conditional independence, Foundations of Data Science, Vol: 2, Pages: 155-172, ISSN: 2639-8001
This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.
Yeung E, McFann S, Marsh L, et al., 2020, Inference of multisite phosphorylation rate constants and their modulation by pathogenic mutations, Current Biology, ISSN: 0960-9822
Sonntag H-J, Filippi S, Pipis S, et al., 2019, Blood biomarkers of sensitization and asthma, Frontiers in Pediatrics, Vol: 7, ISSN: 2296-2360
Biomarkers are essential to determine different phenotypes of childhood asthma, and for the prediction of response to treatments. In young preschool children with asthma, aeroallergen sensitization, and blood eosinophil count of 300/µL or greater may identify those who can benefit from the daily use of inhaled corticosteroids (ICS). We propose that every preschool child who is considered for ICS treatment should have these two features measured as a minimum before a decision is made on the commencement of long-term preventive treatment. In practice, IgE-mediated sensitization should be considered as a quantifiable variable, i.e., we should use the titer of sIgE antibodies or the size of skin prick test response. A number of other blood biomarkers may prove useful (e.g., allergen-specific IgG/IgE antibody ratios amongst sensitized individuals, component-resolved diagnostics which measures sIgE response to a large number of allergenic molecules, assessment of immune responses to viruses, level of serum CC16, etc.), but it remains unclear whether these can be translated into clinically useful tests. Going forward, a more integrated approach which takes into account multiple domains of asthma, from the pattern of symptoms and blood biomarkers to genetic risk and lung function measures, is needed if we are to move toward a stratified approach to asthma management.
Jetka T, Nienałtowski K, Filippi S, et al., 2018, An information-theoretic framework for deciphering pleiotropic and noisy biochemical signaling, Nature Communications, Vol: 9, ISSN: 2041-1723
Many components of signaling pathways are functionally pleiotropic, and signaling responses are marked with substantial cell-to-cell heterogeneity. Therefore, biochemical descriptions of signaling require quantitative support to explain how complex stimuli (inputs) are encoded in distinct activities of pathways effectors (outputs). A unique perspective of information theory cannot be fully utilized due to lack of modeling tools that account for the complexity of biochemical signaling, specifically for multiple inputs and outputs. Here, we develop a modeling framework of information theory that allows for efficient analysis of models with multiple inputs and outputs; accounts for temporal dynamics of signaling; enables analysis of how signals flow through shared network components; and is not restricted by limited variability of responses. The framework allows us to explain how identity and quantity of type I and type III interferon variants could be recognized by cells despite activating the same signaling effectors.
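The basic information-theoretic quantity behind analyses of this kind is the mutual information between a stimulus (input) and a cellular response (output). The sketch below computes it for a small discrete channel. It is only a textbook illustration, not the framework developed in the paper, which handles multiple inputs and outputs and temporal dynamics; the function name and the toy channels are assumptions.

```python
import math


def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for an input distribution p_x and channel rows p_y_given_x[x][y]."""
    n_y = len(p_y_given_x[0])
    # Marginal distribution of the output.
    p_y = [
        sum(p_x[x] * p_y_given_x[x][y] for x in range(len(p_x)))
        for y in range(n_y)
    ]
    mi = 0.0
    for x, px in enumerate(p_x):
        for y in range(n_y):
            pxy = px * p_y_given_x[x][y]
            if pxy > 0.0:  # terms with zero probability contribute nothing
                mi += pxy * math.log2(p_y_given_x[x][y] / p_y[y])
    return mi


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    perfect = [[1.0, 0.0], [0.0, 1.0]]    # output identifies the input
    useless = [[0.5, 0.5], [0.5, 0.5]]    # output ignores the input
    print(mutual_information(uniform, perfect))
    print(mutual_information(uniform, useless))
```

A noiseless two-state pathway carries one bit about a uniform binary stimulus, while a pathway whose response is independent of the stimulus carries none; cell-to-cell heterogeneity places real signaling systems somewhere in between.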
Filippi SL, Muraro D, Parker A, et al., 2018, Chronic TNFα-driven injury delays cell migration to villi in the intestinal epithelium, Journal of the Royal Society Interface, Vol: 15, ISSN: 1742-5662
The intestinal epithelium is a single layer of cells which provides the first line of defence of the intestinal mucosa to bacterial infection. Cohesion of this physical barrier is supported by renewal of epithelial stem cells, residing in invaginations called crypts, and by crypt cell migration onto protrusions called villi; dysregulation of such mechanisms may render the gut susceptible to chronic inflammation. The impact that excessive or misplaced epithelial cell death may have on villus cell migration is currently unknown. We integrated cell-tracking methods with computational models to determine how epithelial homeostasis is affected by acute and chronic TNFα-driven epithelial cell death. Parameter inference reveals that acute inflammatory cell death has a transient effect on epithelial cell dynamics, whereas cell death caused by chronic elevated TNFα causes a delay in the accumulation of labelled cells onto the villus compared to the control. Such a delay may be reproduced by using a cell-based model to simulate the dynamics of each cell in a crypt–villus geometry, showing that a prolonged increase in cell death slows the migration of cells from the crypt to the villus. This investigation highlights which injuries (acute or chronic) may be regenerated and which cause disruption of healthy epithelial homeostasis.
Dony L, Mackerodt J, Ward S, et al., 2018, PEITH(Theta): perfecting experiments with information theory in Python with GPU support, Bioinformatics, Vol: 34, Pages: 1249-1250, ISSN: 1367-4803
Motivation: Different experiments provide differing levels of information about a biological system. This makes it difficult, a priori, to select one of them beyond mere speculation and/or belief, especially when resources are limited. With the increasing diversity of experimental approaches and general advances in quantitative systems biology, methods that inform us about the information content that a given experiment carries about the question we want to answer become crucial. Results: PEITH(Θ) is a general-purpose Python framework for experimental design in systems biology. PEITH(Θ) uses Bayesian inference and information theory in order to derive which experiments are most informative in order to estimate all model parameters and/or perform model predictions. Availability and implementation: https://github.com/MichaelPHStumpf/Peitho
Filippi S, Holmes C, 2017, A Bayesian nonparametric approach to testing for dependence between random variables, Bayesian Analysis, Vol: 12, Pages: 919-938, ISSN: 1931-6690
Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Pólya tree priors on the space of probability measures which can then be embedded within a decision theoretic test for dependence. Pólya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well known advantages of having an explicit probability measure include: easy comparison of evidence across different studies; encoding prior information; quantifying changes in dependence across different experimental conditions; and the integration of results within formal decision analysis.
Smith RCG, Stumpf PS, Ridden SJ, et al., 2017, The problem of measurement in cell biology: a tale of two alleles, European Biophysics Journal with Biophysics Letters, Vol: 46, Pages: S371-S371, ISSN: 0175-7571
Smith RCG, Stumpf PS, Ridden SJ, et al., 2017, Nanog fluctuations in embryonic stem cells highlight the problem of measurement in cell biology, Biophysical Journal, Vol: 112, Pages: 2641-2652, ISSN: 1542-0086
A number of important pluripotency regulators, including the transcription factor Nanog, are observed to fluctuate stochastically in individual embryonic stem cells. By transiently priming cells for commitment to different lineages, these fluctuations are thought to be important to the maintenance of, and exit from, pluripotency. However, because temporal changes in intracellular protein abundances cannot be measured directly in live cells, fluctuations are typically assessed using genetically engineered reporter cell lines that produce a fluorescent signal as a proxy for protein expression. Here, using a combination of mathematical modeling and experiment, we show that there are unforeseen ways in which widely used reporter strategies can systematically disturb the dynamics they are intended to monitor, sometimes giving profoundly misleading results. In the case of Nanog, we show how genetic reporters can compromise the behavior of important pluripotency-sustaining positive feedback loops, and induce a bifurcation in the underlying dynamics that gives rise to heterogeneous Nanog expression patterns in reporter cell lines that are not representative of the wild-type. These findings help explain the range of published observations of Nanog variability and highlight the problem of measurement in live cells.
Zhang Q, Filippi SL, Flaxman S, et al., 2017, Feature-to-feature regression for a two-step conditional independence test, Uncertainty in Artificial Intelligence
The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.
Zhang Q, Filippi S, Gretton A, et al., 2017, Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, Vol: 28, Pages: 113-130, ISSN: 1573-1375
Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.
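Of the three approximations named in this abstract, random Fourier features are the simplest to sketch. For a Gaussian kernel, frequencies are drawn from a normal distribution and each point is mapped to a finite vector of cosines whose inner products approximate the kernel, so statistics that would need all pairwise kernel values can instead be computed from the feature vectors. The code below shows only this feature map, not the independence tests studied in the paper; the function names, the feature count and the seed are assumptions.

```python
import math
import random


def gaussian_kernel(x, y, lengthscale=1.0):
    """Exact Gaussian kernel, used here as the reference value."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def make_rff(dim, n_features, lengthscale=1.0, seed=0):
    """Return a map z with dot(z(x), z(y)) approximately gaussian_kernel(x, y)."""
    rng = random.Random(seed)
    # Frequencies follow the kernel's spectral density, N(0, 1/lengthscale^2).
    ws = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    # Random phases, uniform on [0, 2*pi].
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [
            scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for w, b in zip(ws, bs)
        ]

    return features


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = make_rff(dim=2, n_features=2000)
    x, y = (0.3, -0.2), (1.0, 0.5)
    print(gaussian_kernel(x, y), dot(z(x), z(y)))
```

With n observations and D features, the feature matrix costs O(nD) to build, compared with O(n^2) for the full kernel matrix, which is the tradeoff between computation and test performance that the abstract refers to.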
Wills QF, Mellado-Gomez E, Nolan R, et al., 2017, The nature and nurture of cell heterogeneity: accounting for macrophage gene-environment interactions with single-cell RNA-Seq, BMC Genomics, Vol: 18, ISSN: 1471-2164
BACKGROUND: Single-cell RNA-Seq can be a valuable and unbiased tool to dissect cellular heterogeneity, despite the transcriptome's limitations in describing higher functional phenotypes and protein events. Perhaps the most important shortfall with transcriptomic 'snapshots' of cell populations is that they risk being descriptive, only cataloging heterogeneity at one point in time, and without microenvironmental context. Studying the genetic ('nature') and environmental ('nurture') modifiers of heterogeneity, and how cell population dynamics unfold over time in response to these modifiers is key when studying highly plastic cells such as macrophages. RESULTS: We introduce the programmable Polaris™ microfluidic lab-on-chip for single-cell sequencing, which performs live-cell imaging while controlling for the culture microenvironment of each cell. Using gene-edited macrophages we demonstrate how previously unappreciated knockout effects of SAMHD1, such as an altered oxidative stress response, have a large paracrine signaling component. Furthermore, we demonstrate single-cell pathway enrichments for cell cycle arrest and APOBEC3G degradation, both associated with the oxidative stress response and altered proteostasis. Interestingly, SAMHD1 and APOBEC3G are both HIV-1 inhibitors ('restriction factors'), with no known co-regulation. CONCLUSION: As single-cell methods continue to mature, so will the ability to move beyond simple 'snapshots' of cell populations towards studying the determinants of population dynamics. By combining single-cell culture, live-cell imaging, and single-cell sequencing, we have demonstrated the ability to study cell phenotypes and microenvironmental influences. It's these microenvironmental components - ignored by standard single-cell workflows - that likely determine how macrophages, for example, react to inflammation and form treatment resistant HIV reservoirs.
Filippi S, Holmes CC, Nieto-Barajas LE, 2016, Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures, Electronic Journal of Statistics, Vol: 10, Pages: 3338-3354, ISSN: 1935-7524
In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criterion is that the procedures should scale to large data sets. In this regard we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not feasible computationally. To address this we present Bayesian diagnostic measures for characterising evidence against a “null model” of pairwise independence. In simulation studies, as well as for a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.
Flaxman S, Sejdinovic D, Cunningham JP, et al., 2016, Bayesian Learning of Kernel Embeddings, UAI'16
Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.
Filippi S, Barnes CP, Kirk PDW, et al., 2016, Robustness of MEK-ERK Dynamics and Origins of Cell-to-Cell Variability in MAPK Signaling, Cell Reports
Mahon SSM, Lenive O, Filippi S, et al., 2015, Information processing by simple molecular motifs and susceptibility to noise, Journal of The Royal Society Interface
Bhatnagar N, Perkins K, Filippi S, et al., 2014, Clinical and Hematologic Impact of Fetal and Perinatal Variables on Mutant GATA1 Clone Size in Neonates with Down Syndrome, Blood, Vol: 124, ISSN: 0006-4971
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.