Faria NR, Mellan TA, Whittaker C, et al., 2021, Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil, Science, Vol: 372, Pages: 815+, ISSN: 0036-8075
Volz E, Mishra S, Chand M, et al., 2021, Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England, Nature, Vol: 593, Pages: 266-269, ISSN: 0028-0836
The SARS-CoV-2 lineage B.1.1.7, designated a Variant of Concern 202012/01 (VOC) by Public Health England, originated in the UK in late Summer to early Autumn 2020. Whole genome SARS-CoV-2 sequence data collected from community-based diagnostic testing shows an unprecedentedly rapid expansion of the B.1.1.7 lineage during Autumn 2020, suggesting a selective advantage. We find that changes in VOC frequency inferred from genetic data correspond closely to changes inferred by S-gene target failures (SGTF) in community-based diagnostic PCR testing. Analysis of trends in SGTF and non-SGTF case numbers in local areas across England shows that the VOC has higher transmissibility than non-VOC lineages, even if the VOC has a different latent period or generation time. The SGTF data indicate a transient shift in the age composition of reported cases, with a larger share of under 20 year olds among reported VOC than non-VOC cases. Time-varying reproduction numbers for the VOC and cocirculating lineages were estimated using SGTF and genomic data. The best supported models did not indicate a substantial difference in VOC transmissibility among different age groups. There is a consensus among all analyses that the VOC has a substantial transmission advantage with a 50% to 100% higher reproduction number.
Hilbers AP, Brayshaw DJ, Gandy A, 2021, Efficient quantification of the impact of demand and weather uncertainty in power system models, IEEE Transactions on Power Systems, Vol: 36, Pages: 1771-1779, ISSN: 0885-8950
This paper introduces a novel approach to quantify the effect of forward-propagated demand and weather uncertainty on power system planning and operation model outputs. Recent studies indicate that such sampling uncertainty, originating from demand and weather time series inputs, should not be ignored. However, established uncertainty quantification approaches fail in this context due to the computational resources and additional data required for Monte Carlo-based analysis. The method introduced here quantifies uncertainty on model outputs using a bootstrap scheme with shorter time series than the original, enhancing computational efficiency and avoiding the need for any additional data. It both quantifies output uncertainty and determines the sample length required for desired confidence levels. Simulations performed on two generation and transmission expansion planning models and one unit commitment and economic dispatch model illustrate the method's efficacy. A test is introduced allowing users to determine whether estimated uncertainty bounds are valid. The models, data and code applying the method are provided as open-source software.
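To illustrate the bootstrap idea described above, here is a minimal Python sketch: a toy stand-in model is re-run on bootstrap samples built from short blocks of the demand and wind series, and the spread of outputs gives an uncertainty interval. The function run_model, the block length and the sample length below are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not the paper's exact scheme): estimate output uncertainty by
# re-running a model on bootstrap samples built from short blocks of the input series.
import numpy as np

def run_model(demand, wind):
    # Hypothetical stand-in for a planning/operation model; here just a toy
    # output: average unmet demand for a fixed wind capacity.
    return np.mean(np.maximum(demand - 10.0 * wind, 0.0))

def block_bootstrap_outputs(demand, wind, block_len=24 * 7, sample_len=24 * 90, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n_blocks = sample_len // block_len
    outputs = []
    for _ in range(n_boot):
        starts = rng.integers(0, len(demand) - block_len, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])
        outputs.append(run_model(demand[idx], wind[idx]))
    return np.array(outputs)

# Toy hourly data standing in for multi-year demand and wind time series.
rng = np.random.default_rng(1)
hours = 24 * 365 * 3
demand = 40 + 10 * np.sin(2 * np.pi * np.arange(hours) / 24) + rng.normal(0, 3, hours)
wind = rng.weibull(2.0, hours)

out = block_bootstrap_outputs(demand, wind)
print("point estimate:", run_model(demand, wind))
print("bootstrap 95% interval:", np.quantile(out, [0.025, 0.975]))
```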
Laydon D, Mishra S, Hinsley W, et al., 2021, Modelling the impact of the Tier system on SARS-CoV-2 transmission in the UK between the first and second national lockdowns, BMJ Open, Vol: 11, ISSN: 2044-6055
Objective To measure the effects of the tier system on the COVID-19 pandemic in the UK between the first and second national lockdowns, before the emergence of the B.1.1.7 variant of concern. Design This is a modelling study combining estimates of real-time reproduction number Rt (derived from UK case, death and serological survey data) with publicly available data on regional non-pharmaceutical interventions. We fit a Bayesian hierarchical model with latent factors using these quantities to account for broader national trends in addition to subnational effects from tiers. Setting The UK at lower tier local authority (LTLA) level. 310 LTLAs were included in the analysis. Primary and secondary outcome measures Reduction in real-time reproduction number Rt. Results Nationally, transmission increased between July and late September, regional differences notwithstanding. Immediately prior to the introduction of the tier system, Rt averaged 1.3 (0.9–1.6) across LTLAs, but declined to an average of 1.1 (0.86–1.42) 2 weeks later. Decline in transmission was not solely attributable to tiers. Tier 1 had negligible effects. Tiers 2 and 3, respectively, reduced transmission by 6% (5%–7%) and 23% (21%–25%). 288 LTLAs (93%) would have begun to suppress their epidemics if every LTLA had gone into tier 3 by the second national lockdown, whereas only 90 (29%) did so in reality. Conclusions The relatively small effect sizes found in this analysis demonstrate that interventions at least as stringent as tier 3 are required to suppress transmission, especially considering more transmissible variants, at least until effective vaccination is widespread or much greater population immunity has amassed.
Unwin H, Mishra S, Bradley V, et al., 2020, State-level tracking of COVID-19 in the United States, Nature Communications, Vol: 11, Pages: 1-9, ISSN: 2041-1723
As of 1st June 2020, the US Centers for Disease Control and Prevention reported 104,232 confirmed or probable COVID-19-related deaths in the US. This was more than twice the number of deaths reported in the next most severely impacted country. We jointly model the US epidemic at the state level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the number of individuals that have been infected, the number of individuals that are currently infectious and the time-varying reproduction number (the average number of secondary infections caused by an infected person). We use changes in mobility to capture the impact that non-pharmaceutical interventions and other behaviour changes have on the rate of transmission of SARS-CoV-2. We estimate that Rt was only below one in 23 states on 1st June. We also estimate that 3.7% [3.4%-4.0%] of the total population of the US had been infected, with wide variation between states, and approximately 0.01% of the population was infectious. We demonstrate good 3-week model forecasts of deaths with low error and good coverage of our credible intervals.
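The semi-mechanistic structure described above can be illustrated with a toy discrete renewal model in Python: infections are generated from a time-varying reproduction number and a generation interval, and expected deaths follow by convolving infections with an infection-to-death delay and an infection fatality ratio. All parameter values in this sketch are illustrative assumptions, not the estimates used in the paper.

```python
# Toy discrete renewal model: infections driven by R_t and a generation interval,
# expected deaths obtained by convolving infections with an infection-to-death
# delay and an infection fatality ratio. Parameter values are illustrative only.
import numpy as np
from scipy import stats

days = 120
gi = stats.gamma(a=6.5, scale=0.62).pdf(np.arange(1, 31))      # generation interval (illustrative)
gi /= gi.sum()
i2d = stats.gamma(a=4.9, scale=4.8).pdf(np.arange(1, 61))      # infection-to-death delay (illustrative)
i2d /= i2d.sum()
ifr = 0.01                                                      # illustrative infection fatality ratio

R_t = np.where(np.arange(days) < 50, 2.5, 0.8)                  # step change, e.g. after interventions
infections = np.zeros(days)
infections[0] = 100.0
for t in range(1, days):
    past = infections[max(0, t - len(gi)):t][::-1]              # most recent infections first
    infections[t] = R_t[t] * np.sum(past * gi[:len(past)])

expected_deaths = np.zeros(days)
for t in range(days):
    past = infections[max(0, t - len(i2d)):t][::-1]
    expected_deaths[t] = ifr * np.sum(past * i2d[:len(past)])

print("peak daily infections:", round(float(infections.max())))
print("deaths lag infections: argmax", infections.argmax(), "vs", expected_deaths.argmax())
```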
Mishra S, Scott J, Zhu H, et al., 2020, A COVID-19 Model for Local Authorities of the United Kingdom
We propose a new framework to model the COVID-19 epidemic of the United Kingdom at the level of local authorities. The model fits within a general framework for semi-mechanistic Bayesian models of the epidemic, with some important innovations: we model the proportion of infections that result in reported deaths and cases as random variables. This is in contrast to standard frameworks that model the latent infection as a deterministic function of the time-varying reproduction number, Rt. The model is tailored and designed to be updated daily based on publicly available data. We envisage the model to be useful for now-casting and short-term projections of the epidemic as well as estimating historical trends. The model fits are available on a public website, https://imperialcollegelondon.github.io/covid19local. The model is currently being used by the Scottish government in their decisions on interventions within Scotland [1, issue 24 to now].
Okell LC, Verity R, Katzourakis A, et al., 2020, Host or pathogen-related factors in COVID-19 severity? Reply, Lancet, Vol: 396, Pages: 1397, ISSN: 0140-6736
Monod M, Blenkinsop A, Xi X, et al., 2020, Report 32: Targeting interventions to age groups that sustain COVID-19 transmission in the United States, Pages: 1-32
Following initial declines, in mid 2020, a resurgence in transmission of novel coronavirus disease (COVID-19) has occurred in the United States and parts of Europe. Despite the wide implementation of non-pharmaceutical interventions, it is still not known how they are impacted by changing contact patterns, age and other demographics. As COVID-19 disease control becomes more localised, understanding the age demographics driving transmission and how these impact the loosening of interventions such as school reopening is crucial. Considering dynamics for the United States, we analyse aggregated, age-specific mobility trends from more than 10 million individuals and link these mechanistically to age-specific COVID-19 mortality data. In contrast to previous approaches, we link mobility to mortality via age-specific contact patterns and use this rich relationship to reconstruct accurate transmission dynamics. Contrary to anecdotal evidence, we find little support for age-shifts in contact and transmission dynamics over time. We estimate that, until August, 63.4% [60.9%-65.5%] of SARS-CoV-2 infections in the United States originated from adults aged 20-49, while 1.2% [0.8%-1.8%] originated from children aged 0-9. In areas with continued, community-wide transmission, our transmission model predicts that re-opening kindergartens and elementary schools could facilitate spread and lead to considerable excess COVID-19 attributable deaths over a 90-day period. These findings indicate that targeting interventions to adults aged 20-49 is an important consideration in halting resurgent epidemics, and preventing COVID-19-attributable deaths when kindergartens and elementary schools reopen.
Ding D, Gandy A, Hahn G, 2020, A simple method for implementing Monte Carlo tests, Computational Statistics, Vol: 35, Pages: 1373-1392, ISSN: 0943-4062
We consider a statistical test whose p value can only be approximated using Monte Carlo simulations. We are interested in deciding whether the p value for an observed data set lies above or below a given threshold such as 5%. We want to ensure that the resampling risk, the probability of the (Monte Carlo) decision being different from the true decision, is uniformly bounded. This article introduces a simple open-ended method with this property, the confidence sequence method (CSM). We compare our approach to another algorithm, SIMCTEST, which also guarantees an (asymptotic) uniform bound on the resampling risk, as well as to other Monte Carlo procedures without a uniform bound. CSM is free of tuning parameters and conservative. It has the same theoretical guarantee as SIMCTEST and, in many settings, similar stopping boundaries. As it is much simpler than other methods, CSM is a useful method for practical applications.
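A rough Python sketch of how such a confidence-sequence stopping rule can be implemented is given below; the binomial-based stopping condition is a simplified reading of the method, and the constants (eps, the cap on samples) are illustrative, so the paper should be consulted for the exact guarantees.

```python
# Sketch of a confidence-sequence stopping rule for a Monte Carlo test: stop as soon
# as a Robbins-type confidence sequence for the p-value excludes the threshold alpha.
# Simplified illustration; see the paper for the exact constants and guarantees.
import numpy as np
from scipy import stats

def csm_decision(sample_exceedance, alpha=0.05, eps=0.001, max_n=100_000, seed=0):
    """sample_exceedance(rng) returns True if a simulated statistic is at least as
    extreme as the observed one. Returns ('reject'/'accept'/'undecided', n_samples)."""
    rng = np.random.default_rng(seed)
    s = 0
    for n in range(1, max_n + 1):
        s += int(sample_exceedance(rng))
        # alpha leaves the confidence sequence once (n+1)*Binom(s; n, alpha) <= eps
        if (n + 1) * stats.binom.pmf(s, n, alpha) <= eps:
            return ("reject" if s / n < alpha else "accept"), n
    return "undecided", max_n

# Toy example: true p-value 0.02 (below alpha), so exceedances occur with prob 0.02.
decision, n = csm_decision(lambda rng: rng.random() < 0.02)
print(decision, "after", n, "Monte Carlo samples")
```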
Hilbers A, Brayshaw D, Gandy A, 2020, Importance subsampling for power system planning under multi-year demand and weather uncertainty, PMAPS 2020 (the 16th International Conference on Probabilistic Methods Applied to Power Systems), Publisher: IEEE, Pages: 1-6
This paper introduces a generalised version of importance subsampling for time series reduction/aggregation in optimisation-based power system planning models. Recent studies indicate that reliably determining optimal electricity (investment) strategy under climate variability requires the consideration of multiple years of demand and weather data. However, solving planning models over long simulation lengths is typically computationally unfeasible, and established time series reduction approaches induce significant errors. The importance subsampling method reliably estimates long-term planning model outputs at greatly reduced computational cost, allowing the consideration of multi-decadal samples. The key innovation is a systematic identification and preservation of relevant extreme events in modelling subsamples. Simulation studies on generation and transmission expansion planning models illustrate the method's enhanced performance over established "representative days" clustering approaches. The models, data and sample code are made available as open-source software.
Flaxman S, Mishra S, Gandy A, et al., 2020, Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe, Nature, Vol: 584, Pages: 257-261, ISSN: 0028-0836
Following the emergence of a novel coronavirus (SARS-CoV-2) and its spread outside of China, Europe has experienced large epidemics. In response, many European countries have implemented unprecedented non-pharmaceutical interventions such as closure of schools and national lockdowns. We study the impact of major interventions across 11 European countries for the period from the start of COVID-19 until the 4th of May 2020, when lockdowns started to be lifted. Our model calculates backwards from observed deaths to estimate transmission that occurred several weeks prior, allowing for the time lag between infection and death. We use partial pooling of information between countries with both individual and shared effects on the reproduction number. Pooling allows more information to be used, helps overcome data idiosyncrasies, and enables more timely estimates. Our model relies on fixed estimates of some epidemiological parameters such as the infection fatality rate, does not include importation or subnational variation and assumes that changes in the reproduction number are an immediate response to interventions rather than gradual changes in behaviour. Amidst the ongoing pandemic, we rely on death data that is incomplete, with systematic biases in reporting, and subject to future consolidation. We estimate that, for all the countries we consider, current interventions have been sufficient to drive the reproduction number Rt below 1 (probability Rt < 1.0 is 99.9%) and achieve epidemic control. We estimate that, across all 11 countries, between 12 and 15 million individuals have been infected with SARS-CoV-2 up to 4th May, representing between 3.2% and 4.0% of the population. Our results show that major non-pharmaceutical interventions, and lockdown in particular, have had a large effect on reducing transmission. Continued intervention should be considered to keep transmission of SARS-CoV-2 under control.
Okell LC, Verity R, Watson OJ, et al., 2020, Have deaths from COVID-19 in Europe plateaued due to herd immunity?, Lancet, Vol: 395, Pages: E110-E111, ISSN: 0140-6736
Gandy A, Veraart LAM, 2020, Compound Poisson models for weighted networks with applications in finance, Mathematics and Financial Economics, Vol: 15, Pages: 131-153, ISSN: 1862-9660
We develop a modelling framework for estimating and predicting weighted network data. The edge weights in weighted networks often arise from aggregating some individual relationships between the nodes. Motivated by this, we introduce a modelling framework for weighted networks based on the compound Poisson distribution. To allow for heterogeneity between the nodes, we use a regression approach for the model parameters. We test the new modelling framework on two types of financial networks: a network of financial institutions in which the edge weights represent exposures from trading Credit Default Swaps and a network of countries in which the edge weights represent cross-border lending. The compound Poisson Gamma distributions with regression fit the data well in both situations. We illustrate how this modelling framework can be used for predicting unobserved edges and their weights in an only partially observed network. This is for example relevant for assessing systemic risk in financial networks.
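The generative mechanism can be illustrated with a short Python simulation: the number of individual relationships on each directed edge is Poisson, each relationship contributes a Gamma-distributed amount, and the Poisson rate depends on node covariates through an illustrative regression. The covariate and link used here are assumptions for the sketch, not the paper's specification.

```python
# Minimal simulation of compound-Poisson-Gamma edge weights in a directed network:
# the number of individual relationships on an edge is Poisson, each contributing a
# Gamma-distributed amount. Parameters and the regression link are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_nodes = 6
size = rng.lognormal(mean=0.0, sigma=1.0, size=n_nodes)   # node covariate, e.g. institution size

weights = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    for j in range(n_nodes):
        if i == j:
            continue
        lam = 0.5 * np.sqrt(size[i] * size[j])            # illustrative regression on node covariates
        n_ij = rng.poisson(lam)                           # number of underlying relationships
        weights[i, j] = rng.gamma(shape=2.0, scale=1.0, size=n_ij).sum()  # aggregated edge weight

print(np.round(weights, 2))
```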
Lamprinakou S, McCoy E, Barahona M, et al., 2020, BART-based inference for Poisson processes
The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including nonparametric regression and classification. Here we introduce a BART scheme for estimating the intensity of inhomogeneous Poisson processes. Poisson intensity estimation is a vital task in various applications including medical imaging, astrophysics and network traffic analysis. Our approach enables full posterior inference of the intensity in a nonparametric regression setting. We demonstrate the performance of our scheme through simulation studies on synthetic and real datasets in one and two dimensions, and compare our approach to alternative approaches.
Scott J, Gandy A, 2020, State-dependent Kernel selection for conditional sampling of graphs, Journal of Computational and Graphical Statistics, Vol: 29, Pages: 847-858, ISSN: 1061-8600
This article introduces new efficient algorithms for two problems: sampling conditional on vertex degrees in unweighted graphs, and conditional on vertex strengths in weighted graphs. The resulting conditional distributions provide the basis for exact tests on social networks and two-way contingency tables. The algorithms are able to sample conditional on the presence or absence of an arbitrary set of edges. Existing samplers based on MCMC or sequential importance sampling are generally not scalable; their efficiency can degrade in large graphs with complex patterns of known edges. MCMC methods usually require explicit computation of a Markov basis to navigate the state space; this is computationally intensive even for small graphs. Our samplers do not require a Markov basis, and are efficient both in sparse and dense settings. The key idea is to carefully select a Markov kernel on the basis of the current state of the chain. We demonstrate the utility of our methods on a real network and contingency table. Supplementary materials for this article are available online.
Mellan T, Hoeltgebaum H, Mishra S, et al., 2020, Report 21: Estimating COVID-19 cases and reproduction number in Brazil
Brazil is an epicentre for COVID-19 in Latin America. In this report we describe the Brazilian epidemic using three epidemiological measures: the number of infections, the number of deaths and the reproduction number. Our modelling framework requires sufficient death data to estimate trends, and we therefore limit our analysis to 16 states that have experienced a total of more than fifty deaths. The distribution of deaths among states is highly heterogeneous, with 5 states—São Paulo, Rio de Janeiro, Ceará, Pernambuco and Amazonas—accounting for 81% of deaths reported to date. In these states, we estimate that the percentage of people that have been infected with SARS-CoV-2 ranges from 3.3% (95% CI: 2.8%-3.7%) in São Paulo to 10.6% (95% CI: 8.8%-12.1%) in Amazonas. The reproduction number (a measure of transmission intensity) at the start of the epidemic meant that an infected individual would infect three or four others on average. Following non-pharmaceutical interventions such as school closures and decreases in population mobility, we show that the reproduction number has dropped substantially in each state. However, for all 16 states we study, we estimate with high confidence that the reproduction number remains above 1. A reproduction number above 1 means that the epidemic is not yet controlled and will continue to grow. These trends are in stark contrast to other major COVID-19 epidemics in Europe and Asia where enforced lockdowns have successfully driven the reproduction number below 1. While the Brazilian epidemic is still relatively nascent on a national scale, our results suggest that further action is needed to limit spread and prevent health system overload.
Vollmer M, Mishra S, Unwin H, et al., 2020, Report 20: A sub-national analysis of the rate of transmission of Covid-19 in Italy
Italy was the first European country to experience sustained local transmission of COVID-19. As of 1st May 2020, the Italian health authorities reported 28,238 deaths nationally. To control the epidemic, the Italian government implemented a suite of non-pharmaceutical interventions (NPIs), including school and university closures, social distancing and full lockdown involving banning of public gatherings and non-essential movement. In this report, we model the effect of NPIs on transmission using data on average mobility. We estimate that the average reproduction number (a measure of transmission intensity) is currently below one for all Italian regions, and significantly so for the majority of the regions. Despite the large number of deaths, the proportion of population that has been infected by SARS-CoV-2 (the attack rate) is far from the herd immunity threshold in all Italian regions, with the highest attack rate observed in Lombardy (13.18% [10.66%-16.70%]). Italy is set to relax the currently implemented NPIs from 4th May 2020. Given the control achieved by NPIs, we consider three scenarios for the next 8 weeks: a scenario in which mobility remains the same as during the lockdown, a scenario in which mobility returns to pre-lockdown levels by 20%, and a scenario in which mobility returns to pre-lockdown levels by 40%. The scenarios explored assume that mobility is scaled evenly across all dimensions, that behaviour stays the same as before NPIs were implemented, that no pharmaceutical interventions are introduced, and it does not include transmission reduction from contact tracing, testing and the isolation of confirmed or suspected cases. We find that, in the absence of additional interventions, even a 20% return to pre-lockdown mobility could lead to a resurgence in the number of deaths far greater than experienced in the current wave in several regions. Future increases in the number of deaths will lag behind the increase in transmission intensity and so a
Hawryluk I, Mellan TA, Hoeltgebaum H, et al., 2020, Inference of COVID-19 epidemiological distributions from Brazilian hospital data, Journal of The Royal Society Interface, Vol: 17, Pages: 20200596-20200596, ISSN: 1742-5662
Knowing COVID-19 epidemiological distributions, such as the time from patient admission to death, is directly relevant to effective primary and secondary care planning, and moreover, the mathematical modelling of the pandemic generally. We determine epidemiological distributions for patients hospitalized with COVID-19 using a large dataset (N = 21,000-157,000) from the Brazilian Sistema de Informação de Vigilância Epidemiológica da Gripe database. A joint Bayesian subnational model with partial pooling is used to simultaneously describe the 26 states and one federal district of Brazil, and shows significant variation in the mean of the symptom-onset-to-death time, with ranges between 11.2 and 17.8 days across the different states, and a mean of 15.2 days for Brazil. We find strong evidence in favour of specific probability density function choices: for example, the gamma distribution gives the best fit for onset-to-death and the generalized lognormal for onset-to-hospital-admission. Our results show that epidemiological distributions have considerable geographical variation, and provide the first estimates of these distributions in a low and middle-income setting. At the subnational level, variation in COVID-19 outcome timings is found to be correlated with poverty, deprivation and segregation levels, and weaker correlation is observed for mean age, wealth and urbanicity.
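As an illustration of the distribution-fitting step only (the paper's full analysis is a hierarchical Bayesian model with partial pooling across states), the following Python sketch fits a Gamma distribution to synthetic onset-to-death delays by maximum likelihood and compares it with a log-normal fit by log-likelihood. The synthetic data and the comparison are assumptions for the sketch, not results from the paper.

```python
# Sketch: maximum-likelihood fit of a Gamma distribution to onset-to-death delays,
# on synthetic data standing in for observed delays.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic delays with mean ~15 days, standing in for observed onset-to-death times.
delays = rng.gamma(shape=3.0, scale=5.0, size=2000)

shape, loc, scale = stats.gamma.fit(delays, floc=0)       # fix location at zero
print(f"fitted Gamma: shape={shape:.2f}, scale={scale:.2f}, mean={shape * scale:.1f} days")

# Compare candidate densities by log-likelihood (akin to the model comparison step).
ll_gamma = stats.gamma.logpdf(delays, shape, loc=0, scale=scale).sum()
ln_shape, ln_loc, ln_scale = stats.lognorm.fit(delays, floc=0)
ll_lognorm = stats.lognorm.logpdf(delays, ln_shape, loc=0, scale=ln_scale).sum()
print("log-likelihoods: gamma", round(ll_gamma, 1), "lognormal", round(ll_lognorm, 1))
```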
Gandy A, Hahn G, Ding D, 2019, Implementing Monte Carlo tests with p-value buckets, SCANDINAVIAN JOURNAL OF STATISTICS, Vol: 47, Pages: 950-967, ISSN: 0303-6898
Jin S, Savioli N, Marvao AD, et al., 2019, Joint analysis of clinical risk factors and 4D cardiac motion for survival prediction using a hybrid deep learning network, Publisher: arXiv
In this work, a novel approach is proposed for joint analysis of high-dimensional time-resolved cardiac motion features obtained from segmented cardiac MRI and low-dimensional clinical risk factors to improve survival prediction in heart failure. Different methods are evaluated to find the optimal way to insert conventional covariates into deep prediction networks. Correlation analysis between autoencoder latent codes and covariate features is used to examine how these predictors interact. We believe that similar approaches could also be used to introduce knowledge of genetic variants to such survival networks to improve outcome prediction by jointly analysing cardiac motion traits with inheritable risk factors.
Hilbers A, Brayshaw D, Gandy A, 2019, Importance subsampling: Improving power system planning under climate-based uncertainty, Applied Energy, Vol: 251, Pages: 1-12, ISSN: 0306-2619
Recent studies indicate that the effects of inter-annual climate-based variability in power system planning are significant and that long samples of demand & weather data (spanning multiple decades) should be considered. At the same time, modelling renewable generation such as solar and wind requires high temporal resolution to capture fluctuations in output levels. In many realistic power system models, using long samples at high temporal resolution is computationally unfeasible. This paper introduces a novel subsampling approach, referred to as importance subsampling, allowing the use of multiple decades of demand & weather data in power system planning models at reduced computational cost. The methodology can be applied in a wide class of optimisation-based power system simulations. A test case is performed on a model of the United Kingdom created using the open-source modelling framework Calliope and 36 years of hourly demand and wind data. Standard data reduction approaches such as using individual years or clustering into representative days lead to significant errors in estimates of optimal system design. Furthermore, the resultant power systems lead to supply capacity shortages, raising questions of generation capacity adequacy. In contrast, importance subsampling leads to accurate estimates of optimal system design at greatly reduced computational cost, with resultant power systems able to meet demand across all 36 years of demand & weather scenarios.
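A simplified Python sketch of the subsampling idea follows: timesteps are scored by an importance measure (net demand here, standing in for the cost-based score used in the papers), the most extreme timesteps are always retained, and the remainder is represented by a weighted random subsample. The score, subsample sizes and weighting below are illustrative assumptions, not the published method's exact construction.

```python
# Simplified illustration of the importance-subsampling idea: keep the most "important"
# timesteps deterministically and represent the rest by a random subsample whose
# weights are inflated so that total weight matches the original sample length.
import numpy as np

def importance_subsample(score, n_keep_extreme=100, n_sample_rest=900, seed=0):
    rng = np.random.default_rng(seed)
    order = np.argsort(score)
    extreme = order[-n_keep_extreme:]                      # always include the extreme timesteps
    rest = order[:-n_keep_extreme]
    sampled = rng.choice(rest, size=n_sample_rest, replace=False)
    idx = np.concatenate([extreme, sampled])
    w = np.concatenate([np.ones(n_keep_extreme),           # extremes keep weight 1
                        np.full(n_sample_rest, len(rest) / n_sample_rest)])
    return idx, w                                          # weights sum to the original sample length

rng = np.random.default_rng(1)
hours = 24 * 365 * 10
demand = 40 + 10 * np.sin(2 * np.pi * np.arange(hours) / 24) + rng.normal(0, 3, hours)
wind = rng.weibull(2.0, hours)
net_demand = demand - 15 * wind                            # importance score (assumption)

idx, w = importance_subsample(net_demand)
print("subsample size:", len(idx), "| total weight:", round(w.sum()), "of", hours)
```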
Veraart LAM, Gandy A, 2019, Adjustable network reconstruction with applications to CDS exposures, Journal of Multivariate Analysis, Vol: 172, Pages: 193-209, ISSN: 0047-259X
This paper is concerned with reconstructing weighted directed networks from the total in- and out-weight of each node. This problem arises for example in the analysis of systemic risk of partially observed financial networks. Typically a wide range of networks is consistent with this partial information. We develop an empirical Bayesian methodology that can be adjusted such that the resulting networks are consistent with the observations and satisfy certain desired global topological properties such as a given mean density, extending the approach by Gandy and Veraart (2017). Furthermore we propose a new fitness-based model within this framework. We provide a case study based on a data set consisting of 89 fully observed financial networks of credit default swap exposures. We reconstruct those networks based on only partial information using the newly proposed as well as existing methods. To assess the quality of the reconstruction, we use a wide range of criteria, including measures on how well the degree distribution can be captured and higher order measures of systemic risk. We find that the empirical Bayesian approach performs best.
Noven R, Veraart A, Gandy A, 2018, A latent trawl process model for extreme values, Journal of Energy Markets, Vol: 11, Pages: 1-24, ISSN: 1756-3607
This paper presents a new model for characterising temporal dependence in exceedances above a threshold. The model is based on the class of trawl processes, which are stationary, infinitely divisible stochastic processes. The model for extreme values is constructed by embedding a trawl process in a hierarchical framework, which ensures that the marginal distribution is generalised Pareto, as expected from classical extreme value theory. We also consider a modified version of this model that works with a wider class of generalised Pareto distributions, and has the advantage of separating marginal and temporal dependence properties. The model is illustrated by applications to environmental time series, and it is shown that the model offers considerable flexibility in capturing the dependence structure of extreme value data.
Gandy A, Veraart LAM, 2017, A Bayesian methodology for systemic risk assessment in financial networks, Management Science, Vol: 63, Pages: 4428-4446, ISSN: 0025-1909
We develop a Bayesian methodology for systemic risk assessment in financial networks such as the interbank market. Nodes represent participants in the network and weighted directed edges represent liabilities. Often, for every participant, only the total liabilities and total assets within this network are observable. However, systemic risk assessment needs the individual liabilities. We propose a model for the individual liabilities, which, following a Bayesian approach, we then condition on the observed total liabilities and assets and, potentially, on certain observed individual liabilities. We construct a Gibbs sampler to generate samples from this conditional distribution. These samples can be used in stress testing, giving probabilities for the outcomes of interest. As one application we derive default probabilities of individual banks and discuss their sensitivity with respect to prior information included to model the network. An R-package implementing the methodology is provided.
Gandy A, Kvaløy JT, 2017, spcadjust: an R package for adjusting for estimation error in control charts, The R Journal, Vol: 9, Pages: 458-476, ISSN: 2073-4859
In practical applications of control charts the in-control state and the corresponding chart parameters are usually estimated based on some past in-control data. The estimation error then needs to be accounted for. In this paper we present an R package, spcadjust, which implements a bootstrap-based method for adjusting monitoring schemes to take into account the estimation error. By bootstrapping the past data this method guarantees, with a certain probability, a conditional performance of the chart. In spcadjust the method is implemented for various types of Shewhart, CUSUM and EWMA charts, various performance criteria, and both parametric and non-parametric bootstrap schemes. In addition to the basic charts, charts based on linear and logistic regression models for risk-adjusted monitoring are included, and it is easy for the user to add further charts. Use of the package is demonstrated by examples.
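The adjustment principle can be sketched in a few lines for a one-sided Shewhart chart with estimated mean and standard deviation. This Python sketch illustrates the bootstrap idea only and does not reproduce the spcadjust API or its chart types: the control-limit multiplier is taken as a bootstrap quantile so that, with the desired probability, the conditional false-alarm rate stays at or below the nominal level despite estimation error.

```python
# Sketch of the bootstrap-adjustment idea for a one-sided Shewhart chart built from
# estimated in-control parameters (illustration of the principle, not the spcadjust API).
import numpy as np
from scipy import stats

def adjusted_multiplier(past, alpha=0.001, coverage=0.9, n_boot=2000, seed=0):
    rng = np.random.default_rng(seed)
    mu_hat, sd_hat = past.mean(), past.std(ddof=1)
    z = stats.norm.ppf(1 - alpha)                          # unadjusted multiplier
    needed = []
    for _ in range(n_boot):
        boot = rng.choice(past, size=len(past), replace=True)
        mu_b, sd_b = boot.mean(), boot.std(ddof=1)
        # multiplier c such that the limit mu_b + c*sd_b still gives false-alarm
        # probability alpha when observations follow N(mu_hat, sd_hat)
        needed.append((mu_hat + z * sd_hat - mu_b) / sd_b)
    c_adj = np.quantile(needed, coverage)
    return mu_hat, sd_hat, z, c_adj

rng = np.random.default_rng(3)
past = rng.normal(10, 2, size=100)                          # past in-control data
mu, sd, z, c_adj = adjusted_multiplier(past)
print(f"unadjusted limit: {mu + z * sd:.2f}   adjusted limit: {mu + c_adj * sd:.2f}")
```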
Lau FDH, Gandy A, 2016, Enhancing football league tables, Significance, Vol: 13, Pages: 8-9, ISSN: 1740-9705
League tables are commonly used to represent the current state of a competition, in football and other sports. But they do not tell the full story. F. Din-Houn Lau and Axel Gandy suggest a few improvements.
Gandy A, Lau F, 2016, The chopthin algorithm for resampling, IEEE Transactions on Signal Processing, Vol: 64, Pages: 4273-4281, ISSN: 1941-0476
Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. Simulation studies show that the chopthin algorithm consistently outperforms standard resampling methods. The algorithm chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN), Python and Matlab are available.
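A simplified illustration of the chop/thin idea in Python is given below; it is not the exact algorithm from the paper, but it shows how splitting heavy particles and randomly thinning light ones in an unbiased way bounds the ratio between weights. The reference weight and the bound eta are illustrative choices.

```python
# Simplified illustration of chop/thin resampling (not the exact published algorithm):
# particles with large weight are split ("chopped") into several copies and particles
# with small weight are resampled ("thinned") unbiasedly, bounding the weight ratio.
import numpy as np

def chopthin_like(weights, eta=4.0, seed=0):
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    c = w.mean()                      # reference weight; bound weights roughly to [c/eta, c]
    new_idx, new_w = [], []
    for i, wi in enumerate(w):
        if wi > c:                    # chop: split into k copies of equal weight
            k = int(np.ceil(wi / c))
            new_idx += [i] * k
            new_w += [wi / k] * k
        elif wi < c / eta:            # thin: keep with prob wi/(c/eta), unbiased in expectation
            if rng.random() < wi / (c / eta):
                new_idx.append(i)
                new_w.append(c / eta)
        else:                         # middle weights are left untouched
            new_idx.append(i)
            new_w.append(wi)
    return np.array(new_idx), np.array(new_w)

w = np.random.default_rng(1).exponential(size=20)
idx, new_w = chopthin_like(w)
print("total weight before/after (equal in expectation):", round(w.sum(), 3), round(new_w.sum(), 3))
print("weight ratio after:", round(new_w.max() / new_w.min(), 2))
```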
Gandy A, Hahn G, 2016, QuickMMCTest -- quick multiple Monte Carlo testing, Statistics and Computing, Vol: 27, Pages: 823-832, ISSN: 1573-1375
Multiple hypothesis testing is widely used to evaluate scientific studies involving statistical tests. However, for many of these tests, p-values are not available and are thus often approximated using Monte Carlo tests such as permutation tests or bootstrap tests. This article presents a simple algorithm based on Thompson Sampling to test multiple hypotheses. It works with arbitrary multiple testing procedures, in particular with step-up and step-down procedures. Its main feature is to sequentially allocate Monte Carlo effort, generating more Monte Carlo samples for tests whose decisions are so far less certain. A simulation study demonstrates that for a low computational effort, the new approach yields a higher power and a higher degree of reproducibility of its results than previously suggested methods.
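The allocation idea can be sketched as follows in Python (a simplified illustration, not the exact QuickMMCTest algorithm): each unknown p-value gets a Beta posterior, posterior draws are pushed through a multiple-testing rule (Bonferroni here, as an assumption), and new Monte Carlo samples go preferentially to hypotheses whose rejection decision is still uncertain. The batch sizes and synthetic p-values are illustrative.

```python
# Simplified sketch of Thompson-sampling-style allocation of Monte Carlo effort across
# multiple hypotheses: Beta posteriors on the p-values, Bonferroni decisions on draws,
# more samples for hypotheses whose decision is uncertain.
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.001, 0.004, 0.02, 0.04, 0.2, 0.5])   # synthetic unknown p-values
m = len(true_p)
alpha = 0.05
exceed = np.zeros(m)                                       # Monte Carlo exceedances per hypothesis
n = np.zeros(m)                                            # Monte Carlo samples per hypothesis

for _ in range(200):
    # Posterior draws of the p-values and the resulting Bonferroni decisions
    draws = rng.beta(exceed + 1, n - exceed + 1, size=(100, m))
    reject_freq = (draws <= alpha / m).mean(axis=0)
    # Uncertainty is highest when the rejection frequency is near 1/2
    uncertainty = reject_freq * (1 - reject_freq) + 1e-6
    alloc = rng.multinomial(100, uncertainty / uncertainty.sum())
    # Spend the allocated Monte Carlo samples (exceedances are Binomial with the true p)
    exceed += rng.binomial(alloc, true_p)
    n += alloc

print("samples per hypothesis:", n.astype(int))
print("estimated p-values:", np.round(exceed / np.maximum(n, 1), 4))
print("Bonferroni rejections:", exceed / np.maximum(n, 1) <= alpha / m)
```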
Gandy A, Hahn G, 2016, A framework for Monte Carlo based multiple testing, Scandinavian Journal of Statistics, Vol: 43, Pages: 1046-1063, ISSN: 1467-9469
We are concerned with multiple testing in the setting where p-values are unknown and can only be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as the ones obtained if the p-values for all hypotheses had been available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions which guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the p-values. Our framework is applicable to a general class of step-up and step-down procedures which includes many established multiple testing corrections such as the ones of Bonferroni, Holm, Sidak, Hochberg or Benjamini-Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature in such a way as to yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results as three sets together with an error bound on their correctness, demonstrated exemplarily using a real biological dataset.
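The three-set reporting can be illustrated with a small Python sketch using Bonferroni and Clopper-Pearson bounds on each Monte Carlo p-value estimate; the paper's framework is more general and covers step-up and step-down procedures. Hypotheses are classified as certainly rejected, certainly not rejected, or undecided given the remaining Monte Carlo error; the threshold, error bound eps and synthetic data here are illustrative assumptions.

```python
# Sketch of reporting Monte Carlo multiple-testing results as three sets, using
# Bonferroni with Clopper-Pearson bounds on each estimated p-value.
import numpy as np
from scipy import stats

def three_sets(exceed, n, alpha=0.05, eps=1e-4):
    """exceed[i] = Monte Carlo exceedances for hypothesis i out of n[i] samples."""
    m = len(exceed)
    thr = alpha / m                                        # Bonferroni threshold
    lower = stats.beta.ppf(eps / 2, exceed, n - exceed + 1)        # Clopper-Pearson bounds
    upper = stats.beta.ppf(1 - eps / 2, exceed + 1, n - exceed)
    lower = np.nan_to_num(lower, nan=0.0)                  # exceed == 0 gives lower bound 0
    upper = np.nan_to_num(upper, nan=1.0)                  # exceed == n gives upper bound 1
    rejected = upper < thr
    not_rejected = lower > thr
    undecided = ~(rejected | not_rejected)
    return rejected, not_rejected, undecided

rng = np.random.default_rng(0)
true_p = np.array([0.0005, 0.005, 0.02, 0.3])
n = np.full(4, 5000)
exceed = rng.binomial(n, true_p)
rej, non, und = three_sets(exceed, n)
print("rejected:", np.where(rej)[0], "not rejected:", np.where(non)[0], "undecided:", np.where(und)[0])
```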
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.