## Search results

• JOURNAL ARTICLE
Battey HS, Zhu Z, Fan J, Lu J, Liu H et al.,

### Distributed testing and estimation with statistical guarantees

, Annals of Statistics, ISSN: 0090-5364

This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample size. In both low-dimensional and sparse high-dimensional settings, we address the important question of how large $k$ can be, as $n$ grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
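A minimal sketch of the basic split, estimate and aggregate pattern behind divide-and-conquer inference. It assumes the sample mean as a stand-in for a generic per-subsample estimator; the paper's own test statistics and estimators are not reproduced here, and `divide_and_conquer_mean` is a hypothetical helper.

```python
import statistics


def divide_and_conquer_mean(data, k):
    """Split `data` into k contiguous subsamples, estimate on each, then average.

    The sample mean is only a placeholder for a per-subsample estimator;
    this is an illustration, not the method from the paper.
    """
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Integer boundaries; subsample sizes differ by at most one.
    bounds = [i * n // k for i in range(k + 1)]
    # One estimate per subsample of size roughly n / k.
    estimates = [statistics.fmean(data[bounds[i]:bounds[i + 1]]) for i in range(k)]
    # Aggregate by simple averaging across the k subsamples.
    return statistics.fmean(estimates)
```

For the sample mean with equal subsample sizes the aggregate reproduces the full-sample estimate exactly; for nonlinear estimators it generally does not, which is where the question of how large $k$ can be becomes substantive.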

• CONFERENCE PAPER
Evans LPG, Adams N, Anagnostopoulos C,

### When Does Active Learning Work?

, Advances in Intelligent Data Analysis XII, ISSN: 0302-9743
• CONFERENCE PAPER
Schon C, Adams NM, Evangelou M,

### Clustering and monitoring edge behaviour in enterprise network traffic

, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

This paper takes an unsupervised learning approach for monitoring edge activity within an enterprise computer network. Using NetFlow records, features are gathered across the active connections (edges) in 15-minute time windows. Then, edges are grouped into clusters using the k-means algorithm. This process is repeated over contiguous windows. A series of informative indicators are derived by examining the relationship of edges with the observed cluster structure. This leads to an intuitive method for monitoring network behaviour and a temporal description of edge behaviour at global and local levels.
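The per-window clustering step can be sketched with plain Lloyd's k-means on per-edge feature vectors. The abstract does not list the NetFlow features, so the two-dimensional points in the usage example are hypothetical placeholders, and `kmeans` is an illustrative helper rather than the authors' code.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on a list of equal-length feature tuples (one per edge)."""
    rng = random.Random(seed)
    # Initialise centroids at k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical features for six edges in one 15-minute window.
window = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
labels, centroids = kmeans(window, 2)
```

Repeating the call on each contiguous window gives a sequence of cluster assignments from which per-edge indicators could then be derived.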

• JOURNAL ARTICLE
Battey HS, 2017,

### Eigen structure of a new class of covariance and inverse covariance matrices

, Bernoulli, Vol: 23, Pages: 3166-3177, ISSN: 1350-7265

There is a one-to-one mapping between a $p$-dimensional strictly positive definite covariance matrix $\Sigma$ and its matrix logarithm $L$. We exploit this relationship to study the structure induced on $\Sigma$ through a sparsity constraint on $L$. Consider $L$ as a random matrix generated through a basis expansion, with the support of the basis coefficients taken as a simple random sample of size $s = s^*$ from the index set $[p(p+1)/2] = \{1, \ldots, p(p+1)/2\}$. We find that the expected number of non-unit eigenvalues of $\Sigma$, denoted $E[|A|]$, is approximated with near perfect accuracy by the solution of the equation

$$\frac{4p + p(p-1)}{2(p+1)} \left[ \log\left(\frac{p}{p-d}\right) - \frac{d}{2p(p-d)} \right] - s^* = 0.$$

Furthermore, the corresponding eigenvectors are shown to possess only $p - |A^c|$ nonzero entries. We use this result to elucidate the precise structure induced on $\Sigma$ and $\Sigma^{-1}$. We demonstrate that a positive definite symmetric matrix whose matrix logarithm is sparse is significantly less sparse in the original domain. This finding has important implications in high-dimensional statistics, where it is important to exploit structure in order to construct consistent estimators in non-trivial norms. An estimator exploiting the structure of the proposed class is presented.

• JOURNAL ARTICLE

### Continuous monitoring for changepoints in data streams using adaptive estimation

, STATISTICS AND COMPUTING, Vol: 27, Pages: 1257-1270, ISSN: 0960-3174

Data streams are characterised by a potentially unending sequence of high-frequency observations which are subject to unknown temporal variation. Many modern streaming applications demand the capability to sequentially detect changes as soon as possible after they occur, while continuing to monitor the stream as it evolves. We refer to this problem as continuous monitoring. Sequential algorithms such as CUSUM, EWMA and their more sophisticated variants usually require a pair of parameters to be selected for practical application. However, the choice of parameter values is often based on the anticipated size of the changes and a given choice is unlikely to be optimal for the multiple change sizes which are likely to occur in a streaming data context. To address this critical issue, we introduce a changepoint detection framework based on adaptive forgetting factors that, instead of multiple control parameters, only requires a single parameter to be selected. Simulated results demonstrate that this framework has utility in a continuous monitoring setting. In particular, it reduces the burden of selecting parameters in advance. Moreover, the methodology is demonstrated on real data arising from Foreign Exchange markets.
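The forgetting-factor idea can be illustrated with the standard fixed-factor recursion for a streaming mean. This is a minimal sketch assuming a constant $\lambda$; the adaptive scheme in the paper, which tunes the forgetting factor sequentially, is not shown, and `forgetting_factor_means` is a hypothetical helper.

```python
def forgetting_factor_means(stream, lam):
    """Exponentially forgetting estimate of the mean, updated one point at a time.

    m_t = lam * m_{t-1} + x_t,  w_t = lam * w_{t-1} + 1,  estimate_t = m_t / w_t.
    lam = 1 recovers the ordinary running mean; smaller lam forgets faster.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must lie in (0, 1]")
    m = 0.0  # discounted sum of observations
    w = 0.0  # discounted count of observations
    out = []
    for x in stream:
        m = lam * m + x
        w = lam * w + 1.0
        out.append(m / w)
    return out


# A level shift from 0 to 5: a smaller factor tracks the new level faster.
shifted = [0.0] * 50 + [5.0] * 10
fast = forgetting_factor_means(shifted, 0.8)[-1]
slow = forgetting_factor_means(shifted, 0.99)[-1]
```

The trade-off visible here (fast reaction versus noisy estimates) is what makes a single hand-picked value awkward when change sizes vary, which motivates adapting the factor online.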

• JOURNAL ARTICLE
Chandna S, Walden AT, 2017,

### A Frequency Domain Test for Propriety of Complex-Valued Vector Time Series

, IEEE TRANSACTIONS ON SIGNAL PROCESSING, Vol: 65, Pages: 1425-1436, ISSN: 1053-587X

This paper proposes a frequency domain approach to test the hypothesis that a stationary complex-valued vector time series is proper, i.e., for testing whether the vector time series is uncorrelated with its complex conjugate. If the hypothesis is rejected, frequency bands causing the rejection will be identified and might usefully be related to known properties of the physical processes. The test needs the associated spectral matrix, which can be estimated by multitaper methods using, say, $K$ tapers. Standard asymptotic distributions for the test statistic are of no use since they would require $K \to \infty$, but as $K$ increases so does resolution bandwidth, which causes spectral blurring. In many analyses $K$ is necessarily kept small, and hence our efforts are directed at practical and accurate methodology for hypothesis testing for small $K$. Our generalized likelihood ratio statistic combined with exact cumulant matching gives very accurate rejection percentages. We also prove that the statistic on which the test is based is comprised of canonical coherencies arising from our complex-valued vector time series. Frequency specific tests are combined using multiple hypothesis testing to give an overall test. Our methodology is demonstrated on ocean current data collected at different depths in the Labrador Sea. Overall this work extends results on propriety testing for complex-valued vectors to the complex-valued vector time series setting.

• JOURNAL ARTICLE
Cox DR, Battey HS, 2017,

### Large numbers of explanatory variables, a semi-descriptive analysis

, Proceedings of the National Academy of Sciences of the United States of America, Vol: 114, Pages: 8592-8595, ISSN: 1091-6490
• JOURNAL ARTICLE
Griffié J, Shlomovich L, Williamson DJ, Shannon M, Aaron J, Khuon S, Burn GL, Boelen L, Peters R, Cope AP, Cohen EAK, Rubin-Delanchy P, Owen DM, 2017,

### 3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse

, Scientific Reports, Vol: 7, ISSN: 2045-2322
• JOURNAL ARTICLE
Zhang Q, Filippi S, Gretton A, Sejdinovic D, 2017,

### Large-Scale Kernel Methods for Independence Testing

, Statistics and Computing, ISSN: 1573-1375

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

• JOURNAL ARTICLE
Zhuang L, Walden AT, 2017,

### Sample Mean Versus Sample Frechet Mean for Combining Complex Wishart Matrices: A Statistical Study

, IEEE TRANSACTIONS ON SIGNAL PROCESSING, Vol: 65, Pages: 4551-4561, ISSN: 1053-587X

The space of covariance matrices is a non-Euclidean space. The matrices form a manifold which if equipped with a Riemannian metric becomes a Riemannian manifold, and recently this idea has been used for comparison and clustering of complex valued spectral matrices, which at a given frequency are typically modelled as complex Wishart-distributed random matrices. Identically distributed sample complex Wishart matrices can be combined via a standard sample mean to derive a more stable overall estimator. However, using the Riemannian geometry their so-called sample Fréchet mean can also be found. We derive the expected value of the determinant of the sample Fréchet mean and the expected value of the sample Fréchet mean itself. The population Fréchet mean is shown to be a scaled version of the true covariance matrix. The risk under convex loss functions for the standard sample mean is never larger than for the Fréchet mean. In simulations the sample mean also performs better for the estimation of an important functional derived from the estimated covariance matrix, namely partial coherence.
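A one-dimensional illustration of the contrast between the two means, assuming the affine-invariant Riemannian metric (the abstract does not name the metric). For $1 \times 1$ positive definite matrices that metric reduces to $d(a, b) = |\log a - \log b|$, so the sample Fréchet mean is the geometric mean; `frechet_mean_positive` is a hypothetical helper, and the matrix case requires an iterative algorithm not shown here.

```python
import math
import statistics


def frechet_mean_positive(xs):
    """Fréchet mean of positive scalars under the distance |log a - log b|.

    Minimising sum_i (log x_i - log m)^2 over m > 0 gives
    m = exp(mean(log x_i)), i.e. the geometric mean.
    """
    return math.exp(statistics.fmean(math.log(x) for x in xs))


samples = [1.0, 4.0]
geometric = frechet_mean_positive(samples)   # Fréchet mean in this 1-D case
arithmetic = statistics.fmean(samples)       # standard sample mean
```

Because the geometric mean never exceeds the arithmetic mean, in this scalar case the Fréchet mean of positive samples sits systematically below the ordinary mean; for gamma-distributed (scalar Wishart) samples its population value is a fixed multiple of the true scale, a scalar analogue of the scaling result stated above.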

• CONFERENCE PAPER
Bakoben M, Adams N, Bellotti A, 2016,

### Uncertainty aware clustering for behaviour in enterprise networks

, 16th IEEE International Conference on Data Mining (ICDM), Publisher: IEEE, Pages: 269-272, ISSN: 2375-9232
• JOURNAL ARTICLE
Bakoben M, Bellotti A, Adams NM, 2016,

### Improving clustering performance by incorporating uncertainty

, PATTERN RECOGNITION LETTERS, Vol: 77, Pages: 28-34, ISSN: 0167-8655

In more challenging problems the input to a clustering problem is not raw data objects, but rather parametric statistical summaries of the data objects. For example, time series of different lengths may be clustered on the basis of estimated parameters from autoregression models. Such summary procedures usually provide estimates of uncertainty for parameters, and ignoring this source of uncertainty affects the recovery of the true clusters. This paper is concerned with the incorporation of this source of uncertainty in the clustering procedure. A new dissimilarity measure is developed based on geometric overlap of confidence ellipsoids implied by the uncertainty estimates. In extensive simulation studies and a synthetic time series benchmark dataset, this new measure is shown to yield improved performance over standard approaches.

• JOURNAL ARTICLE
Battey H, Feng Q, Smith RJ, 2016,

### Improving confidence set estimation when parameters are weakly identified

, Statistics and Probability Letters, Vol: 118, Pages: 117-123, ISSN: 0167-7152

We consider inference in weakly identified moment condition models when additional partially identifying moment inequality constraints are available. We detail the limiting distribution of the estimation criterion function and consequently propose a confidence set estimator for the true parameter.

• JOURNAL ARTICLE

### A comparison of efficient approximations for a weighted sum of chi-squared random variables

, STATISTICS AND COMPUTING, Vol: 26, Pages: 917-928, ISSN: 0960-3174

In many applications, the cumulative distribution function (cdf) $F_{Q_N}$ of a positively weighted sum of $N$ i.i.d. chi-squared random variables $Q_N$ is required. Although there is no known closed-form solution for $F_{Q_N}$, there are many good approximations. When computational efficiency is not an issue, Imhof’s method provides a good solution. However, when both the accuracy of the approximation and the speed of its computation are a concern, there is no clear preferred choice. Previous comparisons between approximate methods could be considered insufficient. Furthermore, in streaming data applications where the computation needs to be both sequential and efficient, only a few of the available methods may be suitable. Streaming data problems are becoming ubiquitous and provide the motivation for this paper. We develop a framework to enable a much more extensive comparison between approximate methods for computing the cdf of weighted sums of an arbitrary random variable. Utilising this framework, a new and comprehensive analysis of four efficient approximate methods for computing $F_{Q_N}$ is performed. This analysis procedure is much more thorough and statistically valid than previous approaches described in the literature. A surprising result of this analysis is that the accuracy of these approximate methods increases with $N$.
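As a concrete example of a cheap approximation to $F_{Q_N}$, the sketch below uses a classical two-moment (Satterthwaite-Welch type) match to a scaled chi-squared $c\,\chi^2_\nu$. It assumes each summand has one degree of freedom; whether and how this particular method enters the paper's comparison is not stated here, and both helpers are hypothetical. Matching $E[Q_N] = \sum_i w_i$ and $\operatorname{Var}(Q_N) = 2\sum_i w_i^2$ gives $c = \sum_i w_i^2 / \sum_i w_i$ and $\nu = (\sum_i w_i)^2 / \sum_i w_i^2$.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_iter=100_000):
    """Regularised lower incomplete gamma P(a, x), summed from its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return math.exp(a * math.log(x) - x - math.lgamma(a)) * total


def satterthwaite_cdf(weights, q):
    """Approximate P(sum_i w_i * chi2_1 <= q) by a scaled chi-squared c * chi2_nu."""
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    c = s2 / s1              # scale: matches the mean together with nu
    nu = s1 * s1 / s2        # effective degrees of freedom
    # The chi2_nu cdf at q / c equals P(nu / 2, q / (2c)).
    return reg_lower_gamma(nu / 2.0, q / (2.0 * c))
```

With equal weights the match is exact (a sum of unit-weight $\chi^2_1$ terms is itself chi-squared), which gives an easy sanity check; with unequal weights it is only an approximation.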

• CONFERENCE PAPER

### Predictability of NetFlow data

, 14th IEEE International Conference on Intelligence and Security Informatics - Cybersecurity and Big Data (IEEE ISI), Publisher: IEEE, Pages: 67-72

The behaviour of individual devices connected to an enterprise network can vary dramatically, as a device’s activity depends on the user operating the device as well as on all behind-the-scenes operations between the device and the network. Being able to understand and predict a device’s behaviour in a network can work as the foundation of an anomaly detection framework, as devices may show abnormal activity as part of a cyber attack. The aim of this work is the construction of a predictive regression model for a device’s behaviour in its normal state. The behaviour of a device is represented by a quantitative response and modelled to depend on historic data recorded by NetFlow.

• JOURNAL ARTICLE
Filippi S, Holmes CC, 2016,

### A Bayesian nonparametric approach to testing for dependence between random variables

, Bayesian Analysis, ISSN: 1931-6690

Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular, the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Polya tree priors on the space of probability measures, which can then be embedded within a decision theoretic test for dependence. Polya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well known advantages of having an explicit probability measure include easy comparison of evidence across different studies, encoding prior information, quantifying changes in dependence across different experimental conditions, and the integration of results within formal decision analysis.

• JOURNAL ARTICLE
Filippi S, Holmes CC, Nieto-Barajas LE, 2016,

### Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures

, Electronic Journal of Statistics, Vol: 10, Pages: 1807-1828, ISSN: 1935-7524

We present a novel Bayesian nonparametric regression model for covariates $X$ and continuous response variable $Y \in \mathbb{R}$. The model is parametrized in terms of marginal distributions for $Y$ and $X$ and a regression function which tunes the stochastic ordering of the conditional distributions $F(y \mid x)$. By adopting an approximate composite likelihood approach, we show that the resulting posterior inference can be decoupled for the separate components of the model. This procedure can scale to very large datasets and allows for the use of standard, existing, software from Bayesian nonparametric density estimation and Plackett-Luce ranking estimation to be applied. As an illustration, we show an application of our approach to a US Census dataset, with over 1,300,000 data points and more than 100 covariates.

• CONFERENCE PAPER
Flaxman S, Sejdinovic D, Cunningham JP, Filippi S, 2016,

### Bayesian Learning of Kernel Embeddings

, UAI'16

Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.

• JOURNAL ARTICLE
Lee SMS, Young GA, 2016,

### Distribution of likelihood-based p-values under a local alternative hypothesis

, BIOMETRIKA, Vol: 103, Pages: 641-652, ISSN: 0006-3444

We consider inference on a scalar parameter of interest in the presence of a nuisance parameter, using a likelihood-based statistic which is asymptotically normally distributed under the null hypothesis. Higher-order expansions are used to compare the repeated sampling distribution, under a general contiguous alternative hypothesis, of p-values calculated from the asymptotic normal approximation to the null sampling distribution of the statistic with the distribution of p-values calculated by bootstrap approximations. The results of comparisons in terms of power of different testing procedures under an alternative hypothesis are closely related to differences under the null hypothesis, specifically the extent to which testing procedures are conservative or liberal under the null. Empirical examples are given which demonstrate that higher-order asymptotic effects may be seen clearly in small-sample contexts.

• JOURNAL ARTICLE
Nieto-Reyes A, Battey H, 2016,

### A Topologically Valid Definition of Depth for Functional Data

, Statistical Science, Vol: 31, Pages: 61-79, ISSN: 0883-4237

The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.

• CONFERENCE PAPER

### Correlation-based Streaming Anomaly Detection in Cyber-Security

, 16th IEEE International Conference on Data Mining (ICDM), Publisher: IEEE, Pages: 311-318, ISSN: 2375-9232
• CONFERENCE PAPER

### Handling Delayed Labels in Temporally Evolving Data Streams

, 4th IEEE International Conference on Big Data (Big Data), Publisher: IEEE, Pages: 2416-2424
• CONFERENCE PAPER
Rubin-Delanchy P, Adams NM, Heard NA, 2016,

### Disassortativity of Computer Networks

, 14th IEEE International Conference on Intelligence and Security Informatics - Cybersecurity and Big Data (IEEE ISI), Publisher: IEEE, Pages: 243-247

Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortativity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

• JOURNAL ARTICLE
Schneider-Luftman D, Walden AT, 2016,

### Partial Coherence Estimation via Spectral Matrix Shrinkage under Quadratic Loss

, IEEE TRANSACTIONS ON SIGNAL PROCESSING, Vol: 64, Pages: 5767-5777, ISSN: 1053-587X

Partial coherence is an important quantity derived from spectral or precision matrices and is used in seismology, meteorology, oceanography, neuroscience and elsewhere. If the number of complex degrees of freedom only slightly exceeds the dimension of the multivariate stationary time series, spectral matrices are poorly conditioned and shrinkage techniques suggest themselves. When true partial coherencies are quite large, then for shrinkage estimators of the diagonal weighting kind, it is shown empirically that the minimization of risk using quadratic loss (QL) leads to oracle partial coherence estimators far superior to those derived by minimizing risk using Hilbert-Schmidt (HS) loss. When true partial coherencies are small, the methods behave similarly. We derive two new QL estimators for spectral matrices, and new QL and HS estimators for precision matrices. In addition, for the full estimation (non-oracle) case where certain trace expressions must also be estimated, we examine the behavior of three different QL estimators, with the precision matrix one seeming particularly appealing. For the empirical study, we carry out exact simulations derived from real EEG data for two individuals, one having large, and the other small, partial coherencies. This ensures that our study covers cases of real-world relevance.

• CONFERENCE PAPER
Whitehouse M, Evangelou M, Adams NM, 2016,

### Activity-based temporal anomaly detection in enterprise-cyber security

, 14th IEEE International Conference on Intelligence and Security Informatics - Cybersecurity and Big Data (IEEE ISI), Publisher: IEEE, Pages: 248-250

Statistical anomaly detection is emerging as an important complement to signature-based methods for enterprise network defence. In this paper, we isolate a persistent structure in two different enterprise network data sources. This structure provides the basis of a regression-based anomaly detection method. The procedure is demonstrated on a large public domain data set.

• JOURNAL ARTICLE
Cohen EAK, Kim D, Ober RJ, 2015,

### The Cramer Rao lower bound for point based image registration with heteroscedastic error model for application in single molecule microscopy

, IEEE Transactions on Medical Imaging, Vol: 34, Pages: 2632-2644, ISSN: 1558-254X

The Cramér-Rao lower bound for the estimation of the affine transformation parameters in a multivariate heteroscedastic errors-in-variables model is derived. The model is suitable for feature-based image registration in which both sets of control points are localized with errors whose covariance matrices vary from point to point. With focus given to the registration of fluorescence microscopy images, the Cramér-Rao lower bound for the estimation of a feature's position (e.g., of a single molecule) in a registered image is also derived. In the particular case where all covariance matrices for the localization errors are scalar multiples of a common positive definite matrix (e.g., the identity matrix), as can be assumed in fluorescence microscopy, then simplified expressions for the Cramér-Rao lower bound are given. Under certain simplifying assumptions these expressions are shown to match asymptotic distributions for a previously presented set of estimators. Theoretical results are verified with simulations and experimental data.

• JOURNAL ARTICLE
DiCiccio TJ, Kuffner TA, Young GA, 2015,

### Quantifying nuisance parameter effects via decompositions of asymptotic refinements for likelihood-based statistics

, Journal of Statistical Planning and Inference, Vol: 165, Pages: 1-12, ISSN: 1873-1171

Accurate inference on a scalar interest parameter in the presence of a nuisance parameter may be obtained using an adjusted version of the signed root likelihood ratio statistic, in particular Barndorff-Nielsen’s $R^*$ statistic. The adjustment made by this statistic may be decomposed into a sum of two terms, interpreted as correcting respectively for the possible effect of nuisance parameters and the deviation from standard normality of the signed root likelihood ratio statistic itself. We show that the adjustment terms are determined to second-order in the sample size by their means. Explicit expressions are obtained for the leading terms in asymptotic expansions of these means. These are easily calculated, allowing a simple way of quantifying and interpreting the respective effects of the two adjustments, in particular of the effect of a high dimensional nuisance parameter. Illustrations are given for a number of examples, which provide theoretical insight into the effect of nuisance parameters on parametric inference. The analysis provides a decomposition of the mean of the signed root statistic involving two terms: the first has the property of taking the same value whether there are no nuisance parameters or whether there is an orthogonal nuisance parameter, while the second is zero when there are no nuisance parameters. Similar decompositions are discussed for the Bartlett correction factor of the likelihood ratio statistic, and for other asymptotically standard normal pivots.

• JOURNAL ARTICLE
DiCiccio TJ, Kuffner TA, Young GA, Zaretzki R, 2015,

### STABILITY AND UNIQUENESS OF p-VALUES FOR LIKELIHOOD-BASED INFERENCE

, STATISTICA SINICA, Vol: 25, Pages: 1355-1376, ISSN: 1017-0405

Likelihood-based methods of statistical inference provide a useful general methodology that is appealing, as a straightforward asymptotic theory can be applied for their implementation. It is important to assess the relationships between different likelihood-based inferential procedures in terms of accuracy and adherence to key principles of statistical inference, in particular those relating to conditioning on relevant ancillary statistics. An analysis is given of the stability properties of a general class of likelihood-based statistics, including those derived from forms of adjusted profile likelihood, and comparisons are made between inferences derived from different statistics. In particular, we derive a set of sufficient conditions for agreement to $O_p(n^{-1})$, in terms of the sample size $n$, of inferences, specifically p-values, derived from different asymptotically standard normal pivots. Our analysis includes inference problems concerning a scalar or vector interest parameter, in the presence of a nuisance parameter.

• JOURNAL ARTICLE
Ehrlich E, Jasra A, Kantas N, 2015,

### Gradient Free Parameter Estimation for Hidden Markov Models with Intractable Likelihoods

, METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, Vol: 17, Pages: 315-349, ISSN: 1387-5841

In this article we focus on maximum likelihood estimation (MLE) for the static parameters of hidden Markov models (HMMs). We will consider the case where one cannot or does not want to compute the conditional likelihood density of the observation given the hidden state because of increased computational complexity or analytical intractability. Instead we will assume that one may obtain samples from this conditional likelihood and hence use approximate Bayesian computation (ABC) approximations of the original HMM. ABC approximations are biased, but the bias can be controlled to arbitrary precision via a parameter $\epsilon > 0$; the bias typically goes to zero as $\epsilon \searrow 0$. We first establish that the bias in the log-likelihood and gradient of the log-likelihood of the ABC approximation, for a fixed batch of data, is no worse than $\mathcal{O}(n\epsilon)$, $n$ being the number of data; hence, for computational reasons, one might expect reasonable parameter estimates using such an ABC approximation. Turning to the computational problem of estimating $\theta$, we propose, using the ABC-sequential Monte Carlo (SMC) algorithm in Jasra et al. (2012), an approach based upon simultaneous perturbation stochastic approximation (SPSA). Our method is investigated on two numerical examples.
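The SPSA idea can be sketched on its own: each iteration perturbs every coordinate of $\theta$ at once with a random $\pm 1$ vector, so two objective evaluations yield a full gradient estimate regardless of dimension. This minimal sketch uses a deterministic toy objective and commonly quoted gain exponents (0.602 and 0.101) as assumptions; in the paper the objective would be an ABC-SMC estimate of the (negative log-) likelihood, which is not reproduced here, and `spsa_minimize` is a hypothetical helper.

```python
import random


def spsa_minimize(f, theta0, iters=1000, a=0.1, c=0.1, A=10.0,
                  alpha=0.602, gamma=0.101, seed=0):
    """Minimise f with simultaneous perturbation stochastic approximation."""
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iters):
        ak = a / (k + 1 + A) ** alpha   # decaying step size
        ck = c / (k + 1) ** gamma       # decaying perturbation size
        # Rademacher perturbation: every coordinate is moved simultaneously.
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + ck * d for t, d in zip(theta, delta)]
        minus = [t - ck * d for t, d in zip(theta, delta)]
        diff = f(plus) - f(minus)
        # Two evaluations give an estimate of every gradient component.
        grad = [diff / (2.0 * ck * d) for d in delta]
        theta = [t - ak * g for t, g in zip(theta, grad)]
    return theta


# Toy objective standing in for a negative log-likelihood; minimum at (1, -2).
estimate = spsa_minimize(lambda t: (t[0] - 1.0) ** 2 + (t[1] + 2.0) ** 2, [0.0, 0.0])
```

The appeal in the ABC setting is that no analytic gradient is needed, only (possibly noisy) evaluations of the objective.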

• JOURNAL ARTICLE
Ruan D, Young GA, Montana G, 2015,

### Differential analysis of biological networks

, BMC BIOINFORMATICS, Vol: 16, ISSN: 1471-2105

BACKGROUND: In cancer research, the comparison of gene expression or DNA methylation networks inferred from healthy controls and patients can lead to the discovery of biological pathways associated to the disease. As a cancer progresses, its signalling and control networks are subject to some degree of localised re-wiring. Being able to detect disrupted interaction patterns induced by the presence or progression of the disease can lead to the discovery of novel molecular diagnostic and prognostic signatures. Currently there is a lack of scalable statistical procedures for two-network comparisons aimed at detecting localised topological differences. RESULTS: We propose the dGHD algorithm, a methodology for detecting differential interaction patterns in two-network comparisons. The algorithm relies on a statistic, the Generalised Hamming Distance (GHD), for assessing the degree of topological difference between networks and evaluating its statistical significance. dGHD builds on a non-parametric permutation testing framework but achieves computational efficiency through an asymptotic normal approximation. CONCLUSIONS: We show that the GHD is able to detect more subtle topological differences compared to a standard Hamming distance between networks. This results in the dGHD algorithm achieving high performance in simulation studies as measured by sensitivity and specificity. An application to the problem of detecting differential DNA co-methylation subnetworks associated to ovarian cancer demonstrates the potential benefits of the proposed methodology for discovering network-derived biomarkers associated with a trait of interest.
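For reference, the standard Hamming distance that the GHD is compared against simply counts node pairs whose edge status differs between two networks on the same nodes. The sketch below shows only that baseline plus a per-node breakdown as a crude view of where differences are localised; it is not the GHD statistic or the dGHD test, and both helpers are hypothetical.

```python
def hamming_distance(adj_a, adj_b):
    """Number of node pairs whose edge status differs between two undirected networks."""
    n = len(adj_a)
    if len(adj_b) != n or any(len(row) != n for row in adj_a + adj_b):
        raise ValueError("adjacency matrices must be square and of equal size")
    # Upper triangle only, so each undirected pair is counted once.
    return sum(adj_a[i][j] != adj_b[i][j] for i in range(n) for j in range(i + 1, n))


def per_node_differences(adj_a, adj_b):
    """For each node, how many of its potential edges differ between the networks."""
    n = len(adj_a)
    return [sum(adj_a[i][j] != adj_b[i][j] for j in range(n) if j != i) for i in range(n)]


# A path 0-1-2 versus a triangle on the same three nodes.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
```

Because it compares edges one at a time, this baseline ignores the surrounding topology, which is the limitation the abstract says the GHD is designed to address.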

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
