## Publications

39 results found

Price-Williams M, Heard N, 2019, Nonparametric self-exciting models for computer network traffic, *Statistics and Computing*, Pages: 1-12, ISSN: 0960-3174

Connectivity patterns between nodes in a computer network can be interpreted and modelled as point processes where events in a process indicate connections being established for data to be sent along that edge. A model of normal connectivity behaviour can be constructed for each edge in a network by identifying key network user features such as seasonality or self-exciting behaviour, since events typically arise in bursts at particular times of day which may be peculiar to that edge. When monitoring a computer network in real time, unusual patterns of activity against the model of normality could indicate the presence of a malicious actor. A flexible, novel, nonparametric model for the excitation function of a Wold process is proposed for modelling the conditional intensities of network edges. This approach is shown to outperform standard seasonality and self-excitation models in predicting network connections, achieving well-calibrated predictions for event data collected from the computer networks of both Imperial College and Los Alamos National Laboratory.

Price-Williams M, Heard N, Rubin-Delanchy P, 2018, Detecting weak dependence in computer network traffic patterns by using higher criticism, *Journal of the Royal Statistical Society: Series C*, ISSN: 0035-9254

To perform robust statistical anomaly detection in cybersecurity, we must build realistic models of the traffic patterns within a computer network. It is therefore important to understand the dependences between the large number of routinely interacting communication pathways within such a network. Pairs of interacting nodes in any directed communication network can be modelled as point processes where events in a process indicate information being sent between two nodes. For two processes A and B denoting the interactions between two distinct pairs of computers, called edges, we wish to assess whether events in A trigger events then to occur in B. A test is introduced to detect such dependence when only a subset of the events in A exhibit a triggering effect on process B; this test will enable us to detect even weakly correlated edges within a computer network graph. Since computer network events occur as a high frequency data stream, we consider the asymptotics of this problem as the number of events goes to ∞, while the proportion exhibiting dependence goes to 0, and examine the performance of tests that are provably consistent in this framework. An example of how this method can be used to detect genuine causal dependences is provided by using real world event data from the enterprise computer network of Los Alamos National Laboratory.

Rubin-Delanchy P, Heard NA, Lawson D, 2018, Meta-analysis of mid-p-values: some new results based on the convex order, *Journal of the American Statistical Association*, ISSN: 0162-1459

The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. The property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.

Heard N, Rubin-Delanchy P, 2018, Choosing between methods of combining p-values, *Biometrika*, Vol: 105, Pages: 239-246, ISSN: 0006-3444

Combining p-values from independent statistical tests is a popular approachto meta-analysis, particularly when the data underlying the tests are either nolonger available or are difficult to combine. A diverse range of p-valuecombination methods appear in the literature, each with different statisticalproperties. Yet all too often the final choice used in a meta-analysis canappear arbitrary, as if all effort has been expended building the models thatgave rise to the p-values. Birnbaum (1954) showed that any reasonable p-valuecombiner must be optimal against some alternative hypothesis. Starting fromthis perspective and recasting each method of combining p-values as alikelihood ratio test, we present theoretical results for some of the standardcombiners which provide guidance about how a powerful combiner might be chosenin practice.

Bolton A, Heard N, 2018, Malware Family Discovery Using Reversible Jump MCMC Sampling of Regimes, *Journal of the American Statistical Association*, ISSN: 0162-1459

Malware is computer software which has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems which have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump MCMC for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.

Price-Williams M, Heard N, Turcotte M, 2017, Detecting periodic subsequences in cyber security data, Pages: 84-90

© 2017 IEEE. Anomaly detection for cyber-security defence hasgarnered much attention in recent years providing an orthogonalapproach to traditional signature-based detection systems.Anomaly detection relies on building probability models ofnormal computer network behaviour and detecting deviationsfrom the model. Most data sets used for cyber-security havea mix of user-driven events and automated network events,which most often appears as polling behaviour. Separating theseautomated events from those caused by human activity is essentialto building good statistical models for anomaly detection. This articlepresents a changepoint detection framework for identifyingautomated network events appearing as periodic subsequences ofevent times. The opening event of each subsequence is interpretedas a human action which then generates an automated, periodicprocess. Difficulties arising from the presence of duplicate andmissing data are addressed. The methodology is demonstrated usingauthentication data from Los Alamos National Laboratory'senterprise computer network.

Turcotte M, Moore J, Heard NA, et al., 2016, Poisson factorization for peer-based anomaly detection, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Anomaly detection systems are a promising tool to identify compromised user credentials and malicious insiders in enterprise networks. Most existing approaches for modelling user behaviour rely on either independent observations for each user or on pre-defined user peer groups. A method is proposed based on recommender system algorithms to learn overlapping user peer groups and to use this learned structure to detect anomalous activity. Results analysing the authentication and process-running activities of thousands of users show that the proposed method can detect compromised user accounts during a red team exercise.

Rubin-Delanchy P, HEARD NA, 2016, Disassortativity of computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortivity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

Metelli S, Heard NA, 2016, Model-based clustering and new edge modelling in large computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Computer networks are complex and the analysis of their structure in search for anomalous behaviour is both a challenging and important task for cyber security. For instance, new edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies. Unusual new edges can sometimes be indicative of both legitimate activity, such as automated update requests permitted by the client, and illegitimate activity, such as denial of service (DoS) attacks to cause service disruption or intruders escalating privileges by traversing through the host network. In both cases, capturing and accumulating evidence of anomalous new edge formation represents an important security application. Computer networks tend to exhibit an underlying cluster structure, where nodes are naturally grouped together based on similar connection patterns. What constitutes anomalous behaviour may strongly differ between clusters, so inferring these peer groups constitutes an important step in modelling the types of new connections a user would make. In this article, we present a two-step Bayesian statistical method aimed at clustering similar users inside the network and simultaneously modelling new edge activity, exploiting both overall-level and cluster-level covariates.

Heard NA, Rubin-Delanchy P, 2016, Network-wide anomaly detection via the Dirichlet process, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory.

Heard NA, Palla K, Skoularidou M, 2016, Topic modelling of authentication events in an enterprise computer network, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

The possibility for theft or misuse of legitimate user credentials is a potential cyber-security weakness in any enterprise computer network which is almost impossible to eradicate. However, by monitoring the network traffic patterns, it can be possible to detect misuse of credentials. This article presents an initial investigation into deconvolving the mixture behaviour of several individuals within a network, to see if individual users can be identified. Towards that, a technique used for document classification is deployed, the Latent Dirichlet allocation model. A pilot study is conducted on authentication events taken from real data from the enterprise network of Los Alamos National Laboratory.

GriffiÃ© J, Shannon M, Bromley CL,
et al., 2016, A Bayesian cluster analysis method for single-molecule localization microscopy data, *Nature Protocols*, Vol: 11, Pages: 2499-2514, ISSN: 1754-2189

Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.

Heard NA, Turcotte MJM, 2016, Adaptive sequential Monte Carlo for multiple changepoint analysis, *Journal of Computational and Graphical Statistics*, Vol: 26, Pages: 414-423, ISSN: 1537-2715

Process monitoring and control requires detection of structural changes in a data stream in real time. This article introduces an efficient sequential Monte Carlo algorithm designed for learning unknown changepoints in continuous time. The method is intuitively simple: new changepoints for the latest window of data are proposed by conditioning only on data observed since the most recent estimated changepoint, as these observations carry most of the information about the current state of the process. The proposed method shows improved performance over the current state of the art. Another advantage of the proposed algorithm is that it can be made adaptive, varying the number of particles according to the apparent local complexity of the target changepoint probability distribution. This saves valuable computing time when changes in the changepoint distribution are negligible, and enables re-balancing of the importance weights of existing particles when a significant change in the target distribution is encountered. The plain and adaptive versions of the method are illustrated using the canonical continuous time changepoint problem of inferring the intensity of an inhomogeneous Poisson process, although the method is generally applicable to any changepoint problem. Performance is demonstrated using both conjugate and non-conjugate Bayesian models for the intensity. Appendices to the article are available online, illustrating the method on other models and applications.

Adams N, Heard N, 2016, Dynamic networks and cyber-security, ISBN: 9781786340740

© 2016 by World Scientific Publishing Europe Ltd. All rights reserved. As an under-studied area of academic research, the analysis of computer network traffic data is still in its infancy. However, the challenge of detecting and mitigating malicious or unauthorised behaviour through the lens of such data is becoming an increasingly prominent issue. This collection of papers by leading researchers and practitioners synthesises cutting-edge work in the analysis of dynamic networks and statistical aspects of cyber security. The book is structured in such a way as to keep security application at the forefront of discussions. It offers readers easy access into the area of data analysis for complex cyber-security applications, with a particular focus on temporal and network aspects. Chapters can be read as standalone sections and provide rich reviews of the latest research within the field of cyber-security. Academic readers will benefit from state-of-the-art descriptions of new methodologies and their extension to real practical problems while industry professionals will appreciate access to more advanced methodology than ever before.

Rubin-Delanchy P, Lawson DJ, Heard NA, 2016, Anomaly detection for cyber security applications, Dynamic Networks and Cyber-Security, Pages: 137-156, ISBN: 9781786340740

© 2016 by World Scientific Publishing Europe Ltd. All rights reserved. In this chapter, we outline a general modus operandi under which to perform intrusion detection at scale. The over-arching principle is this: A network monitoring tool has access to large stores of data on which it can learn 'normal' network behaviour. On the other hand, data on intrusions are relatively rare. This imbalance invites us to frame intrusion detection as an anomaly detection problem where, under the null hypothesis that there is no intrusion, the data follow a machine-learnt model of behaviour, and, under the alternative that there is some form of intrusion, certain anomalies in that model will be apparent. This approach to cyber security poses some important statistical challenges. One is simply modelling and doing inference with such large-scale and heterogeneous data. Another is performing anomaly detection when the null hypothesis comprises a complex model. Finally, a key problem is combining different anomalies through time and across the network.

Turcotte MJM, Heard NA, Kent AD, 2016, Modelling user behaviour in a network using computer event logs, Dynamic Networks and Cyber-Security, Pages: 67-87, ISBN: 9781786340740

© 2016 by World Scientific Publishing Europe Ltd. All rights reserved. Computer event logs are a potentially valuable resource in detecting cyber security threats on a computer network. One important research problem associated with these logs is user credential theft or misuse, either by a malicious insider or an external adversary. Once compromised, a user credential can be used by an adversary to advance through the network and further their goals. Little attention is currently given to looking at computer event logs as an aggregated multivariate data stream. The aim of the work in this chapter is to model user credential patterns on the network by considering independently the time series of events generated by each user credential. Simple Bayesian models are fit to the event data for each user credential, providing a flexible global framework for monitoring credentials on an enterprise network and identifying potentially compromised credentials.

Rubin-Delanchy P, Burn GL, GriffiÃ© J,
et al., 2015, Bayesian cluster identification in single-molecule localization microscopy data., *Nature Methods*, Vol: 12, Pages: 1072-1076, ISSN: 1548-7105

Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.

Heard NA, Turcotte MJM, 2015, Convergence of Monte Carlo distribution estimates from rival samplers, *Statistics and Computing*, Vol: 26, Pages: 1147-1161, ISSN: 1573-1375

It is often necessary to make sampling-based statistical inference about many probability distributions in parallel. Given a finite computational resource, this article addresses how to optimally divide sampling effort between the samplers of the different distributions. Formally approaching this decision problem requires both the specification of an error criterion to assess how well each group of samples represent their underlying distribution, and a loss function to combine the errors into an overall performance score. For the first part, a new Monte Carlo divergence error criterion based on Jensen–Shannon divergence is proposed. Using results from information theory, approximations are derived for estimating this criterion for each target based on a single run, enabling adaptive sample size choices to be made during sampling.

Metelli S, Heard N, 2014, Modelling new edge formation in a computer network through Bayesian variable selection, IEEE Joint Intelligence and Security Informatics Conference 2014, Publisher: IEEE

Anomalous connections in a computer network graph can be a signal of malicious behaviours. For instance, a compromised computer node tends to form a large number of new client edges in the network graph, connecting to server IP (Internet Protocol) addresses which have not previously been visited. This behaviour can be caused by malware (malicious software) performing a denial of service (DoS) attack, to cause disruption or further spread malware, alternatively, the rapid formation of new edges by a compromised node can be caused by an intruder seeking to escalate privileges by traversing through the host network. However, study of computer network flow data suggests new edges are also regularly formed by uninfected hosts, and often in bursts. Statistically detecting anomalous formation of new edges requires reliable models of the normal rate of new edges formed by each host. Network traffic data are complex, and so the potential number of variables which might be included in such a statistical model can be large, and without proper treatment this would lead to overfitting of models with poor predictive performance. In this paper, Bayesian variable selection is applied to a logistic regression model for new edge formation for the purpose of selecting the best subset of variables to include.

Heard N, Rubin-Delanchy P, Lawson D, 2014, Filtering automated polling traffic in computer network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 268-271

Detecting polling behaviour in a computer network has two important applications. First, the polling can be indicative of malware beaconing, where an undetected software virus sends regular communications to a controller. Second, the cause of the polling may not be malicious, since it may correspond to regular automated update requests permitted by the client, to build models of normal host behaviour for signature-free anomaly detection, this polling behaviour needs to be understood. This article presents a simple Fourier analysis technique for identifying regular polling, and focuses on the second application: modelling the normal behaviour of a host, using real data collected from the computer network of Imperial College London.

Rubin-Delanchy P, Lawson DJ, Turcotte MJ, et al., 2014, Three statistical approaches to sessionizing network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 244-247

- Author Web Link
- Cite
- Citations: 1

Bolton A, Heard N, 2014, Application of a linear time method for change point detection to the classification of software, Pages: 292-295

© 2014 IEEE. A computer program's dynamic instruction trace is the sequence of instructions it generates during run-time. This article presents a method for analysing dynamic instruction traces, with an application in malware detection. Instruction traces can be modelled as piecewise homogeneous Markov chains and an exact linear time method is used for detecting change points in the transition probability matrix. The change points divide the instruction trace into segments performing different functions. If segments performing malicious functions can be detected then the software can be classified as malicious. The change point detection method is applied to both a simulated dynamic instruction trace and the dynamic instruction trace generated by a piece of malware.

Lawson DJ, Rubin-Delanchy P, Heard N, et al., 2014, Statistical frameworks for detecting tunnelling in cyber defence using big data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 248-251

- Author Web Link
- Cite
- Citations: 1

Heard NA, Turcotte MJ, 2014, Monitoring a device in a communication network, Data Analysis for Network Cyber-Security, Pages: 151-188, ISBN: 9781783263745

© 2014 by Imperial College Press. Anomalous connectivity levels in a communication graph can be indicative of prohibited or malicious behaviour. Detecting anomalies in large graphs, such as telecommunication networks or corporate computer networks, requires techniques which are computationally fast and ideally parallelisable, and this puts a limit on the level of sophistication which can be used in modelling the entire graph. Here, methods are presented for detecting locally anomalous substructures based on simple node and edge-based statistical models. This can be viewed as an initial screening stage for identifying candidate anomalies, which could then be investigated with more sophisticated tools. The focus is on monitoring diverse features of the same data stream emanating from a single communicating device within the network, using conditionally independent probability models. Whilst all of the models considered are purposefully very simple, their practical implementation touches on a diverse range of topics, including conjugate Bayesian inference, reversible jump Markov chain Monte Carlo, sequential Monte Carlo, Markov jump processes, Markov chains, density estimation, changepoint analysis, discrete p-values and control charts.

Adams N, Heard N, 2014, Data analysis for network cyber-security, ISBN: 9781783263745

© 2014 by Imperial College Press. There is increasing pressure to protect computer networks against unauthorized intrusion, and some work in this area is concerned with engineering systems that are robust to attack. However, no system can be made invulnerable. Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Such work involves the intersection of statistics, data mining and computer science. Fundamentally, network traffic is relational, embodying a link between devices. As such, graph analysis approaches are a natural candidate. However, such methods do not scale well to the demands of real problems, and the critical aspect of the timing of communications events is not accounted for in these approaches. This book gathers papers from leading researchers to provide both background to the problems and a description of cutting-edge methodology. The contributors are from diverse institutions and areas of expertise and were brought together at a workshop held at the University of Bristol in March 2013 to address the issues of network cyber security. The workshop was supported by the Heilbronn Institute for Mathematical Research.

Turcotte M, Heard N, Neil J, 2014, Detecting Localised Anomalous Behaviour in a Computer Network, 13th International Symposium on Intelligent Data Analysis (IDA), Publisher: SPRINGER INT PUBLISHING AG, Pages: 321-332, ISSN: 0302-9743

- Author Web Link
- Cite
- Citations: 4

Fowler A, Menon V, Heard NA, 2013, DYNAMIC BAYESIAN CLUSTERING, *JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY*, Vol: 11, ISSN: 0219-7200

- Author Web Link
- Cite
- Citations: 3

Fowler A, Heard NA, 2013, Dynamic Bayesian clustering of gene expression data, Pages: 165-170

Clusters of time series data may change location and memberships over time; in gene expression data, this occurs as groups of genes or samples respond differently to stimuli or experimental conditions at different times. In order to uncover this underlying temporal structure, we consider dynamic clusters which not only change location but also split and merge over time, enabling cluster memberships to change. Dynamic clustering is applied to both cyclic and developmental gene expression data sets and reveals interesting, time-dependent structures which could not be identified using traditional clustering methods.

Fowler A, Heard NA, 2012, On two-way Bayesian agglomerative clustering of gene expression data, *Statistical Analy Data Mining*, Vol: 5, Pages: 463-476, ISSN: 1932-1864

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.