## Publications


Sanna Passino F, Heard N, Bayesian estimation of the latent dimension and communities in stochastic blockmodels, *Statistics and Computing*, ISSN: 0960-3174

Spectral embedding of adjacency or Laplacian matrices of undirected graphs is a common technique for representing a network in a lower-dimensional latent space, with optimal theoretical guarantees. The embedding can be used to estimate the community structure of the network, with strong consistency results in the stochastic blockmodel framework. One of the main practical limitations of standard algorithms for community detection from spectral embeddings is that the number of communities and the latent dimension of the embedding must be specified in advance. In this article, a novel Bayesian model for simultaneous and automatic selection of the appropriate dimension of the latent space and the number of blocks is proposed. Extensions to directed and bipartite graphs are discussed. The model is tested on simulated and real-world network data, showing promising performance for recovering latent community structure.
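As an illustrative sketch of the first stage described above, the following computes an adjacency spectral embedding of a toy two-block stochastic blockmodel and recovers communities with a crude sign split. This is not the paper's Bayesian model: the latent dimension and number of blocks are fixed by hand here, which is exactly the limitation the article addresses.

```python
# Adjacency spectral embedding sketch: dimension d and the clustering rule are
# hand-picked for a toy example, not inferred as in the paper.
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed an undirected graph into R^d via the top-d eigenpairs of A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]      # largest-magnitude eigenvalues
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Toy two-block stochastic blockmodel
rng = np.random.default_rng(0)
n = 40
z = np.repeat([0, 1], n // 2)                     # true community labels
B = np.array([[0.6, 0.1], [0.1, 0.6]])            # block connection probabilities
P = B[z][:, z]
A = rng.binomial(1, np.triu(P, 1))
A = A + A.T                                       # symmetric 0/1 adjacency matrix

X = adjacency_spectral_embedding(A, d=2)
labels = (X[:, 1] > 0).astype(int)                # sign split on second coordinate
```

With this strongly assortative toy model, the sign of the second embedding coordinate already separates the two blocks; real data would call for a proper clustering step (and, per the paper, joint inference of the dimension and number of blocks).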

Sanna Passino F, Heard NA, 2020, Classification of periodic arrivals in event time data for filtering computer network traffic, *Statistics and Computing*, ISSN: 0960-3174

Periodic patterns can often be observed in real-world event time data, possibly mixed with non-periodic arrival times. For modelling purposes, it is necessary to correctly distinguish the two types of events. This task has particularly important implications in computer network security; there, separating automated polling traffic and human-generated activity in a computer network is important for building realistic statistical models for normal activity, which in turn can be used for anomaly detection. Since automated events commonly occur at a fixed periodicity, statistical tests using Fourier analysis can efficiently detect whether the arrival times present an automated component. In this article, sequences of arrival times which contain automated events are further examined, to separate polling and non-periodic activity. This is first achieved using a simple mixture model on the unit circle based on the angular positions of each event time on the p-clock, where p represents the main periodicity associated with the automated activity; this model is then extended by combining a second source of information, the time of day of each event. Efficient implementations exploiting conjugate Bayesian models are discussed, and performance is assessed on real network flow data collected at Imperial College London.
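In the spirit of the Fourier tests mentioned above, here is a minimal sketch (not the authors' exact procedure) that bins event times, computes a periodogram, and uses Fisher's g-statistic, the ratio of the largest periodogram ordinate to the total, as a score for a periodic component. The bin count, jitter level, and synthetic data are illustrative assumptions.

```python
# Fisher's g-statistic sketch for detecting an automated periodic component in
# event time data; bin count and threshold choices are illustrative.
import numpy as np

def fishers_g(times, t_max, n_bins=256):
    """Fisher's g: max periodogram ordinate over the total spectral power."""
    counts, _ = np.histogram(times, bins=n_bins, range=(0.0, t_max))
    spec = np.abs(np.fft.rfft(counts - counts.mean()))[1:] ** 2   # drop DC term
    return spec.max() / spec.sum()

rng = np.random.default_rng(1)
t_max = 1000.0
periodic = np.arange(0.0, t_max, 10.0) + rng.normal(0, 0.1, 100)  # 10 s polling
human = np.sort(rng.uniform(0, t_max, 100))                       # non-periodic

g_mixed = fishers_g(np.concatenate([periodic, human]), t_max)
g_noise = fishers_g(human, t_max)
```

A large g relative to its null distribution flags an automated component; the paper then goes further and separates the periodic events from the human ones within such a sequence.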

Price-Williams M, Heard N, 2020, Nonparametric self-exciting models for computer network traffic, *Statistics and Computing*, Vol: 30, Pages: 209-220, ISSN: 0960-3174

Connectivity patterns between nodes in a computer network can be interpreted and modelled as point processes where events in a process indicate connections being established for data to be sent along that edge. A model of normal connectivity behaviour can be constructed for each edge in a network by identifying key network user features such as seasonality or self-exciting behaviour, since events typically arise in bursts at particular times of day which may be peculiar to that edge. When monitoring a computer network in real time, unusual patterns of activity against the model of normality could indicate the presence of a malicious actor. A flexible, novel, nonparametric model for the excitation function of a Wold process is proposed for modelling the conditional intensities of network edges. This approach is shown to outperform standard seasonality and self-excitation models in predicting network connections, achieving well-calibrated predictions for event data collected from the computer networks of both Imperial College and Los Alamos National Laboratory.
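As a caricature of the self-exciting behaviour described above, the sketch below evaluates a Wold-style conditional intensity in which the rate of the next event depends on the previous inter-arrival time through an excitation function. The step function here is hand-picked purely for illustration; the paper's contribution is estimating the excitation function nonparametrically.

```python
# Toy Wold-process intensity: short previous gaps (bursts) raise the event rate.
# The step excitation below is a hypothetical stand-in for the paper's
# nonparametric estimate.
def excitation(prev_gap):
    """Hand-picked step function over the previous inter-arrival time."""
    if prev_gap < 1.0:
        return 2.0
    if prev_gap < 10.0:
        return 0.5
    return 0.0

def wold_intensity(base_rate, prev_gap):
    """Conditional intensity of the next event given the previous inter-arrival."""
    return base_rate + excitation(prev_gap)
```

So an edge that just fired twice in under a second is modelled as far more likely to fire again soon than one whose last two connections were minutes apart.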

Sanna Passino F, Turcotte MJM, Heard NA, 2020, Graph link prediction in computer networks using Poisson matrix factorisation

Graph link prediction is an important task in cyber-security: relationships between entities within a computer network, such as users interacting with computers, or system libraries and the corresponding processes that use them, can provide key insights into adversary behaviour. Poisson matrix factorisation (PMF) is a popular model for link prediction in large networks, particularly useful for its scalability. In this article, PMF is extended to include scenarios that are commonly encountered in cyber-security applications. Specifically, an extension is proposed to explicitly handle binary adjacency matrices and include known covariates associated with the graph nodes. A seasonal PMF model is also presented to handle dynamic networks. To allow the methods to scale to large graphs, variational methods are discussed for performing fast inference. The results show an improved performance over the standard PMF model and other common link prediction techniques.
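To illustrate the core low-rank Poisson model (without the paper's Bayesian treatment, covariates, or seasonal extension), the sketch below fits a plain Poisson matrix factorisation by multiplicative updates, which maximise the Poisson log-likelihood; the toy data and rank are illustrative assumptions.

```python
# Bare-bones Poisson matrix factorisation via multiplicative updates
# (Poisson/KL NMF), illustrating only the low-rank model at the heart of PMF.
import numpy as np

def poisson_mf(A, rank, n_iter=200, eps=1e-10):
    rng = np.random.default_rng(2)
    n, m = A.shape
    W = rng.uniform(0.5, 1.5, (n, rank))
    H = rng.uniform(0.5, 1.5, (rank, m))
    for _ in range(n_iter):
        R = A / (W @ H + eps)                       # elementwise ratio
        W *= (R @ H.T) / (H.sum(axis=1) + eps)
        R = A / (W @ H + eps)
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, H

# Toy count matrix with rank-1 structure plus Poisson noise
rng = np.random.default_rng(3)
true_rate = np.outer(rng.uniform(1, 3, 20), rng.uniform(1, 3, 15))
A = rng.poisson(true_rate)
W, H = poisson_mf(A, rank=1)
recon = W @ H
```

The factors stay nonnegative by construction, and the reconstruction approximates the underlying rate matrix; the paper's variational inference targets the same structure at much larger scale.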

Metelli S, Heard N, 2019, On Bayesian new edge prediction and anomaly detection in computer networks, *Annals of Applied Statistics*, Vol: 13, Pages: 2586-2610, ISSN: 1932-6157

Monitoring computer network traffic for anomalous behaviour presents an important security challenge. Arrivals of new edges in a network graph represent connections between a client and server pair not previously observed, and in rare cases these might suggest the presence of intruders or malicious implants. We propose a Bayesian model and anomaly detection method for simultaneously characterising existing network structure and modelling likely new edge formation. The method is demonstrated on real computer network authentication data and successfully identifies some machines which are known to be compromised.

Sanna Passino F, Heard NA, 2019, Modelling dynamic network evolution as a Pitman-Yor process, *Foundations of Data Science*, Vol: 1, Pages: 293-306, ISSN: 2639-8001

Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.
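The Pitman-Yor predictive rule that generates the power-law behaviour described above can be sketched in a few lines: the next event picks an existing value with probability proportional to its count minus a discount d, or a new value with probability proportional to alpha plus d times the number of distinct values. The parameter values and sample size below are illustrative.

```python
# Sequential sampling from the Pitman-Yor "Chinese restaurant" predictive rule.
# With discount d > 0 this yields power-law behaviour in the per-value counts.
import random

def pitman_yor_counts(n, alpha, d, seed=4):
    """Draw n values from the Pitman-Yor predictive rule; return per-value counts."""
    rng = random.Random(seed)
    counts = []                                 # counts[k]: draws of value k so far
    for i in range(n):
        # New value with probability (alpha + d * K) / (alpha + i), K distinct so far
        if rng.random() < (alpha + d * len(counts)) / (alpha + i):
            counts.append(1)
        else:
            # Existing value k has unnormalised mass counts[k] - d
            r = rng.uniform(0, i - d * len(counts))
            acc, chosen = 0.0, len(counts) - 1
            for k, c in enumerate(counts):
                acc += c - d
                if r <= acc:
                    chosen = k
                    break
            counts[chosen] += 1
    return counts

counts = pitman_yor_counts(2000, alpha=1.0, d=0.5)
```

With d = 0.5 the number of distinct values grows roughly like the square root of the number of draws, and a large fraction of values are seen only once, the heavy-tailed behaviour the paper exploits for modelling edges in dynamic networks.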

Price-Williams M, Heard N, Rubin-Delanchy P, 2019, Detecting weak dependence in computer network traffic patterns by using higher criticism, *Journal of the Royal Statistical Society: Series C*, Vol: 68, Pages: 641-655, ISSN: 0035-9254

To perform robust statistical anomaly detection in cybersecurity, we must build realistic models of the traffic patterns within a computer network. It is therefore important to understand the dependences between the large number of routinely interacting communication pathways within such a network. Pairs of interacting nodes in any directed communication network can be modelled as point processes where events in a process indicate information being sent between two nodes. For two processes A and B denoting the interactions between two distinct pairs of computers, called edges, we wish to assess whether events in A trigger events then to occur in B. A test is introduced to detect such dependence when only a subset of the events in A exhibit a triggering effect on process B; this test will enable us to detect even weakly correlated edges within a computer network graph. Since computer network events occur as a high frequency data stream, we consider the asymptotics of this problem as the number of events goes to ∞, while the proportion exhibiting dependence goes to 0, and examine the performance of tests that are provably consistent in this framework. An example of how this method can be used to detect genuine causal dependences is provided by using real world event data from the enterprise computer network of Los Alamos National Laboratory.
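The higher criticism statistic of Donoho and Jin, which this setting builds on, scans the sorted p-values for a small subset that is jointly too small to be uniform, exactly the regime where only a few events in A trigger events in B. Below is a minimal sketch with synthetic data; the scan fraction and the clipping guard are illustrative choices.

```python
# Higher criticism sketch: large values indicate a sparse set of non-null
# p-values hidden among nulls.
import math
import random

def higher_criticism(pvals, alpha0=0.5):
    """HC statistic: max standardised deviation of sorted p-values from uniform."""
    p = sorted(pvals)
    n = len(p)
    hc = float("-inf")
    for i in range(1, int(alpha0 * n) + 1):
        pi = min(max(p[i - 1], 1e-12), 1 - 1e-12)   # guard the denominator
        z = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1 - pi))
        hc = max(hc, z)
    return hc

rng = random.Random(5)
null_p = [rng.random() for _ in range(200)]          # uniform: no dependence
mixed_p = [1e-6] * 5 + null_p[:195]                  # a few "triggered" events
```

Even though only 5 of 200 p-values carry signal, the statistic for `mixed_p` is dramatically larger than for pure noise, which is why this style of test can detect weakly dependent edges.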

Price-Williams M, Turcotte M, Heard N, 2018, Time of Day Anomaly Detection, Pages: 1-6

Anomaly detection systems have been shown to perform well in detecting compromised user credentials within an enterprise computer network. Most existing approaches have focused on modelling activities that users perform within the network but not the time at which users are active. This article presents an approach for identifying compromised user credentials based on modelling their time of day or diurnal patterns. Anomalous behaviour in this respect would correspond to a user working during hours that deviate from their normal historical behaviour. The methodology is demonstrated using authentication data from Los Alamos National Laboratory's enterprise computer network.

Heard N, Adams N, Rubin-Delanchy P, 2018, Data Science for Cyber-Security, Publisher: Wspc (Europe), ISBN: 9781786345639

The application of statistical and broader data science techniques provides an exciting growth area in the design of cyber defences. This book provides insight into a range of data science techniques for addressing these pressing concerns.

Rubin-Delanchy P, Heard NA, Lawson D, Meta-analysis of mid-p-values: some new results based on the convex order, *Journal of the American Statistical Association*, Vol: 114, Pages: 1105-1112, ISSN: 0162-1459

The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. This property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.
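The discrete setting is easy to see in a one-sided binomial test: the ordinary p-value sums the whole tail including the atom at the observed value, while the mid-p-value counts only half of that atom. A minimal worked sketch:

```python
# Ordinary vs mid-p-value for a one-sided binomial test, the canonical
# discrete-test setting: the mid-p halves the probability atom at the
# observed value.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_p_values(x, n, p=0.5):
    """Ordinary and mid-p-values for H0: rate = p vs rate > p, observing x of n."""
    tail = sum(binom_pmf(k, n, p) for k in range(x, n + 1))
    mid = tail - 0.5 * binom_pmf(x, n, p)
    return tail, mid

p_ord, p_mid = binom_p_values(x=14, n=20)
```

Here the mid-p-value is strictly smaller than the ordinary p-value, illustrating why combining mid-p-values can be more powerful while, per the paper's convex-order results, still allowing conservative meta-analysis.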

Bolton A, Heard N, 2018, Malware family discovery using reversible jump MCMC sampling of regimes, *Journal of the American Statistical Association*, Vol: 113, Pages: 1490-1502, ISSN: 0162-1459

Malware is computer software which has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems which have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump MCMC for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.

Heard N, Rubin-Delanchy P, 2018, Choosing between methods of combining p-values, *Biometrika*, Vol: 105, Pages: 239-246, ISSN: 0006-3444

Combining p-values from independent statistical tests is a popular approach to meta-analysis, particularly when the data underlying the tests are either no longer available or are difficult to combine. A diverse range of p-value combination methods appear in the literature, each with different statistical properties. Yet all too often the final choice used in a meta-analysis can appear arbitrary, as if all effort has been expended building the models that gave rise to the p-values. Birnbaum (1954) showed that any reasonable p-value combiner must be optimal against some alternative hypothesis. Starting from this perspective and recasting each method of combining p-values as a likelihood ratio test, we present theoretical results for some of the standard combiners which provide guidance about how a powerful combiner might be chosen in practice.
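Two of the standard combiners discussed in this literature can be sketched directly: Fisher's method, which refers -2 times the sum of log p-values to a chi-square distribution with 2m degrees of freedom (whose tail has a closed form for even degrees of freedom), and Tippett's method, based on the minimum p-value. The example inputs are illustrative.

```python
# Fisher's and Tippett's p-value combiners; each is powerful against a
# different style of alternative (diffuse signal vs one strong signal).
from math import exp, log, factorial

def fisher_combine(pvals):
    """Fisher: -2*sum(log p) ~ chi-square with 2m df under H0 (closed-form tail)."""
    x = -2.0 * sum(log(p) for p in pvals)
    m = len(pvals)                      # df = 2m, so the tail sum runs k = 0..m-1
    return exp(-x / 2) * sum((x / 2) ** k / factorial(k) for k in range(m))

def tippett_combine(pvals):
    """Tippett: tail probability of the minimum of m independent uniforms."""
    m = len(pvals)
    return 1.0 - (1.0 - min(pvals)) ** m

p_fisher = fisher_combine([0.01, 0.04, 0.10])
p_tippett = tippett_combine([0.01, 0.04, 0.10])
```

When all three p-values carry modest evidence, as here, Fisher's combiner is smaller than Tippett's, matching the paper's point that each combiner is implicitly a likelihood ratio test against a particular alternative.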

Price-Williams M, Heard N, Turcotte M, 2017, Detecting periodic subsequences in cyber security data, Pages: 84-90

Anomaly detection for cyber-security defence has garnered much attention in recent years, providing an orthogonal approach to traditional signature-based detection systems. Anomaly detection relies on building probability models of normal computer network behaviour and detecting deviations from the model. Most data sets used for cyber-security have a mix of user-driven events and automated network events, which most often appear as polling behaviour. Separating these automated events from those caused by human activity is essential to building good statistical models for anomaly detection. This article presents a changepoint detection framework for identifying automated network events appearing as periodic subsequences of event times. The opening event of each subsequence is interpreted as a human action which then generates an automated, periodic process. Difficulties arising from the presence of duplicate and missing data are addressed. The methodology is demonstrated using authentication data from Los Alamos National Laboratory's enterprise computer network.

Rubin-Delanchy P, Heard NA, 2016, Disassortativity of computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortativity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

Griffié J, Shannon M, Bromley CL, et al., 2016, A Bayesian cluster analysis method for single-molecule localization microscopy data, *Nature Protocols*, Vol: 11, Pages: 2499-2514, ISSN: 1754-2189

Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM), for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.

Heard NA, Palla K, Skoularidou M, 2016, Topic modelling of authentication events in an enterprise computer network, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

The possibility for theft or misuse of legitimate user credentials is a potential cyber-security weakness in any enterprise computer network which is almost impossible to eradicate. However, by monitoring the network traffic patterns, it can be possible to detect misuse of credentials. This article presents an initial investigation into deconvolving the mixture behaviour of several individuals within a network, to see if individual users can be identified. Towards that, a technique used for document classification is deployed, the Latent Dirichlet allocation model. A pilot study is conducted on authentication events taken from real data from the enterprise network of Los Alamos National Laboratory.

Heard NA, Rubin-Delanchy P, 2016, Network-wide anomaly detection via the Dirichlet process, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory.
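The Dirichlet process predictive probability underlying this kind of scoring is simple to sketch: for a given message recipient, a previously seen source has predictive mass proportional to its past count, while all unseen sources share mass proportional to the concentration parameter alpha. The host names and the lumping of all new-source mass into one value are illustrative simplifications.

```python
# Dirichlet process predictive probability as a surprise score for the next
# connection to a recipient; low probability = anomalous connectivity.
def dp_predictive(history, source, alpha=1.0):
    """P(next event comes from `source`) under a DP(alpha); the mass for any
    previously unseen source is lumped into alpha / (alpha + n)."""
    n = len(history)
    count = history.count(source)
    if count > 0:
        return count / (alpha + n)
    return alpha / (alpha + n)

history = ["laptop1"] * 8 + ["laptop2"] * 2   # hypothetical clients of one server
p_seen = dp_predictive(history, "laptop1")    # high: well-established edge
p_new = dp_predictive(history, "printer9")    # low: surprising new connection
```

Taking the negative log of such predictive probabilities gives a per-event surprise score that can be monitored network-wide, one independent model per recipient, as in the paper.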

Metelli S, Heard NA, 2016, Model-based clustering and new edge modelling in large computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Computer networks are complex and the analysis of their structure in search for anomalous behaviour is both a challenging and important task for cyber security. For instance, new edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies. Unusual new edges can sometimes be indicative of both legitimate activity, such as automated update requests permitted by the client, and illegitimate activity, such as denial of service (DoS) attacks to cause service disruption or intruders escalating privileges by traversing through the host network. In both cases, capturing and accumulating evidence of anomalous new edge formation represents an important security application. Computer networks tend to exhibit an underlying cluster structure, where nodes are naturally grouped together based on similar connection patterns. What constitutes anomalous behaviour may strongly differ between clusters, so inferring these peer groups constitutes an important step in modelling the types of new connections a user would make. In this article, we present a two-step Bayesian statistical method aimed at clustering similar users inside the network and simultaneously modelling new edge activity, exploiting both overall-level and cluster-level covariates.

Turcotte M, Moore J, Heard NA, et al., 2016, Poisson factorization for peer-based anomaly detection, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Anomaly detection systems are a promising tool to identify compromised user credentials and malicious insiders in enterprise networks. Most existing approaches for modelling user behaviour rely on either independent observations for each user or on pre-defined user peer groups. A method is proposed based on recommender system algorithms to learn overlapping user peer groups and to use this learned structure to detect anomalous activity. Results analysing the authentication and process-running activities of thousands of users show that the proposed method can detect compromised user accounts during a red team exercise.

Heard NA, Turcotte MJM, 2016, Adaptive sequential Monte Carlo for multiple changepoint analysis, *Journal of Computational and Graphical Statistics*, Vol: 26, Pages: 414-423, ISSN: 1537-2715

Process monitoring and control requires detection of structural changes in a data stream in real time. This article introduces an efficient sequential Monte Carlo algorithm designed for learning unknown changepoints in continuous time. The method is intuitively simple: new changepoints for the latest window of data are proposed by conditioning only on data observed since the most recent estimated changepoint, as these observations carry most of the information about the current state of the process. The proposed method shows improved performance over the current state of the art. Another advantage of the proposed algorithm is that it can be made adaptive, varying the number of particles according to the apparent local complexity of the target changepoint probability distribution. This saves valuable computing time when changes in the changepoint distribution are negligible, and enables re-balancing of the importance weights of existing particles when a significant change in the target distribution is encountered. The plain and adaptive versions of the method are illustrated using the canonical continuous time changepoint problem of inferring the intensity of an inhomogeneous Poisson process, although the method is generally applicable to any changepoint problem. Performance is demonstrated using both conjugate and non-conjugate Bayesian models for the intensity. Appendices to the article are available online, illustrating the method on other models and applications.

Turcotte MJM, Heard NA, Kent AD, 2016, Modelling user behaviour in a network using computer event logs, Dynamic Networks and Cyber-Security, Pages: 67-87, ISBN: 9781786340740

Computer event logs are a potentially valuable resource in detecting cyber security threats on a computer network. One important research problem associated with these logs is user credential theft or misuse, either by a malicious insider or an external adversary. Once compromised, a user credential can be used by an adversary to advance through the network and further their goals. Little attention is currently given to looking at computer event logs as an aggregated multivariate data stream. The aim of the work in this chapter is to model user credential patterns on the network by considering independently the time series of events generated by each user credential. Simple Bayesian models are fit to the event data for each user credential, providing a flexible global framework for monitoring credentials on an enterprise network and identifying potentially compromised credentials.

Adams N, Heard N, 2016, Dynamic networks and cyber-security, ISBN: 9781786340740

As an under-studied area of academic research, the analysis of computer network traffic data is still in its infancy. However, the challenge of detecting and mitigating malicious or unauthorised behaviour through the lens of such data is becoming an increasingly prominent issue. This collection of papers by leading researchers and practitioners synthesises cutting-edge work in the analysis of dynamic networks and statistical aspects of cyber security. The book is structured in such a way as to keep security application at the forefront of discussions. It offers readers easy access into the area of data analysis for complex cyber-security applications, with a particular focus on temporal and network aspects. Chapters can be read as standalone sections and provide rich reviews of the latest research within the field of cyber-security. Academic readers will benefit from state-of-the-art descriptions of new methodologies and their extension to real practical problems while industry professionals will appreciate access to more advanced methodology than ever before.

Rubin-Delanchy P, Lawson DJ, Heard NA, 2016, Anomaly detection for cyber security applications, Dynamic Networks and Cyber-Security, Pages: 137-156, ISBN: 9781786340740

In this chapter, we outline a general modus operandi under which to perform intrusion detection at scale. The over-arching principle is this: A network monitoring tool has access to large stores of data on which it can learn 'normal' network behaviour. On the other hand, data on intrusions are relatively rare. This imbalance invites us to frame intrusion detection as an anomaly detection problem where, under the null hypothesis that there is no intrusion, the data follow a machine-learnt model of behaviour, and, under the alternative that there is some form of intrusion, certain anomalies in that model will be apparent. This approach to cyber security poses some important statistical challenges. One is simply modelling and doing inference with such large-scale and heterogeneous data. Another is performing anomaly detection when the null hypothesis comprises a complex model. Finally, a key problem is combining different anomalies through time and across the network.

Rubin-Delanchy P, Burn GL, Griffié J, et al., 2015, Bayesian cluster identification in single-molecule localization microscopy data, *Nature Methods*, Vol: 12, Pages: 1072-1076, ISSN: 1548-7105

Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.
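Ripley's K function, used above to generate candidate cluster assignments, can be estimated naively as the average number of neighbours within distance r of a point, scaled by the point density. The sketch below omits edge correction and the Bayesian scoring step, and the example coordinates are invented.

```python
# Naive (uncorrected) empirical Ripley's K function for a 2D point pattern.
# Under complete spatial randomness, K(r) is approximately pi * r**2;
# clustered patterns give larger values at small r.
def ripley_k(points, r, area):
    """K(r) = (1/lambda) * average number of other points within distance r."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                if dx * dx + dy * dy <= r * r:
                    pairs += 1
    lam = n / area                      # overall point density
    return pairs / (n * lam)
```

Comparing the empirical K(r) against pi * r**2 is the classical test for clustering; the paper's contribution is the Bayesian scoring of cluster proposals that accounts for localization precision and background points.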

Heard NA, Turcotte MJM, 2015, Convergence of Monte Carlo distribution estimates from rival samplers, *Statistics and Computing*, Vol: 26, Pages: 1147-1161, ISSN: 1573-1375

It is often necessary to make sampling-based statistical inference about many probability distributions in parallel. Given a finite computational resource, this article addresses how to optimally divide sampling effort between the samplers of the different distributions. Formally approaching this decision problem requires both the specification of an error criterion to assess how well each group of samples represent their underlying distribution, and a loss function to combine the errors into an overall performance score. For the first part, a new Monte Carlo divergence error criterion based on Jensen–Shannon divergence is proposed. Using results from information theory, approximations are derived for estimating this criterion for each target based on a single run, enabling adaptive sample size choices to be made during sampling.
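The Jensen-Shannon divergence at the heart of the proposed error criterion is itself easy to state: it is the average Kullback-Leibler divergence of each distribution from their midpoint. A minimal sketch for discrete distributions, using base-2 logarithms so the value lies in [0, 1]:

```python
# Jensen-Shannon divergence between two discrete distributions (base-2 logs),
# the quantity underlying the paper's Monte Carlo divergence error criterion.
from math import log2

def js_divergence(p, q):
    """JS(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m), with m the midpoint of p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(a, b):
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike plain KL divergence, this quantity is symmetric and always finite, which makes it a convenient error criterion for comparing a Monte Carlo sample's empirical distribution against its target.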

Heard N, Rubin-Delanchy P, Lawson D, 2014, Filtering automated polling traffic in computer network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 268-271

Detecting polling behaviour in a computer network has two important applications. First, the polling can be indicative of malware beaconing, where an undetected software virus sends regular communications to a controller. Second, the cause of the polling may not be malicious: it may correspond to regular automated update requests permitted by the client. To build models of normal host behaviour for signature-free anomaly detection, this polling behaviour needs to be understood. This article presents a simple Fourier analysis technique for identifying regular polling, and focuses on the second application: modelling the normal behaviour of a host, using real data collected from the computer network of Imperial College London.

Metelli S, Heard N, 2014, Modelling new edge formation in a computer network through Bayesian variable selection, IEEE Joint Intelligence and Security Informatics Conference 2014, Publisher: IEEE

Anomalous connections in a computer network graph can be a signal of malicious behaviours. For instance, a compromised computer node tends to form a large number of new client edges in the network graph, connecting to server IP (Internet Protocol) addresses which have not previously been visited. This behaviour can be caused by malware (malicious software) performing a denial of service (DoS) attack, to cause disruption or to further spread malware; alternatively, the rapid formation of new edges by a compromised node can be caused by an intruder seeking to escalate privileges by traversing through the host network. However, study of computer network flow data suggests that new edges are also regularly formed by uninfected hosts, and often in bursts. Statistically detecting anomalous formation of new edges requires reliable models of the normal rate of new edges formed by each host. Network traffic data are complex, so the potential number of variables which might be included in such a statistical model can be large; without proper treatment this would lead to overfitting of models with poor predictive performance. In this paper, Bayesian variable selection is applied to a logistic regression model for new edge formation, for the purpose of selecting the best subset of variables to include.

Lawson DJ, Rubin-Delanchy P, Heard N, et al., 2014, Statistical frameworks for detecting tunnelling in cyber defence using big data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 248-251


Turcotte M, Heard N, Neil J, 2014, Detecting Localised Anomalous Behaviour in a Computer Network, 13th International Symposium on Intelligent Data Analysis (IDA), Publisher: SPRINGER INT PUBLISHING AG, Pages: 321-332, ISSN: 0302-9743


Rubin-Delanchy P, Lawson DJ, Turcotte MJ, et al., 2014, Three statistical approaches to sessionizing network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 244-247

