Publications

Adams N, Heard N, 2016, Dynamic networks and cyber-security, ISBN: 9781786340740

As an under-studied area of academic research, the analysis of computer network traffic data is still in its infancy. However, the challenge of detecting and mitigating malicious or unauthorised behaviour through the lens of such data is becoming an increasingly prominent issue. This collection of papers by leading researchers and practitioners synthesises cutting-edge work in the analysis of dynamic networks and statistical aspects of cyber security. The book is structured in such a way as to keep security application at the forefront of discussions. It offers readers easy access into the area of data analysis for complex cyber-security applications, with a particular focus on temporal and network aspects. Chapters can be read as standalone sections and provide rich reviews of the latest research within the field of cyber-security. Academic readers will benefit from state-of-the-art descriptions of new methodologies and their extension to real practical problems while industry professionals will appreciate access to more advanced methodology than ever before.

Citations: 15

Book

Rubin-Delanchy P, Lawson DJ, Heard NA, 2016, Anomaly detection for cyber security applications, Dynamic Networks and Cyber-Security, Pages: 137-156, ISBN: 9781786340740

In this chapter, we outline a general modus operandi under which to perform intrusion detection at scale. The over-arching principle is this: A network monitoring tool has access to large stores of data on which it can learn 'normal' network behaviour. On the other hand, data on intrusions are relatively rare. This imbalance invites us to frame intrusion detection as an anomaly detection problem where, under the null hypothesis that there is no intrusion, the data follow a machine-learnt model of behaviour, and, under the alternative that there is some form of intrusion, certain anomalies in that model will be apparent. This approach to cyber security poses some important statistical challenges. One is simply modelling and doing inference with such large-scale and heterogeneous data. Another is performing anomaly detection when the null hypothesis comprises a complex model. Finally, a key problem is combining different anomalies through time and across the network.

Citations: 6

Book chapter
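
The chapter names combining different anomalies through time and across the network as a key problem but does not fix a method in this summary. Purely as an illustration, and not as the chapter's own approach, the Python sketch below pools p-values with Fisher's method. It assumes the individual p-values are independent under the null hypothesis; the function name `fisher_combine` is hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independence under the null).

    Under the null, -2 * sum(log p_i) is chi-squared with 2k degrees of
    freedom, where k = len(p_values). Returns the combined p-value.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-squared survival function has
    # the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For example, `fisher_combine([0.04, 0.10, 0.20])` pools three moderately small p-values for one host into a single score. When the p-values are positively dependent, as network data often are, Fisher's method can be anti-conservative, which is part of what makes the combination problem hard.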

Rubin-Delanchy P, Burn GL, Griffié J, Williamson DJ, Heard NA, Cope AP, Owen DM et al., 2015, Bayesian cluster identification in single-molecule localization microscopy data, Nature Methods, Vol: 12, Pages: 1072-1076, ISSN: 1548-7105

Single-molecule localization-based super-resolution microscopy techniques such as photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) produce pointillist data sets of molecular coordinates. Although many algorithms exist for the identification and localization of molecules from raw image data, methods for analyzing the resulting point patterns for properties such as clustering have remained relatively under-studied. Here we present a model-based Bayesian approach to evaluate molecular cluster assignment proposals, generated in this study by analysis based on Ripley's K function. The method takes full account of the individual localization precisions calculated for each emitter. We validate the approach using simulated data, as well as experimental data on the clustering behavior of CD3ζ, a subunit of the CD3 T cell receptor complex, in resting and activated primary human T cells.

Journal article
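
The paper generates cluster assignment proposals using analysis based on Ripley's K function. The sketch below shows only the textbook global estimator of K(r) and the transformed L(r) - r. It assumes 2-D coordinates, a known study area and no edge correction. It is not the paper's proposal-generation procedure or its Bayesian model, and the function names are hypothetical.

```python
import math
from itertools import combinations

def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r for 2-D points, with no edge correction."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    close_pairs = sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if math.hypot(x1 - x2, y1 - y2) <= r
    )
    # Each unordered pair appears twice in the ordered sum over i != j.
    return area * 2 * close_pairs / (n * (n - 1))

def ripley_l_minus_r(points, r, area):
    """Besag's L(r) - r; positive values suggest clustering at scale r."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r
```

Under complete spatial randomness K(r) is close to pi * r^2, so L(r) - r near zero is the reference value. Because no edge correction is applied, the estimate is biased downward for points near the boundary of the region.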

Heard NA, Turcotte MJM, 2015, Convergence of Monte Carlo distribution estimates from rival samplers, Statistics and Computing, Vol: 26, Pages: 1147-1161, ISSN: 1573-1375

It is often necessary to make sampling-based statistical inference about many probability distributions in parallel. Given a finite computational resource, this article addresses how to optimally divide sampling effort between the samplers of the different distributions. Formally approaching this decision problem requires both the specification of an error criterion to assess how well each group of samples represents its underlying distribution, and a loss function to combine the errors into an overall performance score. For the first part, a new Monte Carlo divergence error criterion based on Jensen–Shannon divergence is proposed. Using results from information theory, approximations are derived for estimating this criterion for each target based on a single run, enabling adaptive sample size choices to be made during sampling.

Journal article
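
The proposed error criterion is based on Jensen–Shannon divergence. The sketch below computes only the divergence itself between two discrete distributions, for example the empirical distributions from two independent runs of a sampler. It is not the paper's Monte Carlo divergence criterion or its single-run approximations. Natural logarithms are assumed, so the divergence lies between 0 and ln 2; the function names are hypothetical.

```python
import math
from collections import Counter

def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between discrete distributions given as dicts."""
    support = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2.
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}

    def kl_to_mixture(a):
        # Zero-probability terms contribute nothing; m[x] > 0 wherever a[x] > 0.
        return sum(a[x] * math.log(a[x] / m[x]) for x in a if a[x] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

def empirical(samples):
    """Empirical distribution of a finite sample as a dict of probabilities."""
    counts = Counter(samples)
    n = len(samples)
    return {x: c / n for x, c in counts.items()}
```

A typical call is `jensen_shannon(empirical(run_a), empirical(run_b))`, where `run_a` and `run_b` are hypothetical lists of values drawn by two runs targeting the same distribution.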

Heard N, Rubin-Delanchy P, Lawson D, 2014, Filtering automated polling traffic in computer network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 268-271

Detecting polling behaviour in a computer network has two important applications. First, the polling can be indicative of malware beaconing, where an undetected software virus sends regular communications to a controller. Second, the cause of the polling may not be malicious, since it may correspond to regular automated update requests permitted by the client; to build models of normal host behaviour for signature-free anomaly detection, this polling behaviour needs to be understood. This article presents a simple Fourier analysis technique for identifying regular polling, and focuses on the second application: modelling the normal behaviour of a host, using real data collected from the computer network of Imperial College London.

Conference paper
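
The abstract describes a simple Fourier analysis technique without giving details. As a rough illustration under stated assumptions, the sketch bins one host's event times (in seconds, observed over [0, duration)) into counts, computes a periodogram with a direct DFT, and reports the period of the strongest ordinate together with that ordinate's share of total power, a quantity closely related to Fisher's g statistic for periodicity. The function names and the 1-second default bin width are hypothetical choices, not the paper's.

```python
import cmath
import math

def periodogram(counts):
    """|DFT|^2 / n of a mean-centred series at Fourier frequencies k/n, k = 1..n//2."""
    n = len(counts)
    mean = sum(counts) / n
    centred = [c - mean for c in counts]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        powers.append(abs(coeff) ** 2 / n)
    return powers

def dominant_period(event_times, duration, bin_width=1.0):
    """Return (period, power share) of the strongest periodogram ordinate.

    Events in [0, duration) are binned into counts; the share is the largest
    ordinate divided by the sum of all ordinates.
    """
    n_bins = math.ceil(duration / bin_width)
    counts = [0] * n_bins
    for t in event_times:
        if 0 <= t < duration:
            counts[int(t // bin_width)] += 1
    powers = periodogram(counts)
    total = sum(powers)
    if total == 0:
        return None, 0.0  # constant series: no periodic signal to report
    k = max(range(len(powers)), key=powers.__getitem__) + 1
    return n_bins * bin_width / k, powers[k - 1] / total
```

The direct DFT costs O(n^2) in the number of bins, which is acceptable for short windows; a library FFT would be the natural replacement for long ones. A perfectly regular one-event-per-cycle train puts equal power at every harmonic of the polling frequency, so the strongest ordinate can correspond to a sub-multiple of the true period; checking the lowest frequency among near-maximal ordinates is one simple guard.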

Metelli S, Heard N, 2014, Modelling new edge formation in a computer network through Bayesian variable selection, IEEE Joint Intelligence and Security Informatics Conference 2014, Publisher: IEEE

Anomalous connections in a computer network graph can be a signal of malicious behaviour. For instance, a compromised computer node tends to form a large number of new client edges in the network graph, connecting to server IP (Internet Protocol) addresses which have not previously been visited. This behaviour can be caused by malware (malicious software) performing a denial of service (DoS) attack to cause disruption or to spread further malware; alternatively, the rapid formation of new edges by a compromised node can be caused by an intruder seeking to escalate privileges by traversing the host network. However, study of computer network flow data suggests that new edges are also regularly formed by uninfected hosts, often in bursts. Statistically detecting anomalous formation of new edges therefore requires reliable models of the normal rate of new edges formed by each host. Network traffic data are complex, so the potential number of variables which might be included in such a statistical model can be large, and without proper treatment this would lead to overfitted models with poor predictive performance. In this paper, Bayesian variable selection is applied to a logistic regression model for new edge formation in order to select the best subset of variables to include.

Conference paper

Bolton A, Heard N, 2014, Application of a linear time method for change point detection to the classification of software, Pages: 292-295

A computer program's dynamic instruction trace is the sequence of instructions it generates during run-time. This article presents a method for analysing dynamic instruction traces, with an application in malware detection. Instruction traces can be modelled as piecewise homogeneous Markov chains and an exact linear time method is used for detecting change points in the transition probability matrix. The change points divide the instruction trace into segments performing different functions. If segments performing malicious functions can be detected then the software can be classified as malicious. The change point detection method is applied to both a simulated dynamic instruction trace and the dynamic instruction trace generated by a piece of malware.

Citations: 1

Conference paper
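
The paper uses an exact linear-time method for detecting change points in the transition probability matrix. The sketch below is a deliberately simpler substitute, not that method: it models the trace as a Markov chain and exhaustively searches for a single change point by maximum likelihood, costing O(n * s^2) for a trace of length n over s distinct instructions. The function names are hypothetical.

```python
import math
from collections import Counter

def _segment_loglik(counts):
    """Maximised log-likelihood of transition counts under a homogeneous Markov chain."""
    from_totals = Counter()
    for (a, _b), c in counts.items():
        from_totals[a] += c
    return sum(c * math.log(c / from_totals[a])
               for (a, _b), c in counts.items() if c > 0)

def best_single_changepoint(trace):
    """Return (tau, gain): the best split and its log-likelihood gain over no change.

    Transitions (trace[i], trace[i + 1]) with i < tau form the first segment;
    the rest form the second, each with its own transition matrix.
    """
    transitions = list(zip(trace, trace[1:]))
    total = Counter(transitions)
    left = Counter()
    best_tau, best_score = None, -math.inf
    for tau in range(1, len(transitions)):
        left[transitions[tau - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        score = _segment_loglik(left) + _segment_loglik(right)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score - _segment_loglik(total)
```

The returned gain would need to be compared with a penalty or threshold before declaring a change; applying the search recursively to each segment gives a binary-segmentation approximation to multiple change points.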

Adams N, Heard N, 2014, Data analysis for network cyber-security, ISBN: 9781783263745

There is increasing pressure to protect computer networks against unauthorized intrusion, and some work in this area is concerned with engineering systems that are robust to attack. However, no system can be made invulnerable. Data Analysis for Network Cyber-Security focuses on monitoring and analyzing network traffic data, with the intention of preventing, or quickly identifying, malicious activity. Such work involves the intersection of statistics, data mining and computer science. Fundamentally, network traffic is relational, embodying a link between devices. As such, graph analysis approaches are a natural candidate. However, such methods do not scale well to the demands of real problems, and the critical aspect of the timing of communications events is not accounted for in these approaches. This book gathers papers from leading researchers to provide both background to the problems and a description of cutting-edge methodology. The contributors are from diverse institutions and areas of expertise and were brought together at a workshop held at the University of Bristol in March 2013 to address the issues of network cyber security. The workshop was supported by the Heilbronn Institute for Mathematical Research.

Citations: 17

Book

Heard NA, Turcotte MJ, 2014, Monitoring a device in a communication network, Data Analysis for Network Cyber-Security, Pages: 151-188, ISBN: 9781783263745

Anomalous connectivity levels in a communication graph can be indicative of prohibited or malicious behaviour. Detecting anomalies in large graphs, such as telecommunication networks or corporate computer networks, requires techniques which are computationally fast and ideally parallelisable, and this puts a limit on the level of sophistication which can be used in modelling the entire graph. Here, methods are presented for detecting locally anomalous substructures based on simple node and edge-based statistical models. This can be viewed as an initial screening stage for identifying candidate anomalies, which could then be investigated with more sophisticated tools. The focus is on monitoring diverse features of the same data stream emanating from a single communicating device within the network, using conditionally independent probability models. Whilst all of the models considered are purposefully very simple, their practical implementation touches on a diverse range of topics, including conjugate Bayesian inference, reversible jump Markov chain Monte Carlo, sequential Monte Carlo, Markov jump processes, Markov chains, density estimation, changepoint analysis, discrete p-values and control charts.

Citations: 2

Book chapter
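
Among the topics listed are conjugate Bayesian inference and discrete p-values. A minimal sketch combining the two assumes that counts of a device's connections in fixed time windows are Poisson with a Gamma(alpha, beta) prior (shape, rate) on the rate. The posterior predictive for the next count is then negative binomial, and an upper-tail p-value, or the mid p-value often used for discrete data, measures how surprising a new count is. The default hyperparameters and function names are hypothetical, and this is not the chapter's specific model.

```python
import math

def nb_logpmf(y, r, p):
    """log P(Y = y) for a negative binomial with size r and success probability p."""
    return (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(p) + y * math.log1p(-p))

def predictive_pvalues(history, new_count, alpha=1.0, beta=1.0):
    """Upper-tail and mid p-values for a new count under a Gamma-Poisson model.

    A Gamma(alpha, beta) prior on the Poisson rate, updated with the historical
    counts, gives a negative binomial posterior predictive distribution.
    """
    r = alpha + sum(history)
    p = (beta + len(history)) / (beta + len(history) + 1.0)
    below = sum(math.exp(nb_logpmf(y, r, p)) for y in range(new_count))
    at = math.exp(nb_logpmf(new_count, r, p))
    upper = max(0.0, 1.0 - below)     # P(Y >= new_count)
    mid = max(0.0, upper - 0.5 * at)  # P(Y > new_count) + 0.5 * P(Y = new_count)
    return upper, mid
```

With history `[2, 3, 1, 2, 2]` and the default prior, the predictive mean is 11/6, so a window containing 20 connections receives an upper-tail p-value far below 0.001.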

Lawson DJ, Rubin-Delanchy P, Heard N, Adams N et al., 2014, Statistical frameworks for detecting tunnelling in cyber defence using big data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 248-251

Citations: 1

Conference paper

Rubin-Delanchy P, Lawson DJ, Turcotte MJ, Heard N, Adams N et al., 2014, Three statistical approaches to sessionizing network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 244-247

Citations: 3

Conference paper

Turcotte M, Heard N, Neil J, 2014, Detecting Localised Anomalous Behaviour in a Computer Network, 13th International Symposium on Intelligent Data Analysis (IDA), Publisher: Springer International Publishing AG, Pages: 321-332, ISSN: 0302-9743

Citations: 6

Conference paper

Fowler A, Menon V, Heard NA, 2013, Dynamic Bayesian clustering, Journal of Bioinformatics and Computational Biology, Vol: 11, ISSN: 0219-7200

Citations: 4

Journal article

Fowler A, Heard NA, 2013, Dynamic Bayesian clustering of gene expression data, Pages: 165-170

Clusters of time series data may change location and memberships over time; in gene expression data, this occurs as groups of genes or samples respond differently to stimuli or experimental conditions at different times. In order to uncover this underlying temporal structure, we consider dynamic clusters which not only change location but also split and merge over time, enabling cluster memberships to change. Dynamic clustering is applied to both cyclic and developmental gene expression data sets and reveals interesting, time-dependent structures which could not be identified using traditional clustering methods.

Conference paper

Fowler A, Heard NA, 2012, On two-way Bayesian agglomerative clustering of gene expression data, Statistical Analysis and Data Mining, Vol: 5, Pages: 463-476, ISSN: 1932-1864

Journal article

Heard NA, 2011, Iterative Reclassification in Agglomerative Clustering, Journal of Computational and Graphical Statistics, Vol: 20, Pages: 920-936, ISSN: 1061-8600

Journal article

Heard NA, Weston DJ, Platanioti K, Hand DJ et al., 2010, Bayesian anomaly detection methods for social networks, Annals of Applied Statistics, Vol: 4, Pages: 645-662, ISSN: 1932-6157

Journal article

Bushel PR, Heard NA, Gutman R, Liu L, Peddada SD, Pyne S et al., 2009, Dissecting the fission yeast regulatory network reveals phase-specific control elements of its cell cycle, BMC Systems Biology, Vol: 3

Journal article

Heard NA, Holmes CC, Stephens DA, 2006, A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves, Journal of the American Statistical Association, Vol: 101, Pages: 18-29, ISSN: 0162-1459

Citations: 127

Journal article

Heard NA, Holmes CC, Stephens DA, Hand DJ, Dimopoulos G et al., 2005, Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges, Proceedings of the National Academy of Sciences of the United States of America, Vol: 102, Pages: 16939-16944, ISSN: 0027-8424

Citations: 44

Journal article

Hand DJ, Heard NA, 2005, Finding groups in gene expression data, Journal of Biomedicine and Biotechnology, Pages: 215-225, ISSN: 1110-7243

Journal article

Hand DJ, Adams NM, Heard NA, 2005, Pattern discovery tools for detecting cheating in student coursework, Local Pattern Detection: International Seminar, Dagstuhl Castle, Germany, 12-16 April 2004, Publisher: Springer-Verlag, Berlin, Pages: 39-52

Conference paper

Heard NA, 2004, Technology in genetics - Automating the scientific process, Heredity, Vol: 93, Pages: 6-7, ISSN: 0018-067X

Journal article

Holmes CC, Heard NA, 2003, Generalized monotonic regression using random change points, Statistics in Medicine, Vol: 22, Pages: 623-638, ISSN: 0277-6715

Citations: 44

Journal article

Professor Nick Heard
