Publications

Sanna Passino F, Adams N, Cohen E, Evangelou M, Heard NAet al., 2023, Statistical cybersecurity: a brief discussion of challenges, data structures, and future directions, Harvard Data Science Review, Vol: 5, Pages: 1-10, ISSN: 2644-2353

Journal article

Hallgren KL, Heard NA, Turcotte MJM, 2023, Changepoint detection on a graph of time series, Bayesian Analysis, Vol: -, Pages: 1-28, ISSN: 1936-0975

When analysing multiple time series that may be subject to changepoints, it is sometimes possible to specify a priori, by means of a graph, which pairs of time series are likely to be impacted by simultaneous changepoints. This article proposes an informative prior for changepoints which encodes the information contained in the graph, inducing a changepoint model for multiple time series that borrows strength across clusters of connected time series to detect weak signals for synchronous changepoints. The graphical model for changepoints is further extended to allow dependence between nearby but not necessarily synchronous changepoints across neighbouring time series in the graph. A novel reversible jump Markov chain Monte Carlo (MCMC) algorithm making use of auxiliary variables is proposed to sample from the graphical changepoint model. The merit of the proposed approach is demonstrated through a changepoint analysis of computer network authentication logs from Los Alamos National Laboratory (LANL), demonstrating an improvement at detecting weak signals for network intrusions across users linked by network connectivity, whilst limiting the number of false alerts.

Journal article

Sanna Passino F, Heard NA, 2023, Mutually exciting point process graphs for modelling dynamic networks, Journal of Computational and Graphical Statistics, Vol: 32, Pages: 116-130, ISSN: 1061-8600

A new class of models for dynamic networks is proposed, called mutually exciting point process graphs (MEG). MEG is a scalable network-wide statistical model for point processes with dyadic marks, which can be used for anomaly detection when assessing the significance of future events, including previously unobserved connections between nodes. The model combines mutually exciting point processes to estimate dependencies between events and latent space models to infer relationships between the nodes. The intensity functions for each network edge are characterized exclusively by node-specific parameters, which allows information to be shared across the network. This construction enables estimation of intensities even for unobserved edges, which is particularly important in real world applications, such as computer networks arising in cyber-security. A recursive form of the log-likelihood function for MEG is obtained, which is used to derive fast inferential procedures via modern gradient ascent algorithms. An alternative EM algorithm is also derived. The model and algorithms are tested on simulated graphs and real world datasets, demonstrating excellent performance. Supplementary materials for this article are available online.

Journal article

Hallgren KL, Heard NA, Adams NM, 2022, Changepoint detection in non-exchangeable data, Statistics and Computing, Vol: 32, Pages: 1-19, ISSN: 0960-3174

Changepoint models typically assume the data within each segment are independent and identically distributed conditional on some parameters that change across segments. This construction may be inadequate when data are subject to local correlation patterns, often resulting in many more changepoints fitted than preferable. This article proposes a Bayesian changepoint model that relaxes the assumption of exchangeability within segments. The proposed model supposes data within a segment are m-dependent for some unknown m⩾0 that may vary between segments, resulting in a model suitable for detecting clear discontinuities in data that are subject to different local temporal correlations. The approach is suited to both continuous and discrete data. A novel reversible jump Markov chain Monte Carlo algorithm is proposed to sample from the model; in particular, a detailed analysis of the parameter space is exploited to build proposals for the orders of dependence. Two applications demonstrate the benefits of the proposed model: computer network monitoring via change detection in count data, and segmentation of financial time series.

Journal article

Sanna Passino F, Heard N, 2022, Latent structure blockmodels for Bayesian spectral graph clustering, Statistics and Computing, Vol: 32, ISSN: 0960-3174

Spectral embedding of network adjacency matrices often produces node representations living approximately around low-dimensional submanifold structures. In particular, hidden substructure is expected to arise when the graph is generated from a latent position model. Furthermore, the presence of communities within the network might generate community-specific submanifold structures in the embedding, but this is not explicitly accounted for in most statistical models for networks. In this article, a class of models called latent structure block models (LSBM) is proposed to address such scenarios, allowing for graph clustering when community-specific one dimensional manifold structure is present. LSBMs focus on a specific class of latent space model, the random dot product graph (RDPG), and assign a latent submanifold to the latent positions of each community. A Bayesian model for the embeddings arising from LSBMs is discussed, and shown to have a good performance on simulated and real world network data. The model is able to correctly recover the underlying communities living in a one-dimensional manifold, even when the parametric form of the underlying curves is unknown, achieving remarkable results on a variety of real data.

Journal article

Sanna Passino F, Heard NA, Rubin-Delanchy P, 2022, Spectral clustering on spherical coordinates under the degree-corrected stochastic blockmodel, Technometrics, Vol: 64, Pages: 346-357, ISSN: 0040-1706

Spectral clustering is a popular method for community detection in networksunder the assumption of the standard stochastic blockmodel. Taking a matrix representation of the graph such as the adjacency matrix, the nodes are clustered on a low dimensional projection obtained from a truncated spectral decomposition of the matrix. Estimating the number of communities and the dimension of the reduced latent space well is crucial for good performance of spectral clustering algorithms. Real-world networks, such as computer networks studied in cyber-security applications, often present heterogeneous within-community degree distributions which are better addressed by the degree-corrected stochastic blockmodel. A novel, model-based method is proposed in this article for simultaneous and automated selection of the number of communities and latent dimension for spectral clustering under the degree-corrected stochastic blockmodel. The method is based on a transformation to spherical coordinates of the spectral embedding, and on a novel modelling assumption in the transformed space, which is then embedded into an existing model selection framework for estimating the number of communities and the latent dimension. Results show improved performance over competing methods on simulated and real-world computer network data.

Journal article

Heard NA, 2021, Standardized partial sums and products of p-values, Journal of Computational and Graphical Statistics, Vol: 31, Pages: 563-573, ISSN: 1061-8600

In meta analysis, a diverse range of methods for combining multiple p-values have been applied throughout the scientific literature. For sparse signals where only a small proportion of the p-values are truly significant, a technique called higher criticism has previously been shown to have asymptotic consistency and more power than Fisher’s original method. However, higher criticism and other related methods can still lack power. Three new, simple to compute statistics are now proposed for detecting sparse signals, based on standardizing partial sums or products of p-value order statistics. The use of standardization is theoretically justified with results demonstrating asymptotic normality, and avoids the computational difficulties encountered when working with analytic forms of the distributions of the partial sums and products. In particular, the standardized partial product demonstrates more power than existing methods for both the standard Gaussian mixture model and a real data example from computer network modeling.

Journal article

Heard N, 2021, An introduction to Bayesian inference, methods and computation, ISBN: 9783030828073

These lecture notes provide a rapid, accessible introduction to Bayesian statistical methods. The course covers the fundamental philosophy and principles of Bayesian inference, including the reasoning behind the prior/likelihood model construction synonymous with Bayesian methods, through to advanced topics such as nonparametrics, Gaussian processes and latent factor models. These advanced modelling techniques can easily be applied using computer code samples written in Python and Stan which are integrated into the main text. Importantly, the reader will learn methods for assessing model fit, and to choose between rival modelling approaches.

Abstract
Cite
Citations: 5

Book

Sanna Passino F, Turcotte MJM, Heard NA, 2021, Graph link prediction in computer networks using Poisson matrix factorisation, Annals of Applied Statistics, Vol: 16, Pages: 1313-1332, ISSN: 1932-6157

Graph link prediction is an important task in cyber-security: relationshipsbetween entities within a computer network, such as users interacting withcomputers, or system libraries and the corresponding processes that use them,can provide key insights into adversary behaviour. Poisson matrix factorisation(PMF) is a popular model for link prediction in large networks, particularlyuseful for its scalability. In this article, PMF is extended to includescenarios that are commonly encountered in cyber-security applications.Specifically, an extension is proposed to explicitly handle binary adjacencymatrices and include known covariates associated with the graph nodes. Aseasonal PMF model is also presented to handle dynamic networks. To allow themethods to scale to large graphs, variational methods are discussed forperforming fast inference. The results show an improved performance over thestandard PMF model and other common link prediction techniques.

Journal article

Sanna Passino F, Bertiger AS, Neil JC, Heard NAet al., 2021, Link prediction in dynamic networks using random dot product graphs, Data Mining and Knowledge Discovery, Vol: 35, Pages: 2168-2199, ISSN: 1384-5810

The problem of predicting links in large networks is a crucial task in avariety of practical applications, including social sciences, biology andcomputer security. In this paper, statistical techniques for link predictionbased on the popular random dot product graph model are carefully presented,analysed and extended to dynamic settings. Motivated by a practical applicationin cyber-security, this paper demonstrates that random dot product graphs notonly represent a powerful tool for inferring differences between multiplenetworks, but are also efficient for prediction purposes and for understandingthe temporal evolution of the network. The probabilities of links are obtainedby fusing information at multiple levels of resolution: time series models areused to score connections at the edge level, and spectral methods provideestimates of latent positions for each node. In this way, traditional linkprediction methods, usually based on decompositions of the entire networkadjacency matrix, are extended using edge-specific information. The methodspresented in this article are applied to a number of simulated and real-worldcomputer network graphs, showing promising results.

Journal article

Sanna Passino F, Heard N, 2020, Bayesian estimation of the latent dimension and communities in stochastic blockmodels, Statistics and Computing, Vol: 30, Pages: 1291-1307, ISSN: 0960-3174

Spectral embedding of adjacency or Laplacian matrices of undirected graphs is a common technique for representing a network in a lower dimensional latent space, with optimal theoretical guarantees. The embedding can be used to estimate the community structure of the network, with strong consistency results in the stochastic blockmodel framework. One of the main practical limitations of standard algorithms for community detection from spectral embeddings is that the number of communities and the latent dimension of the embedding must be specified in advance. In this article, a novel Bayesian model for simultaneous and automatic selection of the appropriate dimension of the latent space and the number of blocks is proposed. Extensions to directed and bipartite graphs are discussed. The model is tested on simulated and real world network data, showing promising performance for recovering latent community structure.

Journal article

Sanna Passino F, Heard NA, 2020, Classification of periodic arrivals in event time data for filtering computer network traffic, Statistics and Computing, Vol: 30, Pages: 1241-1254, ISSN: 0960-3174

Periodic patterns can often be observed in real-world event time data, possibly mixed with non-periodic arrival times. For modelling purposes, it is necessary to correctly distinguish the two types of events. This task has particularly important implications in computer network security; there, separating automated polling traffic and human-generated activity in a computer network is important for building realistic statistical models for normal activity, which in turn can be used for anomaly detection. Since automated events commonly occur at a fixed periodicity, statistical tests using Fourier analysis can efficiently detect whether the arrival times present an automated component. In this article, sequences of arrival times which contain automated events are further examined, to separate polling and non-periodic activity. This is first achieved using a simple mixture model on the unit circle based on the angular positions of each event time on the p-clock, where p represents the main periodicity associated with the automated activity; this model is then extended by combining a second source of information, the time of day of each event. Efficient implementations exploiting conjugate Bayesian models are discussed, and performance is assessed on real network flow data collected at Imperial College London.

Journal article

Price-Williams M, Heard N, 2020, Nonparametric self-exciting models for computer network traffic, Statistics and Computing, Vol: 30, Pages: 209-220, ISSN: 0960-3174

Connectivity patterns between nodes in a computer network can be interpreted and modelled as point processes where events in a process indicate connections being established for data to be sent along that edge. A model of normal connectivity behaviour can be constructed for each edge in a network by identifying key network user features such as seasonality or self-exciting behaviour, since events typically arise in bursts at particular times of day which may be peculiar to that edge. When monitoring a computer network in real time, unusual patterns of activity against the model of normality could indicate the presence of a malicious actor. A flexible, novel, nonparametric model for the excitation function of a Wold process is proposed for modelling the conditional intensities of network edges. This approach is shown to outperform standard seasonality and self-excitation models in predicting network connections, achieving well-calibrated predictions for event data collected from the computer networks of both Imperial College and Los Alamos National Laboratory.

Journal article

Metelli S, Heard N, 2019, On Bayesian new edge prediction and anomaly detection in computer networks, Annals of Applied Statistics, Vol: 13, Pages: 2586-2610, ISSN: 1932-6157

Monitoring computer network traffic for anomalous behaviourpresents an important security challenge. Arrivals of new edges in anetwork graph represent connections between a client and server pairnot previously observed, and in rare cases these might suggest thepresence of intruders or malicious implants. We propose a Bayesianmodel and anomaly detection method for simultaneously characterising existing network structure and modelling likely new edge formation. The method is demonstrated on real computer network authentication data and successfully identifies some machines which areknown to be compromised.

Journal article

Sanna Passino F, Heard NA, 2019, Modelling dynamic network evolution as a Pitman-Yor process, Foundations of Data Science, Vol: 1, Pages: 293-306, ISSN: 2639-8001

Dynamic interaction networks frequently arise in biology, communications technology and the social sciences, representing, for example, neuronal connectivity in the brain, internet connections between computers and human interactions within social networks. The evolution and strengthening of the links in such networks can be observed through sequences of connection events occurring between network nodes over time. In some of these applications, the identity and size of the network may be unknown a priori and may change over time. In this article, a model for the evolution of dynamic networks based on the Pitman-Yor process is proposed. This model explicitly admits power-laws in the number of connections on each edge, often present in real world networks, and, for careful choices of the parameters, power-laws for the degree distribution of the nodes. A novel empirical method for the estimation of the hyperparameters of the Pitman-Yor process is proposed, and some necessary corrections for uniform discrete base distributions are carefully addressed. The methodology is tested on synthetic data and in an anomaly detection study on the enterprise computer network of the Los Alamos National Laboratory, and successfully detects connections from a red-team penetration test.

Journal article

Price-Williams M, Heard N, Rubin-Delanchy P, 2019, Detecting weak dependence in computer network traffic patterns by using higher criticism, Journal of the Royal Statistical Society: Series C, Vol: 68, Pages: 641-655, ISSN: 0035-9254

To perform robust statistical anomaly detection in cybersecurity, we must build realistic models of the traffic patterns within a computer network. It is therefore important to understand the dependences between the large number of routinely interacting communication pathways within such a network. Pairs of interacting nodes in any directed communication network can be modelled as point processes where events in a process indicate information being sent between two nodes. For two processes A and B denoting the interactions between two distinct pairs of computers, called edges, we wish to assess whether events in A trigger events then to occur in B. A test is introduced to detect such dependence when only a subset of the events in A exhibit a triggering effect on process B; this test will enable us to detect even weakly correlated edges within a computer network graph. Since computer network events occur as a high frequency data stream, we consider the asymptotics of this problem as the number of events goes to ∞, while the proportion exhibiting dependence goes to 0, and examine the performance of tests that are provably consistent in this framework. An example of how this method can be used to detect genuine causal dependences is provided by using real world event data from the enterprise computer network of Los Alamos National Laboratory.

Journal article

Price-Williams M, Turcotte M, Heard N, 2018, Time of Day Anomaly Detection, Pages: 1-6

Anomaly detection systems have been shown to perform well in detecting compromised user credentials within an enterprise computer network. Most existing approaches have focused on modelling activities that users perform within the network but not the time at which users are active. This article presents an approach for identifying compromised user credentials based on modelling their time of day or diurnal patterns. Anomalous behaviour in this respect would correspond to a user working during hours that deviate from their normal historical behaviour. The methodology is demonstrated using authentication data from Los Alamos National Laboratory's enterprise computer network.

Abstract
Cite
Citations: 4

Conference paper

Heard N, Adams N, Rubin-Delanchy P, 2018, Data Science for Cyber-Security, Publisher: Wspc (Europe), ISBN: 9781786345639

This book provides insight into a range of data science techniques for addressing these pressing concerns.The application of statistical and broader data science techniques provides an exciting growth area in the design of cyber defences.

Abstract
Cite

Book

Rubin-Delanchy P, Heard NA, Lawson D, 2018, Meta-analysis of mid-p-values: some new results based on the convex order, Journal of the American Statistical Association, Vol: 114, Pages: 1105-1112, ISSN: 0162-1459

The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. The property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.

Journal article

Bolton A, Heard N, 2018, Malware family discovery using reversible jump MCMC sampling of regimes, Journal of the American Statistical Association, Vol: 113, Pages: 1490-1502, ISSN: 0162-1459

Malware is computer software which has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems which have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump MCMC for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.

Journal article

Heard N, Rubin-Delanchy P, 2018, Choosing between methods of combining p-values, Biometrika, Vol: 105, Pages: 239-246, ISSN: 0006-3444

Combining p-values from independent statistical tests is a popular approachto meta-analysis, particularly when the data underlying the tests are either nolonger available or are difficult to combine. A diverse range of p-valuecombination methods appear in the literature, each with different statisticalproperties. Yet all too often the final choice used in a meta-analysis canappear arbitrary, as if all effort has been expended building the models thatgave rise to the p-values. Birnbaum (1954) showed that any reasonable p-valuecombiner must be optimal against some alternative hypothesis. Starting fromthis perspective and recasting each method of combining p-values as alikelihood ratio test, we present theoretical results for some of the standardcombiners which provide guidance about how a powerful combiner might be chosenin practice.

Journal article

Price-Williams M, Heard N, Turcotte M, 2017, Detecting periodic subsequences in cyber security data, Pages: 84-90

Anomaly detection for cyber-security defence hasgarnered much attention in recent years providing an orthogonalapproach to traditional signature-based detection systems.Anomaly detection relies on building probability models ofnormal computer network behaviour and detecting deviationsfrom the model. Most data sets used for cyber-security havea mix of user-driven events and automated network events,which most often appears as polling behaviour. Separating theseautomated events from those caused by human activity is essentialto building good statistical models for anomaly detection. This articlepresents a changepoint detection framework for identifyingautomated network events appearing as periodic subsequences ofevent times. The opening event of each subsequence is interpretedas a human action which then generates an automated, periodicprocess. Difficulties arising from the presence of duplicate andmissing data are addressed. The methodology is demonstrated usingauthentication data from Los Alamos National Laboratory'senterprise computer network.

Abstract
Cite
Citations: 4

Conference paper

Heard NA, Turcotte MJM, 2017, Adaptive sequential Monte Carlo for multiple changepoint analysis, Journal of Computational and Graphical Statistics, Vol: 26, Pages: 414-423, ISSN: 1061-8600

Process monitoring and control requires detection of structural changes in a data stream in real time. This article introduces an efficient sequential Monte Carlo algorithm designed for learning unknown changepoints in continuous time. The method is intuitively simple: new changepoints for the latest window of data are proposed by conditioning only on data observed since the most recent estimated changepoint, as these observations carry most of the information about the current state of the process. The proposed method shows improved performance over the current state of the art. Another advantage of the proposed algorithm is that it can be made adaptive, varying the number of particles according to the apparent local complexity of the target changepoint probability distribution. This saves valuable computing time when changes in the changepoint distribution are negligible, and enables re-balancing of the importance weights of existing particles when a significant change in the target distribution is encountered. The plain and adaptive versions of the method are illustrated using the canonical continuous time changepoint problem of inferring the intensity of an inhomogeneous Poisson process, although the method is generally applicable to any changepoint problem. Performance is demonstrated using both conjugate and non-conjugate Bayesian models for the intensity. Appendices to the article are available online, illustrating the method on other models and applications.

Journal article

Rubin-Delanchy P, HEARD NA, 2016, Disassortativity of computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Network data is ubiquitous in cyber-security applications. Accurately modelling such data allows discovery of anomalous edges, subgraphs or paths, and is key to many signature-free cyber-security analytics. We present a recurring property of graphs originating from cyber-security applications, often considered a ‘corner case’ in the main literature on network data analysis, that greatly affects the performance of standard ‘off-the-shelf’ techniques. This is the property that similarity, in terms of network behaviour, does not imply connectivity, and in fact the reverse is often true. We call this disassortivity. The phenomenon is illustrated using network flow data collected on an enterprise network, and we show how Big Data analytics designed to detect unusual connectivity patterns can be improved.

Conference paper

Turcotte M, Moore J, Heard NA, McPhall Aet al., 2016, Poisson factorization for peer-based anomaly detection, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Anomaly detection systems are a promising tool to identify compromised user credentials and malicious insiders in enterprise networks. Most existing approaches for modelling user behaviour rely on either independent observations for each user or on pre-defined user peer groups. A method is proposed based on recommender system algorithms to learn overlapping user peer groups and to use this learned structure to detect anomalous activity. Results analysing the authentication and process-running activities of thousands of users show that the proposed method can detect compromised user accounts during a red team exercise.

Conference paper

Metelli S, Heard NA, 2016, Model-based clustering and new edge modelling in large computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE

Computer networks are complex and the analysis of their structure in search for anomalous behaviour is both a challenging and important task for cyber security. For instance, new edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies. Unusual new edges can sometimes be indicative of both legitimate activity, such as automated update requests permitted by the client, and illegitimate activity, such as denial of service (DoS) attacks to cause service disruption or intruders escalating privileges by traversing through the host network. In both cases, capturing and accumulating evidence of anomalous new edge formation represents an important security application. Computer networks tend to exhibit an underlying cluster structure, where nodes are naturally grouped together based on similar connection patterns. What constitutes anomalous behaviour may strongly differ between clusters, so inferring these peer groups constitutes an important step in modelling the types of new connections a user would make. In this article, we present a two-step Bayesian statistical method aimed at clustering similar users inside the network and simultaneously modelling new edge activity, exploiting both overall-level and cluster-level covariates.

Conference paper

Heard NA, Palla K, Skoularidou M, 2016, Topic modelling of authentication events in an enterprise computer network, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

The possibility for theft or misuse of legitimate user credentials is a potential cyber-security weakness in any enterprise computer network which is almost impossible to eradicate. However, by monitoring the network traffic patterns, it can be possible to detect misuse of credentials. This article presents an initial investigation into deconvolving the mixture behaviour of several individuals within a network, to see if individual users can be identified. Towards that, a technique used for document classification is deployed, the Latent Dirichlet allocation model. A pilot study is conducted on authentication events taken from real data from the enterprise network of Los Alamos National Laboratory.

Conference paper

Heard NA, Rubin-Delanchy P, 2016, Network-wide anomaly detection via the Dirichlet process, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE

Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory.

Conference paper

Griffié J, Shannon M, Bromley CL, Boelen L, Burn GL, Williamson DJ, Heard NA, Cope AP, Owen DM, Rubin-Delanchy Pet al., 2016, A Bayesian cluster analysis method for single-molecule localization microscopy data, Nature Protocols, Vol: 11, Pages: 2499-2514, ISSN: 1754-2189

Cell function is regulated by the spatiotemporal organization of the signaling machinery, and a key facet of this is molecular clustering. Here, we present a protocol for the analysis of clustering in data generated by 2D single-molecule localization microscopy (SMLM)-for example, photoactivated localization microscopy (PALM) or stochastic optical reconstruction microscopy (STORM). Three features of such data can cause standard cluster analysis approaches to be ineffective: (i) the data take the form of a list of points rather than a pixel array; (ii) there is a non-negligible unclustered background density of points that must be accounted for; and (iii) each localization has an associated uncertainty in regard to its position. These issues are overcome using a Bayesian, model-based approach. Many possible cluster configurations are proposed and scored against a generative model, which assumes Gaussian clusters overlaid on a completely spatially random (CSR) background, before every point is scrambled by its localization precision. We present the process of generating simulated and experimental data that are suitable to our algorithm, the analysis itself, and the extraction and interpretation of key cluster descriptors such as the number of clusters, cluster radii and the number of localizations per cluster. Variations in these descriptors can be interpreted as arising from changes in the organization of the cellular nanoarchitecture. The protocol requires no specific programming ability, and the processing time for one data set, typically containing 30 regions of interest, is ∼18 h; user input takes ∼1 h.

Journal article

Turcotte MJM, Heard NA, Kent AD, 2016, Modelling user behaviour in a network using computer event logs, Dynamic Networks and Cyber-Security, Pages: 67-87, ISBN: 9781786340740

Computer event logs are a potentially valuable resource in detecting cyber security threats on a computer network. One important research problem associated with these logs is user credential theft or misuse, either by a malicious insider or an external adversary. Once compromised, a user credential can be used by an adversary to advance through the network and further their goals. Little attention is currently given to looking at computer event logs as an aggregated multivariate data stream. The aim of the work in this chapter is to model user credential patterns on the network by considering independently the time series of events generated by each user credential. Simple Bayesian models are fit to the event data for each user credential, providing a flexible global framework for monitoring credentials on an enterprise network and identifying potentially compromised credentials.

Abstract
Cite
Citations: 10

Book chapter

Professor Nick Heard

Contact

Location

Summary