Bolton AD, Heard NA, 2018, Malware Family Discovery Using Reversible Jump MCMC Sampling of Regimes, Journal of the American Statistical Association, Pages: 1-13, ISSN: 0162-1459
Malware is computer software that has either been designed or modified with malicious intent. Hundreds of thousands of new malware threats appear on the internet each day. This is made possible through reuse of known exploits in computer systems that have not been fully eradicated; existing pieces of malware can be trivially modified and combined to create new malware, which is unknown to anti-virus programs. Finding new software with similarities to known malware is therefore an important goal in cyber-security. A dynamic instruction trace of a piece of software is the sequence of machine language instructions it generates when executed. Statistical analysis of a dynamic instruction trace can help reverse engineers infer the purpose and origin of the software that generated it. Instruction traces have been successfully modeled as simple Markov chains, but empirically there are change points in the structure of the traces, with recurring regimes of transition patterns. Here, reversible jump Markov chain Monte Carlo for change point detection is extended to incorporate regime-switching, allowing regimes to be inferred from malware instruction traces. A similarity measure for malware programs based on regime matching is then used to infer the originating families, leading to compelling performance results.
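The paper's regime-switching model extends the simpler baseline the abstract mentions: fitting a first-order Markov chain to each instruction trace. As a hedged illustration only (the function names, smoothing scheme and likelihood-based similarity score below are illustrative, not the paper's regime-matching measure), such a baseline might look like:

```python
import math
from collections import Counter, defaultdict

def transition_probs(trace, smoothing=1.0):
    """Estimate a smoothed first-order Markov transition table from a
    sequence of instruction mnemonics (add-one style smoothing)."""
    states = sorted(set(trace))
    counts = defaultdict(Counter)
    for a, b in zip(trace, trace[1:]):
        counts[a][b] += 1
    probs = {}
    for a in states:
        total = sum(counts[a].values()) + smoothing * len(states)
        probs[a] = {b: (counts[a][b] + smoothing) / total for b in states}
    return probs

def avg_log_likelihood(trace, probs, floor=1e-6):
    """Average per-transition log-likelihood of a trace under a fitted
    table; unseen states or transitions fall back to a small floor."""
    ll = sum(math.log(probs.get(a, {}).get(b, floor))
             for a, b in zip(trace, trace[1:]))
    return ll / max(1, len(trace) - 1)
```

A trace from the same family should then score a higher average log-likelihood under a family member's fitted table than an unrelated trace does; the regime-switching extension replaces the single transition table with recurring blocks of tables.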
Heard N, Rubin-Delanchy P, 2018, Choosing between methods of combining p-values, Biometrika, Vol: 105, Pages: 239-246, ISSN: 0006-3444
Combining p-values from independent statistical tests is a popular approach to meta-analysis, particularly when the data underlying the tests are either no longer available or are difficult to combine. A diverse range of p-value combination methods appear in the literature, each with different statistical properties. Yet all too often the final choice used in a meta-analysis can appear arbitrary, as if all effort has been expended building the models that gave rise to the p-values. Birnbaum (1954) showed that any reasonable p-value combiner must be optimal against some alternative hypothesis. Starting from this perspective and recasting each method of combining p-values as a likelihood ratio test, we present theoretical results for some of the standard combiners which provide guidance about how a powerful combiner might be chosen in practice.
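To make the contrast between combiners concrete, here is a small stdlib-only sketch of two classical methods, each of which is the likelihood ratio test against a different alternative (the function names are illustrative; the closed-form chi-squared tail for even degrees of freedom is standard):

```python
import math
from statistics import NormalDist

def fisher_combine(pvals):
    """Fisher's method: -2 * sum(log p_i) is chi-squared with 2n df
    under H0; powerful against a few very small p-values."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    # chi-squared upper tail with even df 2n has the closed form
    # exp(-x/2) * sum_{i<n} (x/2)^i / i!
    term, tail = 1.0, 0.0
    for i in range(len(pvals)):
        tail += term
        term *= (stat / 2.0) / (i + 1)
    return math.exp(-stat / 2.0) * tail

def stouffer_combine(pvals):
    """Stouffer's method: average of normal quantiles; powerful when a
    modest signal is spread evenly across all the tests."""
    z = sum(NormalDist().inv_cdf(1.0 - p) for p in pvals) / math.sqrt(len(pvals))
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Both reduce to the identity on a single p-value, but they diverge on inputs such as [0.001, 0.9], where Fisher's method is far more sensitive to the single small value; this is the sense in which the choice of combiner encodes an alternative hypothesis.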
Rubin-Delanchy P, Heard NA, Lawson DJ, 2018, Meta-Analysis of Mid-p-Values: Some New Results based on the Convex Order, Journal of the American Statistical Association, ISSN: 0162-1459
The mid-p-value is a proposed improvement on the ordinary p-value for the case where the test statistic is partially or completely discrete. In this case, the ordinary p-value is conservative, meaning that its null distribution is larger than a uniform distribution on the unit interval, in the usual stochastic order. The mid-p-value is not conservative. However, its null distribution is dominated by the uniform distribution in a different stochastic order, called the convex order. This property leads us to discover some new finite-sample and asymptotic bounds on functions of mid-p-values, which can be used to combine results from different hypothesis tests conservatively, yet more powerfully, using mid-p-values rather than p-values. Our methodology is demonstrated on real data from a cyber-security application.
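For a discrete statistic such as a binomial count, the two quantities differ by half the probability of the observed value. A minimal stdlib sketch (the function names are illustrative; the definitions are the standard ones):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail_p_and_midp(k, n, p=0.5):
    """One-sided p-value P(X >= k) and mid-p-value
    P(X > k) + 0.5 * P(X = k) for an observed count k."""
    tail = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    return tail, tail - 0.5 * binom_pmf(k, n, p)
```

The ordinary p-value's null distribution stochastically dominates the uniform (hence conservatism), while the mid-p-value has null mean exactly one half; the latter is the property the convex-order bounds exploit when combining tests.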
Heard NA, Turcotte MJM, 2017, Adaptive Sequential Monte Carlo for Multiple Changepoint Analysis, Journal of Computational and Graphical Statistics, Vol: 26, Pages: 414-423, ISSN: 1061-8600
Griffie J, Shannon M, Bromley CL, et al., 2016, A Bayesian cluster analysis method for single-molecule localization microscopy data, Nature Protocols, Vol: 11, Pages: 2499-2514, ISSN: 1754-2189
Heard NA, Palla K, Skoularidou M, 2016, Topic modelling of authentication events in an enterprise computer network, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE
The possibility of theft or misuse of legitimate user credentials is a potential cyber-security weakness in any enterprise computer network which is almost impossible to eradicate. However, by monitoring network traffic patterns, it can be possible to detect misuse of credentials. This article presents an initial investigation into deconvolving the mixture behaviour of several individuals within a network, to see if individual users can be identified. To this end, a technique used for document classification, the latent Dirichlet allocation model, is deployed. A pilot study is conducted on real authentication events from the enterprise network of Los Alamos National Laboratory.
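As an illustration of the approach (not the authors' code), each user-session can be treated as a "document" whose "words" are the computers authenticated to, so that topics become latent behaviour profiles. A compact collapsed Gibbs sampler for latent Dirichlet allocation, stdlib only:

```python
import random

def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA. docs: list of token lists
    (here, tokens would be computers a user authenticates to).
    Returns the posterior-mean topic mixture of each document."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    # random initial topic assignment for every token, plus count tables
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    ndk = [[0] * n_topics for _ in docs]          # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]      # topic-word counts
    nk = [0] * n_topics                           # topic totals
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wv = z[d][i], widx[w]
                ndk[d][k] -= 1; nkw[k][wv] -= 1; nk[k] -= 1
                # sample from the full conditional of this token's topic
                weights = [(ndk[d][t] + alpha) * (nkw[t][wv] + beta)
                           / (nk[t] + V * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][wv] += 1; nk[k] += 1
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha)
             for t in range(n_topics)] for d, doc in enumerate(docs)]
```

A shared account would then show up as a document whose topic mixture spreads over several behaviour profiles, which is the "deconvolution" the abstract describes.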
Metelli S, Heard NA, 2016, Model-based clustering and new edge modelling in large computer networks, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE
Computer networks are complex and the analysis of their structure in search of anomalous behaviour is both a challenging and important task for cyber security. For instance, new edges, i.e. connections from a host or user to a computer that has not been connected to before, provide potentially strong statistical evidence for detecting anomalies. Unusual new edges can be indicative of either legitimate activity, such as automated update requests permitted by the client, or illegitimate activity, such as denial of service (DoS) attacks to cause service disruption or intruders escalating privileges by traversing through the host network. In either case, capturing and accumulating evidence of anomalous new edge formation represents an important security application. Computer networks tend to exhibit an underlying cluster structure, where nodes are naturally grouped together based on similar connection patterns. What constitutes anomalous behaviour may strongly differ between clusters, so inferring these peer groups constitutes an important step in modelling the types of new connections a user would make. In this article, we present a two-step Bayesian statistical method aimed at clustering similar users inside the network and simultaneously modelling new edge activity, exploiting both overall-level and cluster-level covariates.
Turcotte M, Moore J, Heard NA, et al., 2016, Poisson factorization for peer-based anomaly detection, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE
Anomaly detection systems are a promising tool to identify compromised user credentials and malicious insiders in enterprise networks. Most existing approaches for modelling user behaviour rely on either independent observations for each user or on pre-defined user peer groups. A method is proposed based on recommender system algorithms to learn overlapping user peer groups and to use this learned structure to detect anomalous activity. Results analysing the authentication and process-running activities of thousands of users show that the proposed method can detect compromised user accounts during a red team exercise.
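The idea can be sketched as factorizing a user-by-computer count matrix, with the latent factors playing the role of overlapping peer groups. A minimal stand-in (the paper's model is Bayesian; the multiplicative KL-NMF updates below maximize the same Poisson likelihood and all names are illustrative):

```python
import random

def poisson_factorize(X, k, n_iter=300, eps=1e-9, seed=0):
    """Fit X[u][i] ~ Poisson(sum_f W[u][f] * H[f][i]) by multiplicative
    updates (KL-divergence NMF, whose optimum coincides with the
    Poisson MLE). Rows: users; columns: e.g. computers logged on to;
    the k factors act as overlapping peer groups."""
    rng = random.Random(seed)
    U, I = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(U)]
    H = [[rng.random() + 0.1 for _ in range(I)] for _ in range(k)]

    def rates():
        return [[max(eps, sum(W[u][f] * H[f][i] for f in range(k)))
                 for i in range(I)] for u in range(U)]

    for _ in range(n_iter):
        R = rates()
        for u in range(U):
            for f in range(k):
                num = sum(H[f][i] * X[u][i] / R[u][i] for i in range(I))
                W[u][f] *= num / max(eps, sum(H[f]))
        R = rates()
        for f in range(k):
            for i in range(I):
                num = sum(W[u][f] * X[u][i] / R[u][i] for u in range(U))
                H[f][i] *= num / max(eps, sum(W[u][f] for u in range(U)))
    return W, H, rates()
```

Activity a user's fitted rates assign low probability to, e.g. an observed event at a cell with a near-zero rate, is then what gets surfaced as anomalous.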
Heard NA, Rubin-Delanchy P, 2016, Network-wide anomaly detection via the Dirichlet process, IEEE Conference on Intelligence and Security Informatics (ISI), 2016, Publisher: IEEE
Statistical anomaly detection techniques provide the next layer of cyber-security defences below traditional signature-based approaches. This article presents a scalable, principled, probability-based technique for detecting outlying connectivity behaviour within a directed interaction network such as a computer network. Independent Bayesian statistical models are fit to each message recipient in the network using the Dirichlet process, which provides a tractable, conjugate prior distribution for an unknown discrete probability distribution. The method is shown to successfully detect a red team attack in authentication data obtained from the enterprise network of Los Alamos National Laboratory.
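The Dirichlet process predictive used per recipient takes the familiar Chinese-restaurant-process form: a previously seen source s has probability n_s/(n + α), and a brand-new source has probability α/(n + α). A minimal per-recipient monitor along these lines (class and method names are mine, not the paper's):

```python
import math
from collections import Counter

class DPMonitor:
    """Per-recipient Dirichlet process monitor: scores each incoming
    connection source by its predictive probability under the DP
    (Chinese restaurant process), then updates the counts."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha      # DP concentration parameter
        self.counts = Counter() # observed sources for this recipient
        self.n = 0              # total observations so far
    def score_and_update(self, source):
        if source in self.counts:
            p = self.counts[source] / (self.n + self.alpha)
        else:
            p = self.alpha / (self.n + self.alpha)
        self.counts[source] += 1
        self.n += 1
        return -math.log(p)     # surprise; larger = more anomalous
```

Running one such monitor per message recipient gives the independent, conjugate per-node models the abstract describes; the surprise scores can then be combined network-wide.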
Heard NA, Turcotte MJM, 2016, Convergence of Monte Carlo distribution estimates from rival samplers, Statistics and Computing, Vol: 26, Pages: 1147-1161, ISSN: 0960-3174
Rubin-Delanchy P, Adams NM, Heard NA, 2016, Disassortativity of Computer Networks, 14th IEEE International Conference on Intelligence and Security Informatics - Cybersecurity and Big Data (IEEE ISI), Publisher: IEEE, Pages: 243-247
Rubin-Delanchy P, Burn GL, Griffie J, et al., 2015, Bayesian cluster identification in single-molecule localization microscopy data, Nature Methods, Vol: 12, Pages: 1072-1076, ISSN: 1548-7091
Metelli S, Heard N, 2014, Modelling new edge formation in a computer network through Bayesian variable selection, IEEE Joint Intelligence and Security Informatics Conference 2014, Publisher: IEEE
Anomalous connections in a computer network graph can be a signal of malicious behaviours. For instance, a compromised computer node tends to form a large number of new client edges in the network graph, connecting to server IP (Internet Protocol) addresses which have not previously been visited. This behaviour can be caused by malware (malicious software) performing a denial of service (DoS) attack, to cause disruption or further spread malware; alternatively, the rapid formation of new edges by a compromised node can be caused by an intruder seeking to escalate privileges by traversing the host network. However, study of computer network flow data suggests new edges are also regularly formed by uninfected hosts, and often in bursts. Statistically detecting anomalous formation of new edges requires reliable models of the normal rate of new edges formed by each host. Network traffic data are complex, and so the potential number of variables which might be included in such a statistical model can be large; without proper treatment this would lead to overfitting of models with poor predictive performance. In this paper, Bayesian variable selection is applied to a logistic regression model for new edge formation, in order to select the best subset of variables to include.
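As an illustration only (a crude stand-in: exhaustive subset search scored by BIC, a large-sample approximation to the Bayesian marginal likelihood, rather than the paper's full posterior over inclusion indicators; all names and the toy optimizer are mine):

```python
import itertools
import math

def fit_logistic(X, y, idx, n_iter=500, lr=0.5):
    """Maximum-likelihood logistic regression using only the feature
    columns in idx (plus an intercept); plain gradient ascent."""
    w = [0.0] * (len(idx) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(w[j + 1] * xi[c] for j, c in enumerate(idx))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, c in enumerate(idx):
                grad[j + 1] += err * xi[c]
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        z = w[0] + sum(w[j + 1] * xi[c] for j, c in enumerate(idx))
        p = min(max(1.0 / (1.0 + math.exp(-z)), 1e-12), 1 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return w, ll

def best_subset_bic(X, y):
    """Score every subset of predictors of a new-edge model by BIC
    and return the best subset (as a tuple of column indices)."""
    d, n = len(X[0]), len(X)
    best = (float("inf"), ())
    for r in range(d + 1):
        for idx in itertools.combinations(range(d), r):
            _, ll = fit_logistic(X, y, idx)
            bic = -2.0 * ll + (r + 1) * math.log(n)
            best = min(best, (bic, idx))
    return best[1]
```

Exhaustive enumeration only scales to a handful of candidate variables; the Bayesian variable selection of the paper instead explores the model space stochastically while retaining the same trade-off between fit and model size.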
Heard N, Rubin-Delanchy P, Lawson D, 2014, Filtering automated polling traffic in computer network flow data, IEEE Joint Intelligence and Security Informatics Conference (JISIC 2014), Publisher: IEEE, Pages: 268-271
Fowler A, Menon V, Heard NA, 2013, Dynamic Bayesian clustering, Journal of Bioinformatics and Computational Biology, Vol: 11, ISSN: 0219-7200
Fowler A, Heard NA, 2012, On two-way Bayesian agglomerative clustering of gene expression data, Statistical Analysis and Data Mining, Vol: 5, Pages: 463-476, ISSN: 1932-1864
Heard NA, 2011, Iterative reclassification in agglomerative clustering, Journal of Computational and Graphical Statistics, Vol: 20, Pages: 920-936, ISSN: 1061-8600
Heard NA, Weston DJ, Platanioti K, et al., 2010, Bayesian anomaly detection methods for social networks, Annals of Applied Statistics, Vol: 4, Pages: 645-662, ISSN: 1932-6157
Bushel PR, Heard NA, Gutman R, et al., 2009, Dissecting the fission yeast regulatory network reveals phase-specific control elements of its cell cycle, BMC Systems Biology, Vol: 3, ISSN: 1752-0509
Heard NA, Holmes CC, Stephens DA, 2006, A quantitative study of gene regulation involved in the immune response of anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves, Journal of the American Statistical Association, Vol: 101, Pages: 18-29, ISSN: 0162-1459
Heard NA, Holmes CC, Stephens DA, et al., 2005, Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges, Proceedings of the National Academy of Sciences of the United States of America, Vol: 102, Pages: 16939-16944, ISSN: 0027-8424
Hand DJ, Heard NA, 2005, Finding groups in gene expression data, Journal of Biomedicine and Biotechnology, Pages: 215-225, ISSN: 1110-7243
Hand DJ, Adams NM, Heard NA, 2005, Pattern discovery tools for detecting cheating in student coursework, Local Pattern Detection: International Seminar, Dagstuhl Castle, Germany, 12-16 April 2004, Publisher: Springer-Verlag, Berlin, Pages: 39-52
Heard NA, 2004, Technology in genetics - Automating the scientific process, Heredity, Vol: 93, Pages: 6-7, ISSN: 0018-067X
Holmes CC, Heard NA, 2003, Generalized monotonic regression using random change points, Statistics in Medicine, Vol: 22, Pages: 623-638, ISSN: 0277-6715
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.