(Adams, Anagnostopoulos, Bellotti, Montana, Gandy, Hand, Heard, Mortlock)

It is now widely recognised that both the sciences and industry are undergoing a data revolution, giving rise to databases of unprecedented scale (e.g., distributed databases or data streams), as well as altogether new data formats (e.g., free-form text and networks). The availability of Big Data presents an unprecedented opportunity, but also an unprecedented challenge. The Machine Learning and Big Data group are rising to this challenge by developing machine learning techniques that can handle modern data types and that draw on statistical and computational intelligence techniques to navigate vast amounts of information with minimal human supervision.

Selected Publications (in chronological order):

  • Henrion, M., Hand, D.J., Gandy, A. and Mortlock, D.J. (2012) CASOS: A subspace method for anomaly detection in high dimensional astronomical databases. Statistical Analysis and Data Mining, in press
  • Anagnostopoulos, C., Tasoulis, D.K., Adams, N.M., Pavlidis, N.G. and Hand, D.J. (2012) Online Linear and Quadratic Discriminant Analysis with adaptive forgetting for streaming classification. Statistical Analysis and Data Mining, 5(2), 139-166
  • Wang Y., Wong L. and Montana G. (2012) PaRFR: Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping, preprint
  • McWilliams B. and Montana G. (2012) Multi-view predictive partitioning in high dimensions. Statistical Analysis and Data Mining, 5, 304-321
  • Anagnostopoulos, C., and Gramacy, R.B. (2012), Dynamic trees for streaming and massive data contexts, preprint, arXiv:1201.5568
  • Heard N.A., Weston D.J., Platanioti K., and Hand D.J. (2010) Bayesian anomaly detection methods for social networks. Annals of Applied Statistics, 4, 645-662.
  • Hand D.J. (2006) Classifier technology and the illusion of progress (with discussion). Statistical Science, 21, 1-34.
  • Hand D.J., Mannila H., and Smyth P. (2001) Principles of data mining, MIT Press. [Chinese translation, 2003; Polish translation, 2005]
  • Hand D.J., Blunt G., Kelly M.G., and Adams N.M. (2000) Data mining for fun and profit. Statistical Science, 15, 111-131.

Invited Talks / Keynote Presentations:

  • Hand D.J. (2009) Modern statistics: the myth and the magic (RSS Presidential Address). Journal of the Royal Statistical Society, Series A, 172, 287-306.
  • David Hand, “Big Data: Risks, opportunities and challenges”, DUG Conference 2012 (10/10/12, The Royal Society, London): Retail issues, big data, and research.

Upcoming Events:

  • Workshop on Big Data: bridging the gap between academia and industry. Details here.

Impact / Industrial Collaborations / Consultancy:

  • retail banking (e.g., Barclays)
  • defence and security (e.g., BAe Systems)
  • online advertising (e.g., Advance Media)

Related Grants:

  • EPSRC (2012) Enabling the statistical analysis of massive data sets on Hadoop (Montana PI)
  • EPSRC (2011-2013) Enabling high-performance statistical computing in R on hybrid GPU and multicore architectures (Montana PI)


Search results

    Henrion M, Hand DJ, Gandy A, Mortlock DJ et al., 2013, CASOS: a Subspace Method for Anomaly Detection in High Dimensional Astronomical Databases, Statistical Analysis and Data Mining, Vol: 6, Pages: 53-72, ISSN: 1932-1864
    Anagnostopoulos C, Tasoulis DK, Adams NM, Pavlidis NG, Hand DJ et al., 2012, Online linear and quadratic discriminant analysis with adaptive forgetting for streaming classification, Statistical Analysis and Data Mining, Vol: 5, Pages: 139-166, ISSN: 1932-1864
    McWilliams B, Montana G, 2012, Multi-view predictive partitioning in high dimensions, Statistical Analysis and Data Mining, Vol: 5, Pages: 304-321, ISSN: 1932-1872

    Many modern data mining applications are concerned with the analysis of datasets in which the observations are described by paired high-dimensional vectorial representations or 'views'. Some typical examples can be found in web mining and genomics applications. In this article we present an algorithm for data clustering with multiple views, multi-view predictive partitioning (MVPP), which relies on a novel criterion of predictive similarity between data points. We assume that, within each cluster, the dependence between multivariate views can be modeled by using a two-block partial least squares (TB-PLS) regression model, which performs dimensionality reduction and is particularly suitable for high-dimensional settings. The proposed MVPP algorithm partitions the data such that the within-cluster predictive ability between views is maximized. The proposed objective function depends on a measure of predictive influence of points under the TB-PLS model which has been derived as an extension of the predicted residual sums of squares (PRESS) statistic commonly used in ordinary least squares regression. Using simulated data, we compare the performance of MVPP to that of competing multi-view clustering methods which rely upon geometric structures of points, but ignore the predictive relationship between the two views. State-of-the-art results are obtained on benchmark web mining datasets. © 2012 Wiley Periodicals, Inc.

    Montana G, Wong L, Wang Y, 2012, PaRFR: Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping

    Heard NA, Weston DJ, Platanioti K, Hand DJ et al., 2010, Bayesian anomaly detection methods for social networks, Annals of Applied Statistics, Vol: 4, Pages: 645-662, ISSN: 1932-6157
    McDonald RA, Eckley IA, Hand DJ, 2004, A classifier combination tree algorithm, 10th International Symposium on Structural and Syntactic Pattern Recognition/5th International Conference on Statistical Techniques in Pattern Recognition, Publisher: Springer-Verlag Berlin, Pages: 609-617, ISSN: 0302-9743
    Hand DJ, Mannila H, Smyth P, 2001, Principles of data mining (book), Cambridge, MA, Publisher: MIT Press, ISBN: 9780262082907
    Hand DJ, Blunt G, Kelly MG, Adams NM et al., 2000, Data mining for fun and profit, Statistical Science, Vol: 15, Pages: 111-126, ISSN: 0883-4237

