Big Data and Statistical Machine Learning
It is now widely recognised that science and industry are undergoing a data revolution, giving rise to databases of unprecedented scale (e.g., distributed databases or data streams) as well as altogether new data formats (e.g., free-form text and networks). The availability of Big Data presents an unprecedented opportunity, but also an unprecedented challenge. The Machine Learning and Big Data group are rising to this challenge by developing machine learning techniques that handle modern data types and that draw on statistical and computational intelligence methods to navigate vast amounts of information with minimal human supervision.
Selected Publications (in reverse chronological order):
- Henrion M., Hand D.J., Gandy A. and Mortlock D.J. (2012) CASOS: A subspace method for anomaly detection in high dimensional astronomical databases. Statistical Analysis and Data Mining, in press
- Anagnostopoulos C., Tasoulis D., Adams N.M. and Hand D.J. (2012) Online Linear and Quadratic Discriminant Analysis with adaptive forgetting for streaming classification. Statistical Analysis and Data Mining, 5(2), 139-166
- Wang Y., Wong L. and Montana G. (2012) PaRFR: Parallel Random Forest Regression on Hadoop for Multivariate Quantitative Trait Loci Mapping, preprint
- McWilliams B. and Montana G. (2012) Multi-view predictive partitioning in high dimensions. Statistical Analysis and Data Mining, in press
- Anagnostopoulos C. and Gramacy R.B. (2012) Dynamic trees for streaming and massive data contexts, preprint, arXiv:1201.5568
- Heard N.A., Weston D.J., Platanioti K., and Hand D.J. (2010) Bayesian anomaly detection methods for social networks. Annals of Applied Statistics, 4, 645-662.
- Hand D.J. (2006) Classifier technology and the illusion of progress (with discussion). Statistical Science, 21, 1-34.
- Hand D.J., Mannila H., and Smyth P. (2001) Principles of data mining, MIT Press. [Chinese translation, 2003; Polish translation, 2005]
- Hand D.J., Blunt G., Kelly M.G., and Adams N.M. (2000) Data mining for fun and profit. Statistical Science, 15, 111-131.
Invited Talks / Keynote Presentations:
- Hand D.J. (2009) Modern statistics: the myth and the magic (RSS Presidential Address). Journal of the Royal Statistical Society, Series A, 172, 287-306.
- David Hand, “Big Data: Risks, opportunities and challenges”, DUG Conference 2012 (10/10/12, The Royal Society, London): Retail issues, big data, and research.
- Workshop on Big Data: bridging the gap between academia and industry.
Impact / Industrial Collaborations / Consultancy:
- retail banking (e.g., Barclays)
- defence and security (e.g., BAE Systems)
- online advertising (e.g., Advance Media)
- EPSRC (2012) Enabling the statistical analysis of massive data sets on Hadoop (Montana PI)
- EPSRC (2011-2013) Enabling high-performance statistical computing in R on hybrid GPU and multicore architectures (Montana PI)