Juszczak P, Adams NM, Hand DJ, et al., 2008, Off-the-peg and bespoke classifiers for fraud detection, COMPUTATIONAL STATISTICS & DATA ANALYSIS, Vol: 52, Pages: 4521-4532, ISSN: 0167-9473
Detecting fraudulent plastic card transactions is an important and challenging problem. The challenges arise from a number of factors, including the sheer volume of transactions financial institutions have to process, the asynchronous and heterogeneous nature of transactions, and the adaptive behaviour of fraudsters. In this fraud detection problem, the performance of a supervised two-class classification approach is compared with that of an unsupervised one-class classification approach. Attention is focussed primarily on one-class classification approaches. Useful representations of transaction records, and ways of combining different one-class classifiers, are described. Assessment of performance for such problems is complicated by the need for timely decision making. Performance assessment measures are discussed, and the performance of a number of one- and two-class classification methods is assessed using two large, real-world personal banking data sets.
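The one-class idea described above (model only legitimate behaviour, then flag departures from it, so no fraud labels are needed at training time) can be sketched minimally. This is a generic illustration, not the classifiers from the paper: a simple Gaussian model of transaction amounts with a z-score threshold, and all names and figures are hypothetical.

```python
import math

def fit_one_class(samples):
    """Fit a simple Gaussian one-class model: mean and std of legitimate data."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)

def is_anomalous(x, mean, std, z_threshold=3.0):
    """Flag a transaction amount whose z-score exceeds the threshold."""
    return abs(x - mean) / std > z_threshold

# Train only on legitimate transaction amounts (no fraud examples required).
legit = [12.0, 8.5, 15.0, 9.9, 11.2, 13.4, 10.1, 12.8]
mean, std = fit_one_class(legit)
print(is_anomalous(11.0, mean, std))   # typical amount: not flagged
print(is_anomalous(250.0, mean, std))  # far outside the model: flagged
```

In practice a one-class classifier would use a far richer representation of the transaction record than a single amount, but the training asymmetry (fit to legitimate data only) is the same.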
Pavlidis NG, Tasoulis DK, Hand DJ, 2008, Simulation studies of Multi-Armed Bandits with Covariates (Invited Paper), UKSim 10th International Conference on Computer Modelling and Simulation (EUROSIM/UKSim), Publisher: IEEE COMPUTER SOC, Pages: 493-498
Tasoulis DK, Adams NM, Hand DJ, 2008, Simulation and Analysis of Delay Handling Mechanisms in Sensor Networks, UKSim 10th International Conference on Computer Modelling and Simulation (EUROSIM/UKSim), Publisher: IEEE COMPUTER SOC, Pages: 661-666
Twala BETH, Jones MC, Hand DJ, et al., 2008, Good methods for coping with missing data in decision trees, PATTERN RECOGNITION LETTERS, Vol: 29, Pages: 950-956, ISSN: 0167-8655
Weston DJ, Hand DJ, Adams NM, et al., 2008, Plastic card fraud detection using peer group analysis, ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, Vol: 2, Pages: 45-62, ISSN: 1862-5347
This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, k-NN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. For each algorithm, we provide a description, discuss its impact, and review current and further research on it. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.
Crowder M, Hand DJ, Krzanowski W, et al., 2007, On optimal intervention for customer lifetime value, EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, Vol: 183, Pages: 1550-1559, ISSN: 0377-2217
Hand DJ, 2007, On Briggs and Zaretzki: The Skill Plot: A Graphical Technique for Evaluating Continuous Diagnostic Tests, Biometrics
Hand DJ, 2007, The Nature of Statistical Evidence by Bill Thompson, International Statistical Review, Vol: 75, Pages: 254-255
Hand DJ, 2007, Finite Mixture and Markov Switching Models by Sylvia Frühwirth-Schnatter, International Statistical Review, Vol: 75, Pages: 255-255
Hand DJ, 2007, Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner by Galit Shmueli, Nitin R. Patel, Peter C. Bruce, International Statistical Review, Vol: 75, Pages: 256-256
Hand DJ, 2007, The Risks of Financial Institutions edited by Mark Carey, Rene M. Stulz, International Statistical Review, Vol: 75, Pages: 266-267
Hand DJ, 2007, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose, International Statistical Review, Vol: 75, Pages: 409-409
Hand DJ, 2007, Dynamic Data Assimilation: A Least Squares Approach by John M. Lewis, S. Lakshmivarahan, Sudarshan Dhall, International Statistical Review, Vol: 75, Pages: 410-410
Hand DJ, 2007, Statistical Development of Quality in Medicine by Per Winkel, Nien Fan Zhang, International Statistical Review, Vol: 75, Pages: 417-417
Hand DJ, 2007, Matrix Methods in Data Mining and Pattern Recognition by Lars Eldén, International Statistical Review, Vol: 75, Pages: 418-418
Data mining is the discovery of interesting, unexpected or valuable structures in large datasets. As such, it has two rather different aspects. One of these concerns large-scale, 'global' structures, and the aim is to model the shapes, or features of the shapes, of distributions. The other concerns small-scale, 'local' structures, and the aim is to detect these anomalies and decide if they are real or chance occurrences. In the context of signal detection in the pharmaceutical sector, most interest lies in the second of the above two aspects; however, signal detection occurs relative to an assumed background model, therefore, some discussion of the first aspect is also necessary. This paper gives a lightning overview of data mining and its relation to statistics, with particular emphasis on tools for the detection of adverse drug reactions.
Hutton JL, Collett D, McNicol J, et al., 2007, Discussion on the paper by Taplin, JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, Vol: 170, Pages: 290-300, ISSN: 0964-1998
Krzanowski WJ, Hand DJ, 2007, A recursive partitioning tool for interval prediction, ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, Vol: 1, Pages: 241-254, ISSN: 1862-5347
Tasoulis DK, Adams NM, Hand DJ, et al., 2007, Should delayed measurements always be incorporated in filtering?, 15th International Conference on Digital Signal Processing, Publisher: IEEE, Pages: 264+
We consider situations in which the measurements an agent expects to receive are prone to delay. A number of procedures have been proposed to handle such delays; these provide extended frameworks that incorporate delayed measurements once they become available. In a simple context, we explore the performance of these algorithms and find that the computational demands of these methods can be reduced while retaining good performance.
Wu I-D, Hand DJ, et al., 2007, Handling selection bias when choosing actions in retail credit applications, EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, Vol: 183, Pages: 1560-1568, ISSN: 0377-2217
Hand DJ, 2006, Classifier technology and the illusion of progress, STATISTICAL SCIENCE, Vol: 21, Pages: 1-14, ISSN: 0883-4237
A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
Hand DJ, 2006, Protection or privacy? Data mining and personal data, Berlin, Advances in Knowledge Discovery and Data mining: 10th Pacific-Asia Conference (PAKDD 2006), 9 - 12 April 2006, Singapore, Publisher: Springer, Pages: 1-10, ISSN: 0302-9743
In order to run countries and economies effectively, governments and governmental institutions need to collect and analyse vast amounts of personal data. Similarly, health service providers, security services, transport planners, and education authorities need to know a great deal about their clients. And, of course, commercial operations run more efficiently, and can meet the needs of their customers more effectively, the more they know about them. In general, then, the more data these organisations have, the better. On the other hand, the more private data is collated and disseminated, the more individuals are at risk of crimes such as identity theft and financial fraud, not to mention the simple invasion of privacy that such data collection represents. Most work in data mining has concentrated on the positive aspects of extracting useful information from large data sets. But as the technology and its use advance, more awareness of the potential downside is needed. In this paper I look at some of these issues. I examine how data mining tools and techniques are being used by governments and commercial operations to gain insight into individual behaviour, and I look at the concerns that such advances are bringing.
Tasoulis DK, Adams NM, Hand DJ, et al., 2006, Unsupervised clustering in streaming data, 6th IEEE International Conference on Data Mining, Publisher: IEEE COMPUTER SOC, Pages: 638+
Tools for automatically clustering streaming data are becoming increasingly important as data acquisition technology continues to advance. In this paper we present an extension of conventional kernel density clustering to a spatio-temporal setting, and develop a novel algorithmic scheme for clustering data streams. Experimental results demonstrate the high efficiency and other benefits of this new approach.
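The streaming setting above, where clusters must be updated as points arrive and old structure must fade over time, can be illustrated with a toy online scheme. This is not the paper's kernel-density algorithm, just a minimal sketch of the spatio-temporal idea: each point joins the nearest cluster centre within a radius (or starts a new one), and cluster weights decay each step so stale clusters lose influence. All parameter values are hypothetical.

```python
def stream_cluster(points, radius=1.0, decay=0.95):
    """Toy online clustering for a 1-D data stream.

    Each arriving point joins the nearest centre within `radius`
    (shifting that centre toward it), otherwise it starts a new cluster.
    Weights decay each step so that stale clusters fade over time.
    Returns a list of [centre, weight] pairs.
    """
    clusters = []  # list of [centre, weight]
    for x in points:
        for c in clusters:
            c[1] *= decay  # temporal decay: old clusters lose influence
        best = min(clusters, key=lambda c: abs(c[0] - x), default=None)
        if best is not None and abs(best[0] - x) <= radius:
            best[1] += 1.0
            # Move the centre toward the new point, damped by cluster mass.
            best[0] += (x - best[0]) / best[1]
        else:
            clusters.append([x, 1.0])
    return clusters

# A stream mixing two regimes, one near 0 and one near 5.
stream = [0.1, 0.3, 5.0, 0.2, 5.2, 4.9, 0.0]
result = stream_cluster(stream)
print(len(result))  # two clusters recovered
```

A real streaming clusterer would work in higher dimensions and bound memory explicitly, but the single pass over the data and the decay of old structure are the defining constraints of the setting.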
Zhang Z, Hand DJ, et al., 2006, Detecting groups of anomalously similar objects in large data sets, 6th International Symposium on Intelligent Data Analysis, Publisher: IOS PRESS, Pages: 473-483, ISSN: 1088-467X
Pattern discovery is a facet of data mining concerned with the detection of "small local" structures in large data sets. In high dimensions this is typically difficult because of the computational work involved in searching over the data space. In this paper we outline a tool called PEAKER which can detect patterns efficiently in high dimensions. We approach the subject through the two aspects of pattern discovery: detection and verification. We demonstrate various ways of using PEAKER, as well as its inherent properties, emphasizing the exploratory nature of the tool.
Cox DR, Hand DJ, Herzberg AM, 2005, Selected statistical papers of Sir David Cox: design of investigations, statistical methods and applications (Vol.1), Cambridge, Publisher: Cambridge University Press, ISBN: 9780521858168
Cox DR, Hand DJ, Herzberg AM, 2005, Selected statistical papers of Sir David Cox: foundations of statistical inference, theoretical statistics, time series and stochastic processes (Vol.2), Cambridge, Publisher: Cambridge University Press, ISBN: 9780521849401
Crowder M, Hand DJ, et al., 2005, On loss distributions from installment-repaid loans, LIFETIME DATA ANALYSIS, Vol: 11, Pages: 545-564, ISSN: 1380-7870
The banks have been accumulating huge data bases for many years and are increasingly turning to statistics to provide insight into customer behaviour, among other things. Credit risk is an important issue and certain stochastic models have been developed in recent years to describe and predict loan default. Two of the major models currently used in the industry are considered here, and various ways of extending their application to the case where a loan is repaid in installments are explored. The aspect of interest is the probability distribution of the total loss due to repayment default at some time. Thus, the loss distribution is determined by the distribution of times to default, here regarded as a discrete-time survival distribution. In particular, the probabilities of large losses are to be assessed for insurance purposes.
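The connection described above, where the loss distribution follows from a discrete-time survival distribution of default times, can be made concrete with a small worked sketch. This is not the paper's model: it assumes a hypothetical loan with a per-period default hazard and a known outstanding balance schedule, and simply enumerates the resulting loss probabilities.

```python
def loss_distribution(hazards, balances):
    """Loss distribution for an installment loan under a discrete-time
    survival model of default.

    hazards[t]  : P(default at period t | no default before t)
    balances[t] : outstanding balance lost if default occurs at period t
    Returns a list of (loss, probability) pairs; the final pair is the
    zero-loss outcome where the loan is repaid in full.
    """
    survive = 1.0
    dist = []
    for h, b in zip(hazards, balances):
        dist.append((b, survive * h))  # default exactly at this period
        survive *= 1.0 - h             # survival carried to the next period
    dist.append((0.0, survive))        # full repayment: no loss
    return dist

# Hypothetical 4-installment loan: constant hazard, linearly amortising balance.
hazards = [0.05, 0.05, 0.05, 0.05]
balances = [1000.0, 750.0, 500.0, 250.0]
dist = loss_distribution(hazards, balances)
expected_loss = sum(loss * p for loss, p in dist)
print(round(expected_loss, 2))
```

The tail probabilities of this distribution (the chance of an early, large-balance default) are exactly the quantities the abstract says must be assessed for insurance purposes.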
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.