Hand DJ, 2004, Strength in diversity: the advance of data analysis, Berlin, Knowledge discovery in databases: PKDD 2004: 8th European conference on principles and practice of knowledge discovery in databases, Pisa, Italy, 20 - 24 September 2004, Publisher: Springer-Verlag, Pages: 18-26, ISSN: 0302-9743
The scientific analysis of data is only around a century old. For most of that century, data analysis was the realm of only one discipline - statistics. As a consequence of the development of the computer, things have changed dramatically and now there are several such disciplines, including machine learning, pattern recognition, and data mining. This paper looks at some of the similarities and some of the differences between these disciplines, noting where they intersect and, perhaps of more interest, where they do not. Particular issues examined include the nature of the data with which they are concerned, the role of mathematics, differences in the objectives, how the different areas of application have led to different aims, and how the different disciplines have led sometimes to the same analytic tools being developed, but also sometimes to different tools being developed. Some conjectures about likely future developments are given.
Hand DJ, 2004, Academic obsessions and classification realities: ignoring practicalities in supervised classification, Berlin, Meeting of the International Federation of Classification Societies (IFCS), Illinois Institute of Technology, Chicago, IL, Publisher: Springer-Verlag, Pages: 209-232, ISSN: 1431-8814
Supervised classification methods have been the focus of a vast amount of research in recent decades, within a variety of intellectual disciplines, including statistics, machine learning, pattern recognition, and data mining. Highly sophisticated methods have been developed, using the full power of recent advances in computation. Many of these methods would have been simply inconceivable to earlier generations. However, most of these advances have largely taken place within the context of the classical supervised classification paradigm of data analysis. That is, a classification rule is constructed based on a given 'design sample' of data, with known and well-defined classes, and this rule is then used to classify future objects. This paper argues that this paradigm is often, perhaps typically, an over-idealisation of the practical realities of supervised classification problems. Furthermore, it is also argued that the sequential nature of the statistical modelling process means that the large gains in predictive accuracy are achieved early in the modelling process. Putting these two facts together leads to the suspicion that the apparent superiority of the highly sophisticated methods is often illusory: simple methods are often equally effective or even superior in classifying new data points.
Hand, David, 2004, Pattern discovery (Preface), Journal of Applied Statistics, Vol: 31, Pages: 883-884, ISSN: 0266-4763
McDonald RA, Eckley IA, Hand DJ, et al., 2004, A classifier combination tree algorithm, Berlin, 10th international workshop on structural and syntactic pattern recognition / 5th international conference on statistical techniques in pattern recognition, Lisbon, Portugal, Publisher: Springer-Verlag Berlin, Pages: 609-617, ISSN: 0302-9743
In recent years a number of authors have suggested that combining classifiers within local regions of the measurement space might yield superior classification performance to rigid global weighting schemes. In this paper we describe a modified version of the CART algorithm, called ARPACC, that performs local classifier combination. One obstacle to such combination is the fact that the 'optimal' covariance combination results originally assumed only two classes and classifier unbiasedness. In this paper we adopt an approach based on minimizing the Brier score and introduce a generalized matrix inverse solution for use in cases where the error matrix is singular. We also report some preliminary experimental results on simulated data.
McDonald RA, Hand DJ, Eckley IA, et al., 2004, A multiclass extension to the Brownboost algorithm, INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, Vol: 18, Pages: 905-931, ISSN: 0218-0014
Bolton RJ, Hand DJ, Webb AR, 2003, Projection techniques for nonlinear principal component analysis, STATISTICS AND COMPUTING, Vol: 13, Pages: 267-276, ISSN: 0960-3174
Hand DJ, 2003, Statistics and the Theory of Measurement
Just as there are different interpretations of probability, leading to different kinds of inferential statements and different conclusions about statistical models and questions, so there are different theories of measurement, which in turn may lead to different kinds of statistical model and possibly different conclusions. This has led to much confusion and a long running debate about when different classes of statistical methods may legitimately be applied. This paper outlines the major theories of measurement and their relationships and describes the different kinds of models and hypotheses which may be formulated within each theory. One general conclusion is that the domains of applicability of the two major theories are typically different, and it is this which helps apparent contradictions to be avoided in most practical applications.
Hand DJ, Henley WE, 2003, Statistical Classification Methods in Consumer Credit Scoring: A Review
Credit scoring is the term used to describe formal statistical methods used for classifying applicants for credit into "good" and "bad" risk classes. Such methods have become increasingly important with the dramatic growth in consumer credit in recent years. A wide range of statistical methods has been applied, though the literature available to the public is limited for reasons of commercial confidentiality. Particular problems arising in the credit scoring context are examined and the statistical methods which have been applied are reviewed.
Hand DJ, Vinciotti V, 2003, Choosing k for two-class nearest neighbour classifiers with unbalanced classes, PATTERN RECOGNITION LETTERS, Vol: 24, Pages: 1555-1562, ISSN: 0167-8655
Hand DJ, Vinciotti V, 2003, Local versus global models for classification problems: Fitting models where it matters, AMERICAN STATISTICIAN, Vol: 57, Pages: 124-131, ISSN: 0003-1305
Hand DJ, 2003, Pattern discovery in data mining, Roma, Analisi statistica multivariata per le scienze economico-sociali, le scienze naturali e la tecnologia, Publisher: Società Italiana di Statistica, Pages: 15-26
Hand DJ, 2003, Choosing the right 'optimal' model in supervised classification, Literacia e Estatística, Actas do X Congresso Anual da Sociedade Portuguesa de Estatística, Porto, 25 - 28 September 2002, Pages: 31-41
Hand DJ, 2003, Individual freedom and the choice of umbrella, Kingston, Ontario, Statistics, science and public policy VII : environment, health and globalization : proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, U.K., 17 - 20 April 2002, Publisher: Queen's University, Pages: 213-218
Hand DJ, 2003, Selling mackerel by the pound, Kingston, Ontario, Statistics, science and public policy VII : environment, health and globalization : proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, U.K., 17 - 20 April 2002, Publisher: Queen's University, Pages: 67-72
King MD, Crowder MJ, Hand DJ, et al., 2003, Temporal relation between the ADC and DC potential responses to transient focal ischemia in the rat: A Markov chain Monte Carlo simulation analysis, JOURNAL OF CEREBRAL BLOOD FLOW AND METABOLISM, Vol: 23, Pages: 677-688, ISSN: 0271-678X
Markov chain Monte Carlo simulation was used in a reanalysis of the longitudinal data obtained by Harris et al. (J Cereb Blood Flow Metab 20:28-36) in a study of the direct current (DC) potential and apparent diffusion coefficient (ADC) responses to focal ischemia. The main purpose was to provide a formal analysis of the temporal relationship between the ADC and DC responses, to explore the possible involvement of a common latent (driving) process. A Bayesian nonlinear hierarchical random coefficients model was adopted. DC and ADC transition parameter posterior probability distributions were generated using three parallel Markov chains created using the Metropolis algorithm. Particular attention was paid to the within-subject differences between the DC and ADC time course characteristics. The results show that the DC response is biphasic, whereas the ADC exhibits monophasic behavior, and that the two DC components are each distinguishable from the ADC response in their time dependencies. The DC and ADC changes are not, therefore, driven by a common latent process. This work demonstrates a general analytical approach to the multivariate, longitudinal data-processing problem that commonly arises in stroke and other biomedical research.
King M, Crowder MJ, Hand DJ, et al., 2003, Is there an ADC threshold for depolarisation? An MCMC analysis (Available on CD-ROM), Berkeley, CA, 11th annual meeting of the International Society for Magnetic Resonance in Medicine, Toronto, ON, Canada, 10 - 16 July 2003, Publisher: International Society for Magnetic Resonance in Medicine, Pages: 1944-1944
McDonald RA, Hand DJ, Eckley IA, et al., 2003, An empirical comparison of three boosting algorithms on real data sets with artificial class noise, Berlin, Multiple classifier systems, 4th international workshop, MCS 2003, Guildford, UK, 11 - 13 June 2003: proceedings, Publisher: Springer, Pages: 35-44, ISSN: 0302-9743
Boosting algorithms are a means of building a strong ensemble classifier by aggregating a sequence of weak hypotheses. In this paper we consider three of the best-known boosting algorithms: Adaboost, Logitboost and Brownboost. These algorithms are adaptive, and work by maintaining a set of example and class weights which focus the attention of a base learner on the examples that are hardest to classify. We conduct an empirical study to compare the performance of these algorithms, measured in terms of overall test error rate, on five real data sets. The tests consist of a series of cross-validatory samples. At each validation, we set aside one third of the data chosen at random as a test set, and fit the boosting algorithm to the remaining two thirds, using binary stumps as a base learner. At each stage we record the final training and test error rates, and report the average errors within a 95% confidence interval. We then add artificial class noise to our data sets by randomly reassigning 20% of class labels, and repeat our experiment. We find that Brownboost and Logitboost prove less likely than Adaboost to overfit in this circumstance.
Yearling D, Hand DJ, 2003, A Bayesian network datamining approach for modelling the physical condition of copper access networks, BT TECHNOLOGY JOURNAL, Vol: 21, Pages: 90-100, ISSN: 1358-3948
Benton TC, Hand DJ, 2002, Segmentation into predictable classes, IMA Journal of Management Mathematics, Vol: 13, Pages: 245-260, ISSN: 1471-678X
Bolton RJ, Hand DJ, 2002, Statistical fraud detection: A review, STATISTICAL SCIENCE, Vol: 17, Pages: 235-249, ISSN: 0883-4237
Bolton RJ, Hand DJ, Adams NM, et al., 2002, Determining hit rate in pattern search, New York, Pattern detection and discovery, ESF exploratory workshop, London, UK, 16 - 19 September, 2002, Publisher: Springer, Pages: 36-48
This paper reviews recent ideas in Bayesian classification modelling via partitioning. These methods provide predictive estimates for class assignments using averages of a sample of models generated from the posterior distribution of the model parameters. We discuss modifications to the basic approach more suitable for problems when there are many predictor variables and/or a large training sample. (C) 2002 Elsevier Science B.V. All rights reserved.
Fayers PM, Hand DJ, 2002, Causal variables, indicator variables and measurement scales: an example from quality of life, JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES A-STATISTICS IN SOCIETY, Vol: 165, Pages: 233-253, ISSN: 0964-1998
Fayers PM, Hand DJ, 2002, Causal variables, indicator variables and measurement scales: an example from quality of life (with discussion), Journal of the Royal Statistical Society Series A - Statistics in Society, Vol: 165, Pages: 233-262, ISSN: 0964-1998
Hand DJ, 2002, Pattern detection and discovery, New York, Pattern detection and discovery: ESF Exploratory Workshop, London, UK, 16 - 19 September 2002, Publisher: Springer, Pages: 1-12
Hand DJ, 2002, Discussion and conclusions, Kingston, Ontario, Statistics, science and public policy VI : science and responsibility : proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, U.K., 18 - 21 April 2001, Publisher: Queen's University, Pages: 267-271
Hand DJ, 2002, A discussion of cultures: two or three, Kingston, Ontario, Statistics, science and public policy VI : science and responsibility : proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, U.K., 18 - 21 April 2001, Publisher: Queen's University, Pages: 81-83
Hand DJ, 2002, Artificial intelligence, Encyclopedia of environmetrics, Editors: el-Shaarawi, Piegorsch, Chichester, Publisher: Wiley, Pages: 1-6, ISBN: 9780471899976
Hand DJ, 2002, Artificial neural networks, Encyclopedia of environmetrics, Editors: el-Shaarawi, Piegorsch, Chichester, Publisher: Wiley, Pages: 1-7, ISBN: 9780471899976