Hand DJ, 2005, Supervised classification and tunnel vision, Applied Stochastic Models in Business and Industry, Vol: 21, Pages: 97-109, ISSN: 1524-1904
Hand DJ, 2005, Good practice in retail credit scorecard assessment, Credit Rating and Scoring Models Conference, Publisher: Palgrave Macmillan Ltd, Pages: 1109-1117, ISSN: 0160-5682
In retail banking, predictive statistical models called 'scorecards' are used to assign customers to classes, and hence to appropriate actions or interventions. Such assignments are made on the basis of whether a customer's predicted score is above or below a given threshold. The predictive power of such scorecards gradually deteriorates over time, so that performance needs to be monitored. Common performance measures used in the retail banking sector include the Gini coefficient, the Kolmogorov-Smirnov statistic, the mean difference, and the information value. However, all of these measures use irrelevant information about the magnitude of scores, and fail to use crucial information relating to numbers misclassified. The result is that such measures can sometimes be seriously misleading, resulting in poor quality decisions being made, and mistaken actions being taken. The weaknesses of these measures are illustrated. Performance measures not subject to these risks are defined, and simple numerical illustrations are given.
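As a rough illustration of the quantities named in this abstract, the sketch below computes the Kolmogorov-Smirnov statistic and the Gini coefficient for two sets of scores, alongside a plain count of misclassifications at a threshold. The function names, the toy scores, and the threshold of 0.5 are assumptions made for the example; the count shown is not the alternative measure defined in the paper, only a way to see how threshold-based counts differ from rank-based summaries.

```python
from bisect import bisect_right


def ks_statistic(good_scores, bad_scores):
    """Largest gap between the empirical score CDFs of the two classes."""
    goods, bads = sorted(good_scores), sorted(bad_scores)
    thresholds = sorted(set(goods) | set(bads))
    return max(
        abs(bisect_right(bads, t) / len(bads) - bisect_right(goods, t) / len(goods))
        for t in thresholds
    )


def gini_coefficient(good_scores, bad_scores):
    """Gini = 2 * AUC - 1, where AUC = P(good > bad) + 0.5 * P(tie)."""
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    auc = wins / (len(good_scores) * len(bad_scores))
    return 2.0 * auc - 1.0


def misclassified(good_scores, bad_scores, threshold):
    """Count goods scoring below the threshold plus bads scoring at or above it."""
    rejected_goods = sum(1 for g in good_scores if g < threshold)
    accepted_bads = sum(1 for b in bad_scores if b >= threshold)
    return rejected_goods + accepted_bads


# Toy scores (hypothetical): higher means more likely to be a good customer.
goods = [0.9, 0.8, 0.7, 0.4]
bads = [0.6, 0.3, 0.2, 0.1]

print(ks_statistic(goods, bads))
print(gini_coefficient(goods, bads))
print(misclassified(goods, bads, threshold=0.5))
```

The first two functions depend only on how the scores of the two classes are distributed and ranked, whereas the last depends on the operating threshold actually used to act on customers.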
The vast potential of the genomic insight offered by microarray technologies has led to their widespread use since they were introduced a decade ago. Application areas include gene function discovery, disease diagnosis, and inferring regulatory networks. Microarray experiments enable large-scale, high-throughput investigations of gene activity and have thus provided the data analyst with a distinctive, high-dimensional field of study. Many questions in this field relate to finding subgroups of data profiles which are very similar. A popular type of exploratory tool for finding subgroups is cluster analysis, and many different flavors of algorithms have been used and indeed tailored for microarray data. Cluster analysis, however, implies a partitioning of the entire data set, and this does not always match the objective. Sometimes pattern discovery or bump hunting tools are more appropriate. This paper reviews these various tools for finding interesting subgroups.
Hand D, 2005, Data analysis in personal financial services: a rich opportunity, Significance, Vol: 2, Pages: 110-113, ISSN: 1740-9705
Hand DJ, 2005, Grade inflation, Kingston, Ontario, Statistics, science and public policy IX. Government, science and politics. Proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, UK, 21 - 24 April 2004, Publisher: Queen's University, Pages: 115-124
Hand DJ, 2005, Modern data analysis tools in personal financial services: a quantitative revolution?, Data mining et apprentissage statistique applications en assurance, Niort, France, 12 - 13 May 2005, Pages: 1-8
Hand DJ, 2005, Data mining, Encyclopedia of statistics in behavioral science, Editors: Everitt, Howell, Publisher: Wiley, Pages: 461-465, ISBN: 9780470860809
Hand DJ, 2005, Pattern recognition, Handbook of statistics, Editors: Rao, Wegman, Amsterdam, Publisher: Elsevier, Pages: 213-228, ISBN: 9780444511416
Hand DJ, Adams NM, Heard NA, et al., 2005, Pattern discovery tools for detecting cheating in student coursework, Berlin, Local pattern detection. International seminar. Dagstuhl Castle, Germany, 12 - 16 April 2004, Publisher: Springer-Verlag, Pages: 39-52, ISSN: 0302-9743
Students sometimes cheat. In particular, they sometimes copy coursework assignments from each other. Such copying is occasionally detected by the markers, since the copied script and the original will be unusually similar. However, one cannot rely on such subjective assessment: perhaps there are many scripts, or perhaps the student has sought to disguise the copying by changing words or other aspects of the answers. We describe an attempt to develop a pattern discovery method for detecting cheating, based on measures of the similarities between scripts, where similarity is defined in syntactic rather than semantic terms. This problem differs from many other pattern discovery problems because the peaks will typically be very low: normally only one or two cheating students will copy from any given other student.
Hand DJ, 2005, Size matters - how measurement defines our world, Significance, Vol: 2, Pages: 81-83, ISSN: 1740-9705
Hand DJ, Krzanowski WJ, et al., 2005, Optimising k-means clustering results with standard software packages, Computational Statistics & Data Analysis, Vol: 49, Pages: 969-973, ISSN: 0167-9473
Heard NA, Holmes CC, Stephens DA, et al., 2005, Bayesian coclustering of Anopheles gene expression time series: Study of immune defense response to multiple experimental challenges, Proceedings of the National Academy of Sciences of the United States of America, Vol: 102, Pages: 16939-16944, ISSN: 0027-8424
We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.
Jamain A, Hand DJ, et al., 2005, The Naive Bayes Mystery: a classification detective story, Pattern Recognition Letters, Vol: 26, Pages: 1752-1760, ISSN: 0167-8655
King MD, Crowder MJ, Hand DJ, et al., 2005, Is anoxic depolarisation associated with an ADC threshold? A Markov chain Monte Carlo analysis, NMR in Biomedicine, Vol: 18, Pages: 587-594, ISSN: 0952-3480
A Bayesian nonlinear hierarchical random coefficients model was used in a reanalysis of a previously published longitudinal study of the extracellular direct current (DC)-potential and apparent diffusion coefficient (ADC) responses to focal ischaemia. The main purpose was to examine the data for evidence of an ADC threshold for anoxic depolarisation. A Markov chain Monte Carlo simulation approach was adopted. The Metropolis algorithm was used to generate three parallel Markov chains and thus obtain a sampled posterior probability distribution for each of the DC-potential and ADC model parameters, together with a number of derived parameters. The latter were used in a subsequent threshold analysis. The analysis provided no evidence indicating a consistent and reproducible ADC threshold for anoxic depolarisation.
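For readers unfamiliar with the sampling scheme mentioned in this abstract, here is a minimal random-walk Metropolis sketch that runs three chains from dispersed starting points and pools the draws after burn-in. The target density (a standard normal), the step size, the seeds, and all function names are assumptions chosen for illustration; the paper's hierarchical random coefficients model is not reproduced here.

```python
import math
import random


def log_target(x):
    """Log density of the illustrative target, up to a constant (standard normal)."""
    return -0.5 * x * x


def metropolis_chain(log_density, start, n_steps, step_size, seed):
    """Random-walk Metropolis: propose x + noise, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Symmetric proposal, so the acceptance ratio is the density ratio.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


# Three chains with different (hypothetical) starting values and seeds.
burn_in = 1000
chains = [
    metropolis_chain(log_target, start, n_steps=5000, step_size=1.0, seed=seed)[burn_in:]
    for seed, start in enumerate([-5.0, 0.0, 5.0])
]

pooled = [x for chain in chains for x in chain]
pooled_mean = sum(pooled) / len(pooled)
pooled_var = sum((x - pooled_mean) ** 2 for x in pooled) / len(pooled)
print(pooled_mean, pooled_var)
```

Running several chains from different starting points is a common way to check that they settle on the same distribution before the draws are pooled into a sampled posterior.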
Thomas LC, Oliver RW, Hand DJ, et al., 2005, A survey of the issues in consumer credit modelling research, Journal of the Operational Research Society, Vol: 56, Pages: 1006-1015, ISSN: 0160-5682
Zhang Z, Hand DJ, et al., 2005, Detecting groups of anomalously similar objects in large data sets, Berlin, Advances in Intelligent Data Analysis; IDA 2005, 6th International symposium; Madrid, Publisher: Springer, Pages: 509-519, ISSN: 0302-9743
Pattern discovery is a facet of data mining concerned with the detection of "small local" structures in large data sets. In high dimensions this is typically difficult because of the computational work involved in searching over the data space. In this paper we outline a tool called PEAKER which can detect patterns efficiently in high dimensions. We approach the subject through the two aspects of pattern discovery, detection and verification. We demonstrate various ways of using PEAKER as well as its various inherent properties, emphasizing the exploratory nature of the tool.
Adams NM, Crowder MJ, Hand DJ, et al., 2004, Methods and models in statistics: in honour of Professor John Nelder, FRS, London, Publisher: Imperial College Press, ISBN: 9781860944635
Bolton RJ, Hand DJ, Crowder M, et al., 2004, Significance tests for unsupervised pattern discovery in large continuous multivariate data sets, Computational Statistics & Data Analysis, Vol: 46, Pages: 57-79, ISSN: 0167-9473
Hand DJ, 2004, Deconstructing Statistical Questions
Too much current statistical work takes a superficial view of the client's research question, adopting techniques which have a solid history, a sound mathematical basis or readily available software, but without considering in depth whether the questions being answered are in fact those which should be asked. Examples, some familiar and others less so, are given to illustrate this assertion. It is clear that establishing the mapping from the client's domain to a statistical question is one of the most difficult parts of a statistical analysis. It is a part in which the responsibility is shared by both client and statistician. A plea is made for more research effort to go in this direction and some suggestions are made for ways to tackle the problem.
Hand DJ, Bolton RJ, et al., 2004, Pattern discovery and detection: A unified statistical methodology, Journal of Applied Statistics, Vol: 31, Pages: 885-924, ISSN: 0266-4763
Hand DJ, Glasbey C, Husmeier D, et al., 2004, Clustering objects on subsets of attributes - Discussion, Journal of the Royal Statistical Society Series B - Statistical Methodology, Vol: 66, Pages: 839-849, ISSN: 1369-7412
Hand DJ, 2004, Strength in diversity: The advance of data analysis, Berlin, 15th European Conference on Machine Learning/8th European Conference on Principles and Practice of Knowledge Discovery in Databases, Publisher: Springer-Verlag Berlin, Pages: 18-26, ISSN: 0302-9743
The scientific analysis of data is only around a century old. For most of that century, data analysis was the realm of only one discipline - statistics. As a consequence of the development of the computer, things have changed dramatically and now there are several such disciplines, including machine learning, pattern recognition, and data mining. This paper looks at some of the similarities and some of the differences between these disciplines, noting where they intersect and, perhaps of more interest, where they do not. Particular issues examined include the nature of the data with which they are concerned, the role of mathematics, differences in the objectives, how the different areas of application have led to different aims, and how the different disciplines have led sometimes to the same analytic tools being developed, but also sometimes to different tools being developed. Some conjectures about likely future developments are given.
Hand DJ, 2004, Propose vote of thanks for 'Clustering objects on subsets of attributes' by J.H.Friedman and J.J.Meulman (Review), Journal of the Royal Statistical Society Series B - Statistical Methodology, Vol: 66, Pages: 839-840, ISSN: 1369-7412
Hand DJ, 2004, Credit scoring, Encyclopedia of actuarial science, Editors: Teugels, Sundt, Chichester, Publisher: Wiley, Pages: 2-14, ISBN: 9780470846766
Hand DJ, 2004, Crime, statistics, and behaviour, Kingston, Ontario, Statistics, science and public policy VIII : science, ethics and the law: proceedings of the conference on statistics, science and public policy held at Herstmonceux Castle, Hailsham, U.K., 23 - 26 April 2003, Publisher: Queen's University, Pages: 181-187
Hand DJ, 2004, Measurement theory and practice : the world through quantification, London, Publisher: Arnold, ISBN: 9780340677834
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.