Hand D, What is the purpose of statistical modelling?, Harvard Data Science Review
Hand DJ, 2018, Aspects of data ethics in a changing world: Where are we now?, Big Data, Vol: 6, Pages: 176-190, ISSN: 2167-6461
Ready data availability, cheap storage capacity, and powerful tools for extracting information from data have the potential to significantly enhance the human condition. However, as with all advanced technologies, this comes with the potential for misuse. Ethical oversight and constraints are needed to ensure that an appropriate balance is reached. Ethical issues involving data may be more challenging than the ethical challenges of some other advanced technologies partly because data and data science are ubiquitous, having the potential to impact all aspects of life, and partly because of their intrinsic complexity. We explore the nature of data, personal data, data ownership, consent and purpose of use, trustworthiness of data as well as of algorithms and of those using the data, and matters of privacy and confidentiality. A checklist is given of topics that need to be considered.
Hand DJ, Who told you that? Data provenance, false facts, and how to tell the liars from the truth-tellers, Significance, ISSN: 1740-9705
Hand DJ, 2018, Statistical challenges of administrative and transaction data, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 181, Pages: 555-578, ISSN: 0964-1998
Administrative data are becoming increasingly important. They are typically the side effect of some operational exercise and are often seen as having significant advantages over alternative sources of data. Although it is true that such data have merits, statisticians should approach the analysis of such data with the same cautious and critical eye as they approach the analysis of data from any other source. The paper identifies some statistical challenges, with the aim of stimulating debate about and improving the analysis of administrative data, and encouraging methodology researchers to explore some of the important statistical problems which arise with such data.
Hand DJ, 2018, Evaluating Statistical and Machine Learning Supervised Classification Methods, Conference on Statistical Data Science, Publisher: World Scientific Publishing Co Pte Ltd, Pages: 37-53
Hand DJ, 2017, Measurement: A Very Short Introduction - Rejoinder to discussion, Measurement: interdisciplinary research and perspectives, Vol: 15, Pages: 37-50, ISSN: 1536-6359
Hand DJ, Christen P, 2017, A note on using the F-measure for evaluating record linkage algorithms, Statistics and Computing, Vol: 28, Pages: 539-547, ISSN: 0960-3174
Record linkage is the process of identifying and linking records about the same entities from one or more databases. Record linkage can be viewed as a classification problem where the aim is to decide whether a pair of records is a match (i.e. two records refer to the same real-world entity) or a non-match (two records refer to two different entities). Various classification techniques—including supervised, unsupervised, semi-supervised and active learning based—have been employed for record linkage. If ground truth data in the form of known true matches and non-matches are available, the quality of classified links can be evaluated. Due to the generally high class imbalance in record linkage problems, standard accuracy or misclassification rate are not meaningful for assessing the quality of a set of linked records. Instead, precision and recall, as commonly used in information retrieval and machine learning, are used. These are often combined into the popular F-measure, which is the harmonic mean of precision and recall. We show that the F-measure can also be expressed as a weighted sum of precision and recall, with weights which depend on the linkage method being used. This reformulation reveals that the F-measure has a major conceptual weakness: the relative importance assigned to precision and recall should be an aspect of the problem and the researcher or user, but not of the particular linkage method being used. We suggest alternative measures which do not suffer from this fundamental flaw.
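The weighted-sum reformulation described in this abstract can be checked numerically. The sketch below is an illustration only, not the authors' code: the function names and the confusion-matrix counts are hypothetical, and the weight formula is derived here by simple algebra from the definitions of precision and recall rather than copied from the paper. With tp, fp and fn denoting true matches found, false matches and missed matches, the F-measure equals w * recall + (1 - w) * precision, where w = (tp + fn) / (2 * tp + fp + fn). Because w depends on the number of links the method predicts, two linkage methods evaluated on the same data are implicitly scored with different weights.

```python
import math


def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure as the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def f_as_weighted_sum(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (F-measure, weight on recall) using the weighted-sum form.

    The weight is (actual matches) / (actual matches + predicted matches),
    so it changes with the number of links a given method predicts.
    """
    precision, recall, _ = precision_recall_f(tp, fp, fn)
    weight = (tp + fn) / (2 * tp + fp + fn)
    return weight * recall + (1 - weight) * precision, weight


# Two hypothetical linkage methods applied to the same 120 true matches.
for name, (tp, fp, fn) in {"method A": (80, 20, 40), "method B": (60, 5, 60)}.items():
    _, _, f_harmonic = precision_recall_f(tp, fp, fn)
    f_weighted, weight = f_as_weighted_sum(tp, fp, fn)
    assert math.isclose(f_harmonic, f_weighted)
    print(f"{name}: F = {f_harmonic:.4f}, weight on recall = {weight:.4f}")
```

The two methods receive different weights on recall even though the underlying set of true matches is identical, which is the conceptual weakness the abstract points to.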
Allin P, Hand DJ, 2017, From a system of national accounts to a process of national wellbeing accounting, International Statistical Review, Vol: 85, Pages: 355-370, ISSN: 1751-5823
There are repeated calls to go “Beyond GDP”, for measures of wellbeing and progress in addition to those that the System of National Accounts (SNA) is designed to provide. We identify key issues that can help build on the rigour of SNA whilst fitting the measurement of economic performance within a broader assessment of national wellbeing and progress. Such drivers are already leading to a proliferation of indicators and accounts, for example in the development of non-monetary measures of natural resources. There are significant measurement challenges, not least the question of whether a single, overall measure or index of wellbeing is valid. But the challenge of measurement, per se, is one thing: in our view, a more critical issue is whether the measures will actually be used. We propose a dynamic and multi-staged approach for developing SNA, embracing the production and use of measures. This would start by identifying user requirements for wider measures, to provide the basis for national and cross-national developments in wellbeing accounting. We envisage greater branding and marketing of national wellbeing concepts to promote measures and support their use. We call for outreach by producers, so that there is dialogue about the development and use of measures.
Hand DJ, Allin P, 2016, New statistics for old? - measuring the wellbeing of the UK, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 180, Pages: 3-43, ISSN: 0964-1998
Attempts to create measures of national wellbeing and progress have a long history. In the UK, they go back at least as far as the 1790s, with Sir John Sinclair’s Statistical Account of Scotland. More recently, worldwide interest has led to the creation of a number of indices seeking to go beyond familiar economic measures like GDP. We review the Measuring National Well-being development programme of the UK’s Office for National Statistics, and explore some of the challenges which need to be faced to bring wider measures into use. These include: the importance of getting the measures adopted as policy drivers; how to challenge the continuing dominance of economic measures; sustainability and environmental issues; international comparability; and methodological statistical questions.
Hand DJ, 2016, Measurement: a very short introduction, Publisher: Oxford University Press, ISBN: 9780198779568
Measurement underpins all of modern society, from science, through medicine, to management, economics, and government. This book describes the history of measurement, and presents a unified theory of measurement, covering all its aspects from measuring mass and length to measuring pain, depression, GDP, and beyond.
Hand DJ, 2016, The case against a paradigm shift in the way we use data, FST Journal, Vol: 21, Pages: 10-12, ISSN: 1475-1704
Hand DJ, 2016, Editorial: ‘Big data’ and data sharing, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 179, Pages: 629-631, ISSN: 1467-985X
Hand DJ, Big data and data sharing, Journal of the Royal Statistical Society Series A: Statistics in Society, ISSN: 0964-1998
Hand DJ, 2015, From evidence to understanding: a commentary on Fisher (1922) 'On the mathematical foundations of theoretical statistics', Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 373, ISSN: 1364-503X
The nature of statistics has changed over time. It was originally concerned with descriptive ‘matters of state’: with summarizing population numbers, economic strength and social conditions. But during the course of the twentieth century its aim broadened to include inference: how to use data to shed light on underlying mechanisms, about what might happen in the future, about what would happen if certain actions were taken. Central to this development was Ronald Fisher. Over the course of his life he was responsible for many of the major conceptual advances in statistics. This is particularly illustrated by his 1922 paper, in which he introduced many of the concepts which remain fundamental to our understanding of how to extract meaning from data, right to the present day. It is no exaggeration to say that Fisher’s work, as illustrated by the ideas he described and developed in this paper, underlies all modern science, and much more besides. This commentary was written to celebrate the 350th anniversary of the journal Philosophical Transactions of the Royal Society.
Hand DJ, Anagnostopoulos C, 2014, A better Beta for the H measure of classification performance, Pattern Recognition Letters, Vol: 40, Pages: 41-46, ISSN: 0167-8655
Hand DJ, Adams NM, 2014, Selection bias in credit scorecard evaluation, Journal of the Operational Research Society, Vol: 65, Pages: 408-415, ISSN: 0160-5682
Hand DJ, 2014, Wonderful Examples, but Let's not Close Our Eyes, Statistical Science, Vol: 29, Pages: 98-100, ISSN: 0883-4237
Hand DJ, 2013, Introduction to Probability with Texas Hold'em Examples, International Statistical Review, Vol: 81, Pages: 334-334, ISSN: 0306-7734
Hand DJ, 2013, Graphical Models with R, International Statistical Review, Vol: 81, Pages: 316-316, ISSN: 0306-7734
Hand DJ, 2013, Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis, International Statistical Review, Vol: 81, Pages: 335-335, ISSN: 0306-7734
Hand DJ, 2013, A Statistical Guide for the Ethically Perplexed, International Statistical Review, Vol: 81, Pages: 314-316, ISSN: 0306-7734
Hand DJ, 2013, Comparing Groups: Randomization and Bootstrap Methods Using R, International Statistical Review, Vol: 81, Pages: 326-328, ISSN: 0306-7734
Hand DJ, 2013, Latent Variable Models and Factor Analysis: A Unified Approach, 3rd Edition, International Statistical Review, Vol: 81, Pages: 333-334, ISSN: 0306-7734
Hand DJ, 2013, A Practitioner's Guide to Resampling for Data Analysis, Data Mining, and Modeling, International Statistical Review, Vol: 81, Pages: 326-326, ISSN: 0306-7734
Hand DJ, 2013, Living Standards Analytics: Development through the Lens of Household Survey Data, International Statistical Review, Vol: 81, Pages: 331-332, ISSN: 0306-7734
Hand DJ, 2013, From Evidence to Understanding: A Precarious Path, European Review, Vol: 21, Pages: S32-S39, ISSN: 1234-981X
Falsifiability is the cornerstone of science. However, Rutherford notwithstanding, almost by definition science functions at the limits of measurement accuracy and theoretical grasp, so that statistical analysis is central to scientific advance. This applies as much to physics as it does to psychology, as much to geology as to biology. I look at some of the potholes in the path of scientific discovery, showing how easy it is to stumble, and at some of the consequences for the scientific endeavour.
Hand DJ, 2013, Data, Not Dogma: Big Data, Open Data, and the Opportunities Ahead, 12th International Symposium on Intelligent Data Analysis (IDA), Publisher: Springer-Verlag Berlin, Pages: 1-12, ISSN: 0302-9743
Henrion M, Mortlock DJ, Hand DJ, et al., 2013, Classification and Anomaly Detection for Astronomical Survey Data, Springer Series in Astrostatistics, Pages: 149-184, ISBN: 9781461435075
We present two statistical techniques for astronomical problems: a star-galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star-galaxy separator is a statistical classification method which outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results from our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem posed by objects having different sets of recorded variables in cross-matched datasets. This prevents the use of methods unable to handle missing values and makes direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is very general and can easily be used in applications other than astronomy. The properties and performance of our method are investigated using both real and simulated datasets.