Christen P, Hand DJ, Kirielle N, 2023, A review of the F-measure: its history, properties, criticism, and alternatives, ACM Computing Surveys, ISSN: 0360-0300
Methods to classify objects into two or more classes are at the core of various disciplines. When a set of objects with their true classes is available, a supervised classifier can be trained and employed to decide if, for example, a new patient has cancer or not. The choice of performance measure is critical in deciding which supervised method to use in any particular classification problem. Different measures can lead to very different choices, so the measure should match the objectives. Many performance measures have been developed, and one of them is the F-measure, the harmonic mean of precision and recall. Originally proposed in information retrieval, the F-measure has gained increasing interest in the context of classification. However, the rationale underlying this measure appears weak, and unlike other measures it does not have a representational meaning. The use of the harmonic mean also has little theoretical justification. The F-measure also stresses one class, which seems inappropriate for general classification problems. We provide a history of the F-measure and its use in computational disciplines, describe its properties, and discuss criticism about the F-measure. We conclude with alternatives to the F-measure, and recommendations of how to use it effectively.
Hand DJ, Anagnostopoulos C, 2023, Notes on the H-measure of classifier performance, Advances in Data Analysis and Classification, Vol: 17, Pages: 109-124, ISSN: 1862-5347
The H-measure is a classifier performance measure which takes into account the context of application without requiring a rigid value of relative misclassification costs to be set. Since its introduction in 2009 it has become widely adopted. This paper answers various queries which users have raised since its introduction, including questions about its interpretation, the choice of a weighting function, whether it is strictly proper, its coherence, and relates the measure to other work.
Mayo DG, Hand D, 2022, Statistical significance and its critics: practicing damaging science, or damaging scientific practice?, Synthese: an international journal for epistemology, methodology and philosophy of science, Vol: 200, ISSN: 0039-7857
While the common procedure of statistical significance testing and its accompanying concept of p-values have long been surrounded by controversy, renewed concern has been triggered by the replication crisis in science. Many blame statistical significance tests themselves, and some regard them as sufficiently damaging to scientific practice as to warrant being abandoned. We take a contrary position, arguing that the central criticisms arise from misunderstanding and misusing the statistical tools, and that in fact the purported remedies themselves risk damaging science. We argue that banning the use of p-value thresholds in interpreting data does not diminish but rather exacerbates data-dredging and biasing selection effects. If an account cannot specify outcomes that will not be allowed to count as evidence for a claim, that is, if all thresholds are abandoned, then there is no test of that claim. The contributions of this paper are: to explain the rival statistical philosophies underlying the ongoing controversy; to elucidate and reinterpret statistical significance tests, and explain how this reinterpretation ameliorates common misuses and misinterpretations; and to argue why recent recommendations to replace, abandon, or retire statistical significance undermine a central function of statistics in science: to test whether observed patterns in the data are genuine or due to background variability.
Hand D, 2022, Trustworthiness of statistical inference, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 185, Pages: 329-347, ISSN: 0964-1998
We examine the role of trustworthiness and trust in statistical inference, arguing that it is the extent of trustworthiness in inferential statistical tools which enables trust in the conclusions. Certain tools, such as the p-value and significance test, have recently come under renewed criticism, with some arguing that they damage trust in statistics. We argue the contrary, beginning from the position that the central role of these methods is to form the basis for trusted conclusions in the face of uncertainty in the data, and noting that it is the misuse and misunderstanding of these tools which damages trustworthiness and hence trust. We go on to argue that recent calls to ban these tools would tackle the symptom, not the cause, and themselves risk damaging the capability of science to advance, as well as risking feeding into public suspicion of the discipline of statistics. The consequence could be aggravated mistrust of our discipline and of science more generally. In short, the very proposals could work in quite the contrary direction from that intended. We make some alternative proposals for tackling the misuse and misunderstanding of these methods, and for how trust in our discipline might be promoted.
Allin P, Hand DJ, 2021, Building back better needs better use of statistics, Significance, Vol: 18, Pages: 44-45, ISSN: 1740-9705
Paul Allin and David J. Hand call for official statistics to take centre stage.
Hand D, Christen P, Kirielle N, 2021, F*: an interpretable transformation of the F-measure, Machine Learning, Vol: 110, Pages: 451-456, ISSN: 0885-6125
The F-measure, also known as the F1-score, is widely used to assess the performance of classification algorithms. However, some researchers find it lacking in intuitive interpretation, questioning the appropriateness of combining two aspects of performance as conceptually distinct as precision and recall, and also questioning whether the harmonic mean is the best way to combine them. To ease this concern, we describe a simple transformation of the F-measure, which we call F∗ (F-star), which has an immediate practical interpretation.
Hand D, Khan S, 2020, Validating and verifying AI systems, Patterns, Vol: 1, ISSN: 2666-3899
AI systems will only fulfil their promise for society if they can be relied upon. This means that the role and task of the system must be properly formulated, and that the system must be bug-free, based on properly representative data, can cope with anomalies and data quality issues, and that its output is sufficiently accurate for the task.
Hand DJ, 2020, Dark Data: Why What You Don’t Know Matters, Publisher: Princeton University Press, ISBN: 9780691182377
In this book, David Hand looks at the ubiquitous phenomenon of "missing data."
Vichi M, Hand DJ, 2019, Trusted smart statistics: The challenge of extracting usable aggregate information from new data sources, Statistical Journal of the IAOS, Vol: 35, Pages: 605-613, ISSN: 1874-7655
Recent years have seen dramatic changes in sources of data, amounts of data, availability of data, frequency of data, and types of data. Along with advances in data analytic technology these changes have opened up huge possibilities for improving the information content and timeliness of official statistics. In this paper we characterise such “smart statistics”, examining their potential benefits and the obstacles that must be overcome if they are to be trusted and relied upon. In particular, we list eight specific recommendations which we believe producers of smart statistics should adhere to if the full potential for economic and social benefit is to be achieved.
Hand D, 2019, What is the purpose of statistical modelling?, Harvard Data Science Review, Vol: 1
Data science is a diverse field, and one which is changing rapidly in response to developments in theory and practical capabilities, as well as to the challenges of the great many domains in which it is applied. The “Diving into Data” column presents short articles describing ideas, concepts, methods, and tools used in data science. The overriding aim is that the articles should be enlightening, useful, and accessible, avoiding obscure technicalities.
Hand DJ, 2018, Aspects of data ethics in a changing world: Where are we now?, Big Data, Vol: 6, Pages: 176-190, ISSN: 2167-6461
Ready data availability, cheap storage capacity, and powerful tools for extracting information from data have the potential to significantly enhance the human condition. However, as with all advanced technologies, this comes with the potential for misuse. Ethical oversight and constraints are needed to ensure that an appropriate balance is reached. Ethical issues involving data may be more challenging than the ethical challenges of some other advanced technologies partly because data and data science are ubiquitous, having the potential to impact all aspects of life, and partly because of their intrinsic complexity. We explore the nature of data, personal data, data ownership, consent and purpose of use, trustworthiness of data as well as of algorithms and of those using the data, and matters of privacy and confidentiality. A checklist is given of topics that need to be considered.
Hand DJ, 2018, Who told you that? Data provenance, false facts, and how to tell the liars from the truth-tellers, Significance, ISSN: 1740-9705
Hand DJ, 2018, Statistical challenges of administrative and transaction data, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 181, Pages: 555-578, ISSN: 0964-1998
Administrative data are becoming increasingly important. They are typically the side effect of some operational exercise and are often seen as having significant advantages over alternative sources of data. Although it is true that such data have merits, statisticians should approach the analysis of such data with the same cautious and critical eye as they approach the analysis of data from any other source. The paper identifies some statistical challenges, with the aim of stimulating debate about and improving the analysis of administrative data, and encouraging methodology researchers to explore some of the important statistical problems which arise with such data.
Hand DJ, 2018, Evaluating Statistical and Machine Learning Supervised Classification Methods, Conference on Statistical Data Science, Publisher: World Scientific Publ Co Pte Ltd, Pages: 37-53
Hand DJ, 2017, Measurement: A Very Short Introduction - Rejoinder to discussion, Measurement: interdisciplinary research and perspectives, Vol: 15, Pages: 37-50, ISSN: 1536-6359
Hand DJ, Christen P, 2017, A note on using the F-measure for evaluating record linkage algorithms, Statistics and Computing, Vol: 28, Pages: 539-547, ISSN: 0960-3174
Record linkage is the process of identifying and linking records about the same entities from one or more databases. Record linkage can be viewed as a classification problem where the aim is to decide whether a pair of records is a match (i.e. two records refer to the same real-world entity) or a non-match (two records refer to two different entities). Various classification techniques—including supervised, unsupervised, semi-supervised and active learning based—have been employed for record linkage. If ground truth data in the form of known true matches and non-matches are available, the quality of classified links can be evaluated. Due to the generally high class imbalance in record linkage problems, standard accuracy or misclassification rate are not meaningful for assessing the quality of a set of linked records. Instead, precision and recall, as commonly used in information retrieval and machine learning, are used. These are often combined into the popular F-measure, which is the harmonic mean of precision and recall. We show that the F-measure can also be expressed as a weighted sum of precision and recall, with weights which depend on the linkage method being used. This reformulation reveals that the F-measure has a major conceptual weakness: the relative importance assigned to precision and recall should be an aspect of the problem and the researcher or user, but not of the particular linkage method being used. We suggest alternative measures which do not suffer from this fundamental flaw.
Allin P, Hand DJ, 2017, From a system of national accounts to a process of national wellbeing accounting, International Statistical Review, Vol: 85, Pages: 355-370, ISSN: 1751-5823
There are repeated calls to go “Beyond GDP”, for measures of wellbeing and progress in addition to those that the System of National Accounts (SNA) is designed to provide. We identify key issues that can help build on the rigour of SNA whilst fitting the measurement of economic performance within a broader assessment of national wellbeing and progress. Such drivers are already leading to a proliferation of indicators and accounts, for example in the development of non-monetary measures of natural resources. There are significant measurement challenges, not least the question of whether a single, overall measure or index of wellbeing is valid. But the challenge of measurement, per se, is one thing: in our view, a more critical issue is whether the measures will actually be used. We propose a dynamic and multi-staged approach for developing SNA, embracing the production and use of measures. This would start by identifying user requirements for wider measures, to provide the basis for national and cross-national developments in wellbeing accounting. We envisage greater branding and marketing of national wellbeing concepts to promote measures and support their use. We call for outreach by producers, so that there is dialogue about the development and use of measures.
Allin P, Hand DJ, 2017, New statistics for old? Measuring the wellbeing of the UK, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 180, Pages: 3-43, ISSN: 0964-1998
Attempts to create measures of national wellbeing and progress have a long history. In the UK, they go back at least as far as the 1790s, with Sir John Sinclair’s Statistical Account of Scotland. More recently, worldwide interest has led to the creation of a number of indices seeking to go beyond familiar economic measures like GDP. We review the Measuring National Well-being development programme of the UK’s Office for National Statistics, and explore some of the challenges which need to be faced to bring wider measures into use. These include: the importance of getting the measures adopted as policy drivers; how to challenge the continuing dominance of economic measures; sustainability and environmental issues; international comparability; and methodological statistical questions.
Hand DJ, 2016, Measurement: a very short introduction, Publisher: Oxford University Press, ISBN: 9780198779568
Measurement underpins all of modern society, from science, through medicine, to management, economics, and government. This book describes the history of measurement, and presents a unified theory of measurement, covering all its aspects from measuring mass and length to measuring pain, depression, GDP, and beyond.
Hand DJ, 2016, The case against a paradigm shift in the way we use data, FST Journal, Vol: 21, Pages: 10-12, ISSN: 1475-1704
Hand DJ, 2016, Editorial: ‘Big data’ and data sharing, Journal of the Royal Statistical Society Series A: Statistics in Society, Vol: 179, Pages: 629-631, ISSN: 1467-985X
Hand DJ, 2015, Big data and data sharing, Journal of the Royal Statistical Society Series A: Statistics in Society, ISSN: 0964-1998
Hand DJ, 2015, From evidence to understanding: a commentary on Fisher (1922) 'On the mathematical foundations of theoretical statistics', Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 373, ISSN: 1364-503X
The nature of statistics has changed over time. It was originally concerned with descriptive ‘matters of state’, that is, with summarizing population numbers, economic strength and social conditions. But during the course of the twentieth century its aim broadened to include inference: how to use data to shed light on underlying mechanisms, about what might happen in the future, about what would happen if certain actions were taken. Central to this development was Ronald Fisher. Over the course of his life he was responsible for many of the major conceptual advances in statistics. This is particularly illustrated by his 1922 paper, in which he introduced many of the concepts which remain fundamental to our understanding of how to extract meaning from data, right to the present day. It is no exaggeration to say that Fisher’s work, as illustrated by the ideas he described and developed in this paper, underlies all modern science, and much more besides. This commentary was written to celebrate the 350th anniversary of the journal Philosophical Transactions of the Royal Society.
Hand DJ, Anagnostopoulos C, 2014, A better Beta for the H measure of classification performance, Pattern Recognition Letters, Vol: 40, Pages: 41-46, ISSN: 0167-8655
Hand DJ, Adams NM, 2014, Selection bias in credit scorecard evaluation, Journal of the Operational Research Society, Vol: 65, Pages: 408-415, ISSN: 0160-5682
Hand D, 2014, The Improbability Principle: Why coincidences, miracles and rare events happen all the time, Publisher: Random House, ISBN: 9781448170661
Here, in this highly original book - aimed squarely at anyone with an interest in coincidences, probability or gambling - eminent statistician David Hand answers this question by weaving together various strands of probability into a ...
Hand DJ, 2014, Wonderful Examples, but Let's not Close Our Eyes, Statistical Science, Vol: 29, Pages: 98-100, ISSN: 0883-4237
Allin P, Hand DJ, 2014, Wellbeing policy and measurement in the UK, Wellbeing of Nations: Meaning, Motive and Measurement, Publisher: John Wiley & Sons Ltd, Pages: 217-235, ISBN: 978-1-118-48957-4
Allin P, Hand DJ, 2014, Appendix: Sources of methods and measures of wellbeing and progress, Wellbeing of Nations: Meaning, Motive and Measurement, Publisher: John Wiley & Sons Ltd, Pages: 253-268, ISBN: 978-1-118-48957-4
Allin P, Hand DJ, 2014, Recent developments: Towards economic, social and environmental accounts, Wellbeing of Nations: Meaning, Motive and Measurement, Publisher: John Wiley & Sons Ltd, Pages: 83-114, ISBN: 978-1-118-48957-4