Imperial College London

Professor David Hand

Faculty of Natural Sciences, Department of Mathematics

Senior Research Investigator
 
 
 

Contact

 

+44 (0)20 7594 2843 d.j.hand CV

 
 

Assistant

 

Mrs Agnieszka Damasiewicz Niccolai +44 (0)20 7594 2843

 

Location

 

547, Huxley Building, South Kensington Campus


Citation

BibTeX format

@article{Hand:2017:10.1007/s11222-017-9746-6,
author = {Hand, DJ and Christen, P},
doi = {10.1007/s11222-017-9746-6},
journal = {Statistics and Computing},
pages = {539--547},
title = {A note on using the F-measure for evaluating record linkage algorithms},
url = {http://dx.doi.org/10.1007/s11222-017-9746-6},
volume = {28},
year = {2017}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB - Record linkage is the process of identifying and linking records about the same entities from one or more databases. Record linkage can be viewed as a classification problem where the aim is to decide whether a pair of records is a match (i.e. two records refer to the same real-world entity) or a non-match (two records refer to two different entities). Various classification techniques—including supervised, unsupervised, semi-supervised and active learning based—have been employed for record linkage. If ground truth data in the form of known true matches and non-matches are available, the quality of classified links can be evaluated. Due to the generally high class imbalance in record linkage problems, standard accuracy or misclassification rate are not meaningful for assessing the quality of a set of linked records. Instead, precision and recall, as commonly used in information retrieval and machine learning, are used. These are often combined into the popular F-measure, which is the harmonic mean of precision and recall. We show that the F-measure can also be expressed as a weighted sum of precision and recall, with weights which depend on the linkage method being used. This reformulation reveals that the F-measure has a major conceptual weakness: the relative importance assigned to precision and recall should be an aspect of the problem and the researcher or user, but not of the particular linkage method being used. We suggest alternative measures which do not suffer from this fundamental flaw.
AU - Hand,DJ
AU - Christen,P
DO - 10.1007/s11222-017-9746-6
EP - 547
PY - 2017///
SN - 0960-3174
SP - 539
TI - A note on using the F-measure for evaluating record linkage algorithms
T2 - Statistics and Computing
UR - http://dx.doi.org/10.1007/s11222-017-9746-6
UR - http://hdl.handle.net/10044/1/46235
VL - 28
ER -
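
The abstract's central claim, that the F-measure is both the harmonic mean of precision and recall and a weighted sum of the two with method-dependent weights, can be checked numerically. The sketch below is not taken from the paper: the weights are derived here from the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN)), and the function name and the confusion counts for the two hypothetical linkage methods are illustrative assumptions.

```python
from fractions import Fraction


def f_measure_decomposition(tp, fp, fn):
    """Return (harmonic F, weighted-sum F, weight on precision, weight on recall).

    tp, fp, fn are counts of true matches found, false matches, and missed
    matches for one linkage method (hypothetical inputs; tp must be > 0).
    """
    precision = Fraction(tp, tp + fp)
    recall = Fraction(tp, tp + fn)

    # Usual definition: harmonic mean of precision and recall.
    harmonic = 2 * precision * recall / (precision + recall)

    # Same quantity as a weighted arithmetic mean. The weights follow
    # algebraically from the definitions above and depend on the
    # method's own confusion counts.
    denom = 2 * tp + fp + fn
    w_precision = Fraction(tp + fp, denom)
    w_recall = Fraction(tp + fn, denom)
    weighted = w_precision * precision + w_recall * recall

    return harmonic, weighted, w_precision, w_recall


# Two hypothetical linkage methods evaluated on the same 100 true matches.
for name, counts in {"A": (80, 20, 20), "B": (60, 5, 40)}.items():
    h, w, wp, wr = f_measure_decomposition(*counts)
    print(name, float(h), float(w), float(wp), float(wr))
```

With these made-up counts the two methods end up weighting precision and recall differently, which illustrates the conceptual weakness the abstract describes: the relative importance of precision and recall is set by the method being evaluated, not by the user.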