Imperial College London

Professor Niall Adams

Faculty of Natural Sciences, Department of Mathematics

Professor of Statistics
 
 
 

Contact

 

+44 (0)20 7594 8837
n.adams

 
 

Location

 

6M55, Huxley Building, South Kensington Campus



Citation

BibTeX format

@article{Li:2019,
author = {Li, Y and Bellotti, A and Adams, N},
journal = {Foundations of Data Science},
title = {Issues using logistic regression with class imbalance, with a case study from credit risk modelling},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - The class imbalance problem arises in two-class classification problems, when the less frequent (minority) class is observed much less than the majority class. This characteristic is endemic in many problems such as modeling default or fraud detection. Recent work by Owen [19] has shown that, in a theoretical context related to infinite imbalance, logistic regression behaves in such a way that all data in the rare class can be replaced by their mean vector to achieve the same coefficient estimates. We build on Owen’s results to show the phenomenon remains true for both weighted and penalized likelihood methods. Such results suggest that problems may occur if there is structure within the rare class that is not captured by the mean vector. We demonstrate this problem and suggest a relabelling solution based on clustering the minority class. In a simulation and a real mortgage dataset, we show that logistic regression is not able to provide the best out-of-sample predictive performance and that an approach that is able to model underlying structure in the minority class is often superior.
AU  - Li, Y
AU  - Bellotti, A
AU  - Adams, N
PY  - 2019///
TI  - Issues using logistic regression with class imbalance, with a case study from credit risk modelling
T2  - Foundations of Data Science
ER  -