Understanding Uncertainty Sampling
Active learning is a machine learning paradigm in which the learning algorithm interactively queries humans (or some other information source) to annotate new data points. Uncertainty sampling, a prevalent active learning algorithm in practice, sequentially queries annotations for the data points on which the current prediction is uncertain. However, the usage of uncertainty sampling has been largely heuristic: (i) there is no consensus on the proper definition of "uncertainty" for a specific task (classification or regression) under a specific loss (binary loss, cross-entropy loss, mean squared error, etc.); (ii) there is no theoretical guarantee that prescribes a standard protocol for implementing the algorithm, for example, how to handle the sequentially arriving annotated data under the framework of empirical risk minimization (ERM) or optimization algorithms such as stochastic gradient descent (SGD).

In this work, we introduce the notion of conditional risk, defined as the expected loss of a prediction model (or model parameters) conditional on the input variables, and establish that when uncertainty sampling is performed according to a probability distribution proportional to the conditional risk, the active learning procedure, with a careful design of the SGD algorithm, essentially optimizes a surrogate of the original ERM loss function in a coordinate descent manner. From this perspective, we first establish entropy uncertainty as an estimate of the conditional risk under the cross-entropy loss, and margin uncertainty as an estimate of the conditional risk under the binary loss. We then unify different versions of the existing uncertainty sampling algorithms: the argmax version minimizes a conditional value-at-risk loss, the sampling version minimizes a surrogate loss, and the (1-epsilon)-greedy version minimizes a distributionally robust loss. This new perspective enables us to develop the first sample complexity bound for uncertainty sampling.
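To make the two uncertainty measures concrete, the following is a minimal illustrative sketch (not the talk's formal definitions): entropy uncertainty computed from a model's predicted class probabilities, and margin uncertainty based on the gap between the top-two predicted probabilities. The function names and the example probability pool are hypothetical.

```python
import numpy as np

def entropy_uncertainty(probs):
    # Entropy of the predicted class distribution; per the abstract, an
    # estimate of the conditional risk under the cross-entropy loss.
    p = np.clip(probs, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def margin_uncertainty(probs):
    # One minus the gap between the two largest predicted probabilities;
    # per the abstract, an estimate of the conditional risk under the
    # binary (0-1) loss. A small gap means high uncertainty.
    part = np.sort(probs, axis=-1)
    return 1.0 - (part[..., -1] - part[..., -2])

# Hypothetical pool of three unlabeled points, with model-predicted
# class probabilities from confident (row 0) to near-uniform (row 2).
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.34, 0.33, 0.33]])
ent = entropy_uncertainty(probs)
mar = margin_uncertainty(probs)
```

Both measures rank the near-uniform prediction as the most uncertain, though they generally differ away from the extremes, which is why the choice of measure is tied to the choice of loss.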
Our theoretical development sheds light on the choice of the uncertainty measure, the design of optimization algorithms under the active learning dynamics, and the connection between uncertainty sampling and model robustness and risk calibration.
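The three query-selection rules unified in the abstract can be sketched as follows. This is an illustrative sketch under the assumption that `risk` holds the estimated conditional risk (e.g., an entropy or margin uncertainty) of each point in the unlabeled pool; the function names are hypothetical.

```python
import numpy as np

def argmax_query(risk):
    # Argmax version: always query the point with the largest
    # estimated conditional risk.
    return int(np.argmax(risk))

def sampling_query(risk, rng):
    # Sampling version: query a point with probability proportional
    # to its estimated conditional risk.
    p = risk / risk.sum()
    return int(rng.choice(len(risk), p=p))

def eps_greedy_query(risk, rng, eps=0.1):
    # (1-eps)-greedy version: explore uniformly with probability eps,
    # otherwise query the argmax point.
    if rng.random() < eps:
        return int(rng.integers(len(risk)))
    return int(np.argmax(risk))

rng = np.random.default_rng(0)
risk = np.array([0.1, 0.7, 0.2])  # hypothetical conditional-risk estimates
```

Per the abstract, these three rules correspond to minimizing a conditional value-at-risk loss, a surrogate loss, and a distributionally robust loss, respectively.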
(Joint work with Shang Liu from Imperial College Business School)
Xiaocheng Li is an assistant professor in the Analytics, Operations and Marketing group at Imperial College Business School, Imperial College London. Prior to that, he received his Ph.D. from the Department of Management Science and Engineering at Stanford University in 2020 and his B.S. from the School of Mathematical Sciences at Peking University in 2014. His research interests cover a wide range of topics in operations research and management science, including optimization, revenue management, dynamic programming, and machine learning applications. Recently, he has also worked on topics related to the theoretical understanding and uncertainty quantification of deep learning models and algorithms.