Imperial College London

Professor Niall Adams

Faculty of Natural Sciences, Department of Mathematics

Professor of Statistics

Contact

+44 (0)20 7594 8837
n.adams

Location

6M55, Huxley Building, South Kensington Campus


Publications

Citation

BibTeX format

@article{Lu:2019:10.3934/fods.2019009,
author = {Lu, X and Adams, N and Kantas, N},
doi = {10.3934/fods.2019009},
journal = {Foundations of Data Science},
pages = {197--225},
title = {On adaptive estimation for dynamic Bernoulli bandits},
url = {http://dx.doi.org/10.3934/fods.2019009},
volume = {1},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. It is concerned with maximising the total rewards for a gambler by sequentially pulling an arm from a multi-armed slot machine where each arm is associated with a reward distribution. In static MABs, the reward distributions do not change over time, while in dynamic MABs, each arm's reward distribution can change, and the optimal arm can switch over time. Motivated by many real applications where rewards are binary, we focus on dynamic Bernoulli bandits. Standard methods like $\epsilon$-Greedy and Upper Confidence Bound (UCB), which rely on the sample mean estimator, often fail to track changes in the underlying reward for dynamic problems. In this paper, we overcome the shortcoming of slow response to change by deploying adaptive estimation in the standard methods and propose a new family of algorithms, which are adaptive versions of $\epsilon$-Greedy, UCB, and Thompson sampling. These new methods are simple and easy to implement. Moreover, they do not require any prior knowledge about the dynamic reward process, which is important for real applications. We examine the new algorithms numerically in different scenarios and the results show solid improvements of our algorithms in dynamic environments.
AU - Lu,X
AU - Adams,N
AU - Kantas,N
DO - 10.3934/fods.2019009
EP - 225
PY - 2019///
SN - 2639-8001
SP - 197
TI - On adaptive estimation for dynamic Bernoulli bandits
T2 - Foundations of Data Science
UR - http://dx.doi.org/10.3934/fods.2019009
UR - http://arxiv.org/abs/1712.03134v2
UR - http://hdl.handle.net/10044/1/70336
VL - 1
ER -
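The adaptive-estimation idea summarised in the abstract can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's algorithm: it shows an $\epsilon$-Greedy agent for dynamic Bernoulli bandits whose sample-mean estimator is replaced by an exponentially weighted estimate. The class name, the forgetting factor `lam`, and the toy two-arm simulation are all assumptions made for the example.

```python
import random

# Hedged sketch: epsilon-Greedy with an adaptive (exponentially
# weighted) reward estimate in place of the sample mean, so the
# agent can track a drifting Bernoulli success probability.

class AdaptiveEpsilonGreedy:
    def __init__(self, n_arms, epsilon=0.1, lam=0.9):
        self.epsilon = epsilon           # exploration probability
        self.lam = lam                   # forgetting factor in (0, 1)
        self.estimates = [0.5] * n_arms  # adaptive reward estimates

    def select_arm(self):
        # Explore uniformly with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.estimates))
        return max(range(len(self.estimates)), key=lambda a: self.estimates[a])

    def update(self, arm, reward):
        # Exponential forgetting discounts old observations, unlike the
        # sample mean, which weights all past rewards equally.
        self.estimates[arm] = self.lam * self.estimates[arm] + (1 - self.lam) * reward

# Toy dynamic environment: the best arm switches halfway through.
random.seed(0)
agent = AdaptiveEpsilonGreedy(n_arms=2)
for t in range(1000):
    means = [0.8, 0.2] if t < 500 else [0.2, 0.8]
    arm = agent.select_arm()
    reward = 1 if random.random() < means[arm] else 0
    agent.update(arm, reward)

print(agent.estimates)
```

Because the effective memory of the estimator is roughly 1/(1 - lam) pulls, the estimate of the newly optimal arm recovers within a few dozen pulls after the switch, which is the slow-response shortcoming of the sample mean that the paper addresses.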