Imperial College London

Professor Anil Anthony Bharath

Faculty of Engineering, Department of Bioengineering

Academic Director (Singapore)
 
 
 

Contact

 

+44 (0)20 7594 5463
a.bharath

 
 

Location

 

4.12, Royal School of Mines, South Kensington Campus


Publications

Citation

BibTeX format

@unpublished{Sarrico:2019,
  author = {Sarrico, M and Arulkumaran, K and Agostinelli, A and Richemond, P and Bharath, AA},
  publisher = {arXiv},
  title = {Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control},
  url = {http://arxiv.org/abs/1911.09615v1},
  year = {2019}
}

RIS format (EndNote, RefMan)

TY  - UNPB
AB  - Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
AU  - Sarrico, M
AU  - Arulkumaran, K
AU  - Agostinelli, A
AU  - Richemond, P
AU  - Bharath, AA
PB  - arXiv
PY  - 2019///
TI  - Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control
UR  - http://arxiv.org/abs/1911.09615v1
UR  - http://hdl.handle.net/10044/1/75283
ER -
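
The abstract describes sampling actions from a Boltzmann policy whose state-dependent temperature is derived from the mellowmax operator. Below is a minimal sketch of that exploration rule, assuming the standard mellowmax operator and the maximum-entropy mellowmax policy of Asadi and Littman (2017), which MEMEC builds on; the function names, the default omega value, and the SciPy-based root finding are illustrative choices, not details taken from the paper.

import numpy as np
from scipy.optimize import brentq

def mellowmax(q, omega=5.0):
    # mm_omega(q) = log(mean(exp(omega * q))) / omega,
    # computed via a numerically stable log-sum-exp.
    z = omega * np.asarray(q, dtype=float)
    c = z.max()
    return (c + np.log(np.mean(np.exp(z - c)))) / omega

def max_entropy_mellowmax_policy(q, omega=5.0):
    # Boltzmann policy pi(a) proportional to exp(beta * q_a), where the
    # state-dependent inverse temperature beta solves
    #   sum_a exp(beta * adv_a) * adv_a = 0,  adv_a = q_a - mm_omega(q).
    q = np.asarray(q, dtype=float)
    adv = q - mellowmax(q, omega)
    if np.allclose(adv, 0.0):
        # All action values equal: fall back to a uniform policy.
        return np.full(q.shape, 1.0 / q.size)
    def f(beta):
        # Positively rescaled for stability; the rescaling preserves the sign.
        w = np.exp(beta * adv - (beta * adv).max())
        return np.dot(w, adv)
    # f(0) <= 0 because mellowmax >= mean(q), and f(beta) > 0 for large beta,
    # so expand the bracket until the sign flips, then root-find.
    hi = 1.0
    while f(hi) < 0.0:
        hi *= 2.0
    beta = brentq(f, 0.0, hi)
    logits = beta * q
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Example: sample an exploratory action from a vector of action values.
q_values = np.array([1.0, 1.2, 0.8])
probs = max_entropy_mellowmax_policy(q_values)
action = np.random.choice(len(q_values), p=probs)

In MEMEC itself, per the abstract, the action values fed to such a policy would be the estimates retrieved from the episodic memory's non-/semi-parametric model; this sketch only illustrates how a state-dependent temperature can be obtained from them.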