Imperial College London

Professor Anil Anthony Bharath

Faculty of Engineering, Department of Bioengineering

Academic Director (Singapore)
 
 
 

Contact

 

+44 (0)20 7594 5463
a.bharath

 
 

Location

 

4.12, Royal School of Mines, South Kensington Campus


Publications

Citation

BibTeX format

@unpublished{Sarrico:2019,
  author = {Sarrico, M and Arulkumaran, K and Agostinelli, A and Richemond, P and Bharath, AA},
  publisher = {arXiv},
  title = {Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control},
  url = {http://arxiv.org/abs/1911.09615v1},
  year = {2019}
}

RIS format (EndNote, RefMan)

TY  - UNPB
AB  - Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
AU  - Sarrico, M
AU  - Arulkumaran, K
AU  - Agostinelli, A
AU  - Richemond, P
AU  - Bharath, AA
PB  - arXiv
PY  - 2019///
TI  - Sample-efficient reinforcement learning with maximum entropy mellowmax episodic control
UR  - http://arxiv.org/abs/1911.09615v1
UR  - http://hdl.handle.net/10044/1/75283
ER -
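
The abstract describes sampling actions from a Boltzmann policy whose state-dependent temperature is derived from the mellowmax operator. Below is a minimal sketch of that exploration rule, assuming the standard mellowmax operator and the maximum-entropy mellowmax policy of Asadi and Littman (2017), which MEMEC builds on; the function names, the default omega value, and the SciPy-based root finding are illustrative choices, not details taken from the paper.

import numpy as np
from scipy.optimize import brentq

def mellowmax(q, omega=5.0):
    # mm_omega(q) = log(mean(exp(omega * q))) / omega,
    # computed via a numerically stable log-sum-exp.
    z = omega * np.asarray(q, dtype=float)
    c = z.max()
    return (c + np.log(np.mean(np.exp(z - c)))) / omega

def max_entropy_mellowmax_policy(q, omega=5.0):
    # Boltzmann policy pi(a) proportional to exp(beta * q_a), where the
    # state-dependent inverse temperature beta solves
    #   sum_a exp(beta * adv_a) * adv_a = 0,  adv_a = q_a - mm_omega(q).
    q = np.asarray(q, dtype=float)
    adv = q - mellowmax(q, omega)
    if np.allclose(adv, 0.0):
        # All action values equal: fall back to a uniform policy.
        return np.full(q.shape, 1.0 / q.size)
    def f(beta):
        # Positively rescaled for stability; the rescaling preserves the sign.
        w = np.exp(beta * adv - (beta * adv).max())
        return np.dot(w, adv)
    # f(0) <= 0 because mellowmax >= mean(q), and f(beta) > 0 for large beta,
    # so expand the bracket until the sign flips, then root-find.
    hi = 1.0
    while f(hi) < 0.0:
        hi *= 2.0
    beta = brentq(f, 0.0, hi)
    logits = beta * q
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Example: sample an exploratory action from a vector of action values.
q_values = np.array([1.0, 1.2, 0.8])
probs = max_entropy_mellowmax_policy(q_values)
action = np.random.choice(len(q_values), p=probs)

In MEMEC itself, per the abstract, the action values fed to such a policy would be the estimates retrieved from the episodic memory's non-/semi-parametric model; this sketch only illustrates how a state-dependent temperature can be obtained from them.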