Publications

BibTeX format

@inproceedings{Li:2020,
author = {Li, L and Faisal, A},
publisher = {AAAI},
title = {Bayesian distributional policy gradients},
url = {http://hdl.handle.net/10044/1/86088},
year = {2020}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Distributional reinforcement learning (Distributional RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing a more principled approach to account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous work in distributional RL focused mainly on computing the state-action-return distributions; here we model the state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target/model return distributions. Our algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to learn a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows us to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. In our experiments on Atari 2600 games and MuJoCo tasks, we demonstrate how BDPG learns generally faster and with higher asymptotic performance than reference distributional RL algorithms, including on well-known hard exploration tasks.
AU  - Li,L
AU  - Faisal,A
PB  - AAAI
PY  - 2020///
TI  - Bayesian distributional policy gradients
UR  - http://hdl.handle.net/10044/1/86088
ER  -

Professor Aldo Faisal
