Imperial College London

Professor Aldo Faisal

Faculty of Engineering, Department of Bioengineering

Professor of AI & Neuroscience

Contact

+44 (0)20 7594 6373
a.faisal

Assistant

Miss Teresa Ng
+44 (0)20 7594 8300

Location

4.08, Royal School of Mines, South Kensington Campus

Publications

Citation

BibTeX format

@inproceedings{Li:2020,
author = {Li, L and Faisal, A},
publisher = {AAAI},
title = {Bayesian distributional policy gradients},
url = {http://hdl.handle.net/10044/1/86088},
year = {2020}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Distributional reinforcement learning (Distributional RL) maintains the entire probability distribution of the reward-to-go, i.e. the return, providing a more principled approach to account for the uncertainty associated with policy performance, which may be beneficial for trading off exploration and exploitation and for policy learning in general. Previous work in distributional RL focused mainly on computing state-action-return distributions; here we model state-return distributions. This enables us to translate successful conventional RL algorithms that are based on state values into distributional RL. We formulate the distributional Bellman operation as an inference-based auto-encoding process that minimises Wasserstein metrics between target and model return distributions. Our algorithm, BDPG (Bayesian Distributional Policy Gradients), uses adversarial training in joint-contrastive learning to learn a variational posterior from the returns. Moreover, we can now interpret the return prediction uncertainty as an information gain, which allows us to obtain a new curiosity measure that helps BDPG steer exploration actively and efficiently. In our experiments on Atari 2600 games and MuJoCo tasks, we demonstrate that BDPG generally learns faster and reaches higher asymptotic performance than reference distributional RL algorithms, including on well-known hard-exploration tasks.
AU  - Li, L
AU  - Faisal, A
PB  - AAAI
PY  - 2020///
TI  - Bayesian distributional policy gradients
UR  - http://hdl.handle.net/10044/1/86088
ER  -
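The abstract describes minimising Wasserstein metrics between target and model return distributions. As a rough illustration of that quantity only, the Python sketch below computes the 1-Wasserstein distance between a model's samples of the state return Z(s) and a sample-based distributional Bellman target r + γZ(s'). It assumes both distributions are represented by equal-sized sets of return samples, a single transition with a fixed reward, and the 1-Wasserstein metric specifically; the function names are hypothetical. It is not BDPG's training procedure, which according to the abstract learns a variational posterior through adversarial joint-contrastive training.

```python
# Minimal sketch, NOT the BDPG implementation: compares a model's sampled
# state-return distribution Z(s) with a sample-based distributional Bellman
# target r + gamma * Z(s'). Function names are hypothetical; the reward is
# assumed fixed and both sample sets are assumed to have equal size.


def distributional_bellman_target(reward, next_state_returns, gamma=0.99):
    """Shift and scale samples of Z(s') to get target samples of Z(s)."""
    return [reward + gamma * z for z in next_state_returns]


def wasserstein1_equal_size(xs, ys):
    """1-Wasserstein distance between two 1-D empirical distributions.

    With the same number of equally weighted samples, the optimal coupling
    pairs the sorted samples, so W1 is the mean absolute difference between
    the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty samples of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    model_returns = [1.0, 2.0, 3.0]  # samples of Z(s) from a model
    next_returns = [0.0, 1.0, 2.0]   # samples of Z(s') from a model
    target = distributional_bellman_target(1.0, next_returns, gamma=0.5)
    print(target)                                          # [1.0, 1.5, 2.0]
    print(wasserstein1_equal_size(model_returns, target))  # 0.5
```

Sorting gives the exact 1-Wasserstein distance for equal-size, equally weighted samples in one dimension; distributions with different sample counts or weights would need the more general quantile-function form of the metric.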