Publications

BibTeX format

@inproceedings{Shao:2017:10.23919/FPL.2017.8056789,
author = {Shao, S and Luk, W},
doi = {10.23919/FPL.2017.8056789},
publisher = {IEEE},
title = {Customised {P}earlmutter propagation: A hardware architecture for trust region policy optimisation},
url = {http://dx.doi.org/10.23919/FPL.2017.8056789},
year = {2017}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results in various RL benchmarks, but is computationally expensive. This paper proposes Customised Pearlmutter Propagation (CPP), a novel hardware architecture that accelerates TRPO on FPGA. We use the Pearlmutter Algorithm to address the key computational bottleneck of TRPO in a hardware efficient manner, avoiding symbolic differentiation with change of variables. Experimental evaluation using robotic locomotion benchmarks demonstrates that the proposed CPP architecture implemented on Stratix-V FPGA can achieve up to 20 times speed-up against 6-threaded Keras deep learning library with Theano backend running on a Core i7-5930K CPU.
AU  - Shao,S
AU  - Luk,W
DO  - 10.23919/FPL.2017.8056789
PB  - IEEE
PY  - 2017///
SN  - 1946-1488
TI  - Customised Pearlmutter propagation: A hardware architecture for trust region policy optimisation
UR  - http://dx.doi.org/10.23919/FPL.2017.8056789
UR  - http://hdl.handle.net/10044/1/56419
ER  -
