Imperial College London

Professor Wayne Luk

Faculty of Engineering, Department of Computing

Professor of Computer Engineering

Contact

+44 (0)20 7594 8313
w.luk

Location

434 Huxley Building, South Kensington Campus

Citation

BibTeX format

@inproceedings{Shao:2017:10.23919/FPL.2017.8056789,
author = {Shao, S and Luk, W},
doi = {10.23919/FPL.2017.8056789},
publisher = {IEEE},
title = {Customised {P}earlmutter propagation: A hardware architecture for trust region policy optimisation},
url = {http://dx.doi.org/10.23919/FPL.2017.8056789},
year = {2017}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results in various RL benchmarks, but is computationally expensive. This paper proposes Customised Pearlmutter Propagation (CPP), a novel hardware architecture that accelerates TRPO on FPGA. We use the Pearlmutter Algorithm to address the key computational bottleneck of TRPO in a hardware efficient manner, avoiding symbolic differentiation with change of variables. Experimental evaluation using robotic locomotion benchmarks demonstrates that the proposed CPP architecture implemented on Stratix-V FPGA can achieve up to 20 times speed-up against 6-threaded Keras deep learning library with Theano backend running on a Core i7-5930K CPU.
AU  - Shao, S
AU  - Luk, W
DO  - 10.23919/FPL.2017.8056789
PB  - IEEE
PY  - 2017///
SN  - 1946-1488
TI  - Customised Pearlmutter propagation: A hardware architecture for trust region policy optimisation
UR  - http://dx.doi.org/10.23919/FPL.2017.8056789
UR  - http://hdl.handle.net/10044/1/56419
ER  - 