Imperial College London

Professor Francesca Toni

Faculty of Engineering, Department of Computing

Professor in Computational Logic
 
 
 

Contact

 

+44 (0)20 7594 8228 | f.toni | Website

 
 

Location

 

430 Huxley Building, South Kensington Campus



 

Publications

Citation

BibTeX format

@inproceedings{Toni:2015,
author = {Toni, F and Gao, Y},
publisher = {AAAI Press / International Joint Conference on Artificial Intelligence},
title = {Potential based reward shaping for hierarchical reinforcement learning},
url = {https://www.ijcai.org/Abstract/15/493},
year = {2015}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB - Hierarchical Reinforcement Learning (HRL) outperforms many ‘flat’ Reinforcement Learning (RL) algorithms in some application domains. However, HRL may need longer time to obtain the optimal policy because of its large action space. Potential Based Reward Shaping (PBRS) has been widely used to incorporate heuristics into flat RL algorithms so as to reduce their exploration. In this paper, we investigate the integration of PBRS and HRL, and propose a new algorithm: PBRS-MAXQ-0. We prove that under certain conditions, PBRS-MAXQ-0 is guaranteed to converge. Empirical results show that PBRS-MAXQ-0 significantly outperforms MAXQ-0 given good heuristics, and can converge even when given misleading heuristics.
AU - Toni,F
AU - Gao,Y
PB - AAAI Press / International Joint Conference on Artificial Intelligence
PY - 2015///
TI - Potential based reward shaping for hierarchical reinforcement learning
UR - https://www.ijcai.org/Abstract/15/493
UR - http://hdl.handle.net/10044/1/23308
ER -