Imperial College London


Faculty of Engineering, Dyson School of Design Engineering








25 Exhibition Road, 3rd floor, Dyson Building, South Kensington Campus






BibTeX format

author = {Tavakoli, A and Levdik, V and Islam, R and Kormushev, P},
title = {Prioritizing starting states for reinforcement learning},
year = {2019}

RIS format (EndNote, RefMan)

AB - Online, off-policy reinforcement learning algorithms are able to use an experience memory to remember and replay past experiences. In prior work, this approach was used to stabilize training by breaking the temporal correlations of the updates and avoiding the rapid forgetting of possibly rare experiences. In this work, we propose a conceptually simple framework that uses an experience memory to help exploration by prioritizing the starting states from which the agent starts acting in the environment, importantly, in a fashion that is also compatible with on-policy algorithms. Given the capacity to restart the agent in states corresponding to its past observations, we achieve this objective by (i) enabling the agent to restart in states belonging to significant past experiences (e.g., nearby goals), and (ii) promoting faster coverage of the state space through starting from a more diverse set of states. While, using a good priority measure to identify significant past transitions, we expect case (i) to more considerably help exploration in certain domains (e.g., sparse reward tasks), we hypothesize that case (ii) will generally be beneficial, even without any prioritization. We show empirically that our approach improves learning performance for both off-policy and on-policy deep reinforcement learning methods, with most notable gains in highly sparse reward tasks.
AU - Tavakoli, A
AU - Levdik, V
AU - Islam, R
AU - Kormushev, P
PY - 2019///
TI - Prioritizing starting states for reinforcement learning
ER -
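The mechanism described in the abstract, restarting the agent from prioritized past states, can be sketched in a few lines. The class below is a hypothetical illustration, not the authors' implementation: it keeps a bounded memory of visited states with significance weights and samples a restart state in proportion to those weights. With uniform weights, sampling reduces to case (ii) of the abstract (diverse restarts without prioritization); all names and the FIFO eviction policy are assumptions made for this sketch.

```python
import random


class StartStateMemory:
    """Illustrative sketch of prioritized starting-state selection
    (hypothetical code, not the paper's implementation)."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.states = []      # past observations the agent may restart from
        self.priorities = []  # significance weights (e.g., proximity to a goal)

    def add(self, state, priority=1.0):
        # Bound memory by evicting the oldest entry (FIFO is an assumption).
        if len(self.states) >= self.capacity:
            self.states.pop(0)
            self.priorities.pop(0)
        self.states.append(state)
        self.priorities.append(priority)

    def sample_start(self):
        # Priority-proportional sampling of the next starting state.
        # Uniform priorities give unprioritized but diverse restarts.
        return random.choices(self.states, weights=self.priorities, k=1)[0]
```

A training loop would then call `sample_start()` at the beginning of each episode, assuming the environment exposes a way to reset into an arbitrary stored state, which the abstract notes as a prerequisite ("the capacity to restart the agent in states corresponding to its past observations").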