Imperial College London


Faculty of Engineering, Dyson School of Design Engineering














BibTeX format

@inproceedings{pardo2018time,
  author = {Pardo, F and Tavakoli, A and Levdik, V and Kormushev, P},
  pages = {4042--4051},
  title = {Time limits in reinforcement learning},
  url = {},
  year = {2018}
}

RIS format (EndNote, RefMan)

AB - In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account for how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state-aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent's input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.
AU - Pardo,F
AU - Tavakoli,A
AU - Levdik,V
AU - Kormushev,P
EP - 4051
PY - 2018///
SP - 4042
TI - Time limits in reinforcement learning
UR -
ER -
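The abstract's two prescriptions can be sketched in a few lines of code. This is an illustrative sketch only, not the authors' implementation: the function names, the normalized remaining-time feature, and the discount value are assumptions for the example.

```python
import numpy as np

def time_aware_observation(obs, t, time_limit):
    # Case (i): time limits are part of the task, so append the
    # (normalized) remaining time to the observation to keep the
    # agent's input Markov.
    remaining = (time_limit - t) / time_limit
    return np.append(obs, remaining)

def td_target(reward, next_value, terminated, timed_out, gamma=0.99):
    # Case (ii): time limits only diversify training experience.
    # Bootstrap from the next state's value when an episode ends
    # purely due to a timeout; only a true environment termination
    # zeroes out the bootstrap term.
    if terminated and not timed_out:
        return reward
    return reward + gamma * next_value
```

The key design point is that a timeout and a genuine terminal state must be distinguished when forming the one-step target; conflating them (as many replay-buffer implementations do via a single `done` flag) is what invalidates experience replay in case (ii).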