Imperial College London


Faculty of Engineering, Dyson School of Design Engineering








10-12 Prince's Gardens, South Kensington Campus






BibTeX format

author = {Kormushev, P},
title = {Time Hopping Technique for Reinforcement Learning and its Application to Robot Control},
url = {},
year = {2009}

RIS format (EndNote, RefMan)

AB - To speed up the convergence of reinforcement learning (RL) algorithms through more efficient use of computer simulations, three algorithmic techniques are proposed: Time Manipulation, Time Hopping, and Eligibility Propagation. They are evaluated on various robot control tasks. The proposed Time Manipulation [1] is a concept of manipulating the time inside a simulation and using it as a tool to speed up the learning process. It is applicable to a subset of RL problems whose goal is to learn a control policy to avoid failure events. Time Manipulation works by turning back the time of the simulation on failure events, thus avoiding redundant state transitions and exploring deeper into the state space. This is impossible in the real world, but it can easily be done in a simulation. To evaluate the proposed algorithm, experiments are conducted on a classical control benchmark problem: an inverted pendulum balancing robot task. The aim of the RL algorithm is to find a control policy that prevents the pendulum from falling by moving the robot left or right, without hitting the edges of the given track. The experimental results show that Time Manipulation speeds up the learning process by 260%. It also improves the state space exploration by 12%, because it allows the RL algorithm to better explore the state space in proximity of failure states. The proposed Time Hopping [2] is a generalization of Time Manipulation, able to make arbitrary "hops" between states and in this way rapidly traverse the entire state space. Time Hopping extends the applicability of time manipulations to include not only failure-avoidance problems, but also continuous optimization problems, by creating new mechanisms to trigger the time manipulation events, to make predictions about the possible future rewards, and to select promising time hopping targets. The proposed implementation of the Time Hopping technique consists of 3 components: Hopping trigger (decides when the hopping starts), Target selec
AU - Kormushev,P
PY - 2009///
TI - Time Hopping Technique for Reinforcement Learning and its Application to Robot Control
UR -
ER -
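
The abstract describes Time Manipulation as turning back the simulation time on failure events instead of restarting from the initial state. The following is a minimal sketch of that idea, not the author's implementation. It uses tabular Q-learning on a deliberately simplified stand-in for the pendulum task: a cart on a short track that fails when it leaves either edge. The names `TrackSim`, `recover`, and `train`, the fixed rewind depth of 3 steps, the failure penalty of -1, and all hyperparameters are assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import deque

ACTIONS = (-1, +1)  # move the cart one position left or right


class TrackSim:
    """Hypothetical toy simulator: a cart on positions 0..size-1.
    Leaving the track on either side counts as a failure event."""

    def __init__(self, size=7):
        self.size = size
        self.start = size // 2
        self.pos = self.start

    def reset(self):
        self.pos = self.start

    def snapshot(self):
        return self.pos  # the whole simulator state is just the position

    def restore(self, snap):
        self.pos = snap

    def step(self, action):
        self.pos += action
        return not (0 <= self.pos < self.size)  # True on failure


def greedy(q, state):
    # Ties resolve to the first action listed in ACTIONS.
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def recover(sim, history, rewind):
    """After a failure, either turn simulated time back to the oldest
    retained snapshot, or restart from the initial state (baseline)."""
    if rewind and history:
        sim.restore(history[0])
    else:
        sim.reset()
    history.clear()


def train(steps=500, rewind=True, depth=3, epsilon=0.1,
          alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    sim, q, failures = TrackSim(), {}, 0
    history = deque(maxlen=depth)  # the last `depth` pre-step snapshots
    for _ in range(steps):
        state = sim.pos
        history.append(sim.snapshot())
        explore = rng.random() < epsilon
        action = rng.choice(ACTIONS) if explore else greedy(q, state)
        failed = sim.step(action)
        if failed:
            target = -1.0  # assumed failure penalty; terminal transition
        else:
            target = gamma * max(q.get((sim.pos, a), 0.0) for a in ACTIONS)
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (target - old)
        if failed:
            failures += 1
            recover(sim, history, rewind)
    return q, failures


if __name__ == "__main__":
    q, failures = train(rewind=True)
    print("failures:", failures, "greedy action at left edge:", greedy(q, 0))
```

Calling `train(rewind=False)` gives the conventional restart-on-failure baseline for comparison. With rewinding enabled, the agent resumes a few steps before the failure, so it spends its simulation budget near the failure states rather than replaying the path from the start.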