Imperial College London


Faculty of Engineering, Dyson School of Design Engineering




+44 (0)20 7594 9235, p.kormushev




25 Exhibition Road, 3rd floor, Dyson Building, South Kensington Campus






BibTeX format

author = {Pardo, F and Levdik, V and Kormushev, P},
title = {Q-map: A convolutional approach for goal-oriented reinforcement learning.},
url = {},
year = {2018}

RIS format (EndNote, RefMan)

AB - Goal-oriented learning has become a core concept in reinforcement learning (RL), extending the reward signal as a sole way to define tasks. However, as parameterizing value functions with goals increases the learning complexity, efficiently reusing past experience to update estimates towards several goals at once becomes desirable but usually requires independent updates per goal. Considering that a significant number of RL environments can support spatial coordinates as goals, such as on-screen location of the character in ATARI or SNES games, we propose a novel goal-oriented agent called Q-map that utilizes an autoencoder-like neural network to predict the minimum number of steps towards each coordinate in a single forward pass. This architecture is similar to Horde with parameter sharing and allows the agent to discover correlations between visual patterns and navigation. For example learning how to use a ladder in a game could be transferred to other ladders later. We show how this network can be efficiently trained with a 3D variant of Q-learning to update the estimates towards all goals at once. While the Q-map agent could be used for a wide range of applications, we propose a novel exploration mechanism in place of epsilon-greedy that relies on goal selection at a desired distance followed by several steps taken towards it, allowing long and coherent exploratory steps in the environment. We demonstrate the accuracy and generalization qualities of the Q-map agent on a grid-world environment and then demonstrate the efficiency of the proposed exploration mechanism on the notoriously difficult Montezuma's Revenge and Super Mario All-Stars games.
AU - Pardo,F
AU - Levdik,V
AU - Kormushev,P
PY - 2018///
TI - Q-map: A convolutional approach for goal-oriented reinforcement learning.
UR -
ER -
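
The abstract's idea of updating the estimates towards all goals at once from a single transition can be illustrated with a small tabular sketch. This is not the authors' implementation: the paper uses a convolutional, autoencoder-like network, whereas the table `q` below is a stand-in, and the function names (`all_goals_update`, `step`, `train`, `estimated_steps`), the 3x4 grid, the reward convention (value 1 on reaching a coordinate, discounted by `gamma` per step) and the learning rate are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical action set for a grid-world: up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action, shape):
    """Deterministic move; walking into a wall leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), shape[0] - 1)
    c = min(max(state[1] + dc, 0), shape[1] - 1)
    return (r, c)


def all_goals_update(q, state, action, next_state, gamma=0.9, lr=1.0):
    """Update Q(state, action, goal) for every goal from one transition.

    q has shape (rows, cols, n_actions, rows, cols); the last two axes
    index the goal coordinate.
    """
    r, c = state
    nr, nc = next_state
    # Best discounted next value for every goal at once: shape (rows, cols).
    target = gamma * q[nr, nc].max(axis=0)
    # Assumed convention: the coordinate just reached gets value 1.
    target[nr, nc] = 1.0
    q[r, c, action] += lr * (target - q[r, c, action])
    return q


def train(shape=(3, 4), gamma=0.9, sweeps=10):
    """Sweep over every (state, action) pair a fixed number of times."""
    rows, cols = shape
    q = np.zeros((rows, cols, len(ACTIONS), rows, cols))
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                for a in range(len(ACTIONS)):
                    s = (r, c)
                    all_goals_update(q, s, a, step(s, a, shape), gamma)
    return q


def estimated_steps(q, state, gamma=0.9):
    """Convert the best value per goal into a step count: value = gamma**(steps - 1)."""
    best = q[state[0], state[1]].max(axis=0)
    with np.errstate(divide="ignore"):
        return 1 + np.log(best) / np.log(gamma)
```

Calling `estimated_steps(train(), (0, 0))` returns one step estimate per goal coordinate for the top-left cell, which is the tabular analogue of reading all goal distances from a single forward pass.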