Imperial College London

Professor Anil Anthony Bharath

Faculty of Engineering, Department of Bioengineering

Academic Director (Singapore)
 
 
 

Contact

 

+44 (0)20 7594 5463
a.bharath

 
 

Location

 

4.12, Royal School of Mines, South Kensington Campus



Publications

Citation

BibTeX format

@unpublished{Dai:2020,
author = {Dai, T and Arulkumaran, K and Gerbert, T and Tukra, S and Behbahani, F and Bharath, AA},
publisher = {arXiv},
title = {Analysing deep reinforcement learning agents trained with domain randomisation},
url = {http://arxiv.org/abs/1912.08324v2},
year = {2020}
}

RIS format (EndNote, RefMan)

TY  - UNPB
AB - Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents - which are deployed in the real world - beyond task performance. In this work we examine such agents, through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different testing conditions. Finally, we investigate the internals of the trained agents by using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied with larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain randomised agents require higher sample complexity, can overfit and more heavily rely on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating the combination of inspection tools in order to provide sufficient insights into the behaviour of trained agents.
AU - Dai,T
AU - Arulkumaran,K
AU - Gerbert,T
AU - Tukra,S
AU - Behbahani,F
AU - Bharath,AA
PB - arXiv
PY - 2020///
TI - Analysing deep reinforcement learning agents trained with domain randomisation
UR - http://arxiv.org/abs/1912.08324v2
UR - http://hdl.handle.net/10044/1/82424
ER -
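
The abstract describes visual domain randomisation as randomly perturbing visual aspects of a simulated environment before training, so that the trained agent becomes robust to the simulation-to-reality gap. The Python sketch below illustrates that general recipe only; the environment and agent interfaces (SimulatedEnv-style set_visual_params, act, learn) and the parameter ranges are assumptions for illustration, not the paper's actual code.

import random

def sample_visual_params(rng: random.Random) -> dict:
    """Draw one random visual configuration for the simulator.

    The specific parameters and ranges here are hypothetical examples of
    the kind of visual aspects that might be perturbed.
    """
    return {
        "table_rgb": [rng.uniform(0.0, 1.0) for _ in range(3)],  # surface colour
        "light_intensity": rng.uniform(0.5, 1.5),                # scene lighting
        "camera_jitter_deg": rng.uniform(-5.0, 5.0),             # camera angle offset
    }

def train_with_domain_randomisation(env, agent, episodes: int, seed: int = 0):
    """Outer training loop: re-randomise the simulator's visuals each episode.

    `env` and `agent` are placeholder objects assumed to expose
    set_visual_params/reset/step and act/learn respectively.
    """
    rng = random.Random(seed)
    for _ in range(episodes):
        # Perturb the simulated environment's appearance before the episode,
        # so the agent never sees exactly the same visuals twice.
        env.set_visual_params(sample_visual_params(rng))
        obs = env.reset()
        done = False
        while not done:
            action = agent.act(obs)
            obs, reward, done = env.step(action)
            agent.learn(obs, reward, done)

Because the randomisation happens in the outer loop rather than inside the policy, the same agent code can later be deployed on the real robot unchanged, which is the transferability argument the abstract refers to.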