Videos
Task-Embedded Control Networks
Much like humans, robots should have the ability to leverage knowledge from previously learned tasks in order to learn new tasks quickly in new and unfamiliar environments. Despite this, most robot learning approaches have focused on learning a single task, from scratch, with a limited notion of generalisation, and no way of leveraging the knowledge to learn other tasks more efficiently. One possible solution is meta-learning, but many of the related approaches are limited in their ability to scale to a large number of tasks and to learn further tasks without forgetting previously learned ones. With this in mind, we introduce Task-Embedded Control Networks, which employ ideas from metric learning in order to create a task embedding that can be used by a robot to learn new tasks from one or more demonstrations.
Stephen James, Michael Bloesch, Andrew J. Davison. Task-Embedded Control Networks for Few-Shot Imitation Learning. CoRL, 2018
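To make the metric-learning idea concrete, here is a minimal sketch: a demonstration is embedded onto the unit sphere, embeddings of the same task are pulled together with a hinge loss, and the control network is conditioned on the embedding. All names, shapes, and the linear stand-ins for the networks are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def embed_demo(demo_frames, W):
    """Map a demonstration (stack of frames) to a unit-norm task embedding."""
    pooled = demo_frames.mean(axis=0)      # (D,) mean-pool over time
    e = W @ pooled                         # (K,) linear stand-in for a CNN
    return e / np.linalg.norm(e)           # normalise onto the unit sphere

def hinge_embedding_loss(e_query, e_same, e_other, margin=0.1):
    """Pull embeddings of the same task together; push other tasks away."""
    return max(0.0, margin + e_query @ e_other - e_query @ e_same)

def control_action(observation, task_embedding, V):
    """Policy conditioned on the task embedding (linear stand-in for a CNN)."""
    return V @ np.concatenate([observation, task_embedding])
```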
LS-Net
Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose LS-Net, a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e. jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.
Ronald Clark, Michael Bloesch, Jan Czarnowski, Stefan Leutenegger, Andrew J. Davison. LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo. ECCV, 2018
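For reference, the hand-designed baseline that a learned solver like LS-Net replaces is a damped Gauss-Newton (Levenberg-Marquardt-style) update; in the paper a recurrent network predicts the step instead. A minimal sketch of the classical step, with an invented curve-fitting example:

```python
import numpy as np

def gauss_newton_step(residual_fn, jacobian_fn, x, damping=1e-3):
    """One damped Gauss-Newton step for minimising ||r(x)||^2."""
    r = residual_fn(x)                          # (M,) residual vector
    J = jacobian_fn(x)                          # (M, N) Jacobian
    H = J.T @ J + damping * np.eye(x.size)      # damped approximate Hessian
    return x + np.linalg.solve(H, -J.T @ r)     # normal-equations solve

# Toy example: fit y = a * exp(b * t) to noisy data.
t = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(1.5 * t) + 0.01 * np.random.randn(50)
r_fn = lambda p: p[0] * np.exp(p[1] * t) - y
J_fn = lambda p: np.stack([np.exp(p[1] * t), p[0] * t * np.exp(p[1] * t)], axis=1)

p = np.array([1.0, 1.0])
for _ in range(20):
    p = gauss_newton_step(r_fn, J_fn, p)        # converges to roughly (2.0, 1.5)
```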
Fusion++
We propose an online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera browses a cluttered indoor scene, Mask R-CNN instance segmentations are used to initialise compact per-object Truncated Signed Distance Function (TSDF) reconstructions with object size-dependent resolutions and a novel 3D foreground mask. Reconstructed objects are stored in an optimisable 6DoF pose graph which is our only persistent map representation. Objects are incrementally refined via depth fusion, and are used for tracking, relocalisation and loop closure detection.
John McCormac, Ronald Clark, Michael Bloesch, Stefan Leutenegger, Andrew J. Davison. Fusion++: Volumetric Object-Level SLAM. 3DV, 2018
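The per-voxel fusion inside each object volume is the standard weighted TSDF running average; a minimal sketch (object masking and size-dependent resolution omitted, constants invented):

```python
import numpy as np

def fuse_tsdf(tsdf, weights, sdf_obs, trunc=0.05, max_weight=100.0):
    """Fuse one depth observation into a per-object TSDF volume.

    tsdf, weights: current voxel values and fusion weights (same shape)
    sdf_obs: signed distance of each voxel to the observed surface
    """
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)   # truncate and normalise
    valid = sdf_obs > -trunc                  # skip voxels far behind the surface
    w_new = valid.astype(float)
    tsdf = np.where(valid,
                    (weights * tsdf + w_new * d) / (weights + w_new + 1e-9),
                    tsdf)
    weights = np.minimum(weights + w_new, max_weight)
    return tsdf, weights
```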
CodeSLAM
We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: while each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image means that the code only needs to represent aspects of the local geometry which cannot directly be predicted from the image.
Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J. Davison. CodeSLAM — Learning a Compact, Optimisable Representation for Dense Visual SLAM. CVPR, 2018
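The mechanism that makes this work is that depth is a differentiable function D(I, c) of the image and a small code, so c can be optimised like any other variable. A hedged sketch with a linear stand-in for the learned decoder (in the paper it is a convolutional variational network, optimised jointly with poses and the codes of overlapping keyframes):

```python
import numpy as np

def decode_depth(image_features, code, B):
    """Stand-in for the learned decoder depth = D(I, c): a linear basis
    conditioned on the image."""
    return image_features + B @ code            # flattened (H*W,) depth map

def optimise_code(code, B, image_features, loss_grad_fn, lr=1e-2, iters=100):
    """Gradient descent on the compact code; CodeSLAM does this jointly with
    camera pose variables inside the SLAM optimisation."""
    for _ in range(iters):
        depth = decode_depth(image_features, code, B)
        code = code - lr * (B.T @ loss_grad_fn(depth))  # chain rule via decoder
    return code
```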
End-to-End Visuomotor Control
We show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task analogous to a simple tidying routine, using a controller trained without ever seeing a single real image. Our results show that we are able to successfully accomplish the task in the real world, with the ability to generalise to novel environments, including those with novel lighting conditions and distractor objects, and the ability to deal with moving objects, including the basket itself.
Stephen James, Andrew J. Davison, Edward Johns. Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task. CoRL, 2017
Dense RGB-D-Inertial SLAM with Map Deformations
We present the first tightly-coupled dense RGB-D-inertial SLAM system that runs in real time on a GPU. It jointly optimises the camera pose, velocity, IMU biases and gravity direction while building a globally consistent, fully dense surfel-based 3D reconstruction of the environment. Our dense visual-inertial SLAM system is more robust to fast motion and to periods of low texture and low geometric variation than a comparable RGB-D-only SLAM system.
Tristan Laidlow, Michael Bloesch, Wenbin Li and Stefan Leutenegger. Dense RGB-D-Inertial SLAM with Map Deformations. IROS, 2017
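"Tightly-coupled" here means that visual and inertial residuals enter one joint least-squares problem over the full state. Schematically (a sketch only, with invented weighting, not the system's actual cost):

```python
import numpy as np

def joint_update(x, visual_res, visual_jac, imu_res, imu_jac,
                 w_visual=1.0, w_imu=1.0):
    """One Gauss-Newton step on a stacked visual-inertial cost.

    x: state vector containing pose, velocity, IMU biases and gravity direction.
    """
    r = np.concatenate([np.sqrt(w_visual) * visual_res(x),
                        np.sqrt(w_imu) * imu_res(x)])
    J = np.vstack([np.sqrt(w_visual) * visual_jac(x),
                   np.sqrt(w_imu) * imu_jac(x)])
    dx, *_ = np.linalg.lstsq(J, -r, rcond=None)   # solve the stacked problem
    return x + dx
```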
Semantic Texture for Robust Dense Tracking
Robust dense SLAM systems can make valuable use of the layers of features coming from a standard CNN as a pyramid of ‘semantic texture’ which is suitable for dense alignment while being much more robust to nuisance factors such as lighting than raw RGB values. We use a straightforward Lucas-Kanade formulation of image alignment, with a schedule of iterations over the coarse-to-fine levels of a pyramid, and simply replace the usual image pyramid with the hierarchy of convolutional feature maps from a pre-trained CNN. The resulting dense alignment is much more robust to lighting and other variations, and selecting a small subset of the features a CNN outputs gives tracking that is just as accurate but much more efficient.
Jan Czarnowski, Stefan Leutenegger, Andrew J Davison. Semantic Texture for Robust Dense Tracking. Geometry Meets Deep Learning Workshop, ICCV, 2017
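A sketch of the core idea: run Lucas-Kanade exactly as usual, but over the channels of a CNN feature map instead of raw intensities. For brevity this toy version estimates a pure 2D translation at a single pyramid level with a crude integer-pixel warp; the paper iterates coarse-to-fine over the hierarchy of feature maps:

```python
import numpy as np

def lk_translation(feat_ref, feat_cur, iters=20):
    """Estimate a 2D translation aligning (C, H, W) feature maps."""
    p = np.zeros(2)                                 # (dx, dy)
    gy, gx = np.gradient(feat_cur, axis=(1, 2))     # per-channel gradients
    for _ in range(iters):
        shifted = np.roll(feat_cur,
                          shift=(-int(round(p[1])), -int(round(p[0]))),
                          axis=(1, 2))              # crude integer-pixel warp
        r = (shifted - feat_ref).ravel()            # residual over all channels
        J = np.stack([gx.ravel(), gy.ravel()], axis=1)
        dp, *_ = np.linalg.lstsq(J, -r, rcond=None)
        p += dp
    return p
```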
SemanticFusion: Dense 3D Semantic Mapping with CNN
We address the challenge of semantic maps by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondence between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN’s semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions leads to an improvement even in the 2D semantic labelling over baseline single-frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame rates of ≈25 Hz.
John McCormac, Ankur Handa, Andrew J Davison, Stefan Leutenegger. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. ICRA, 2017
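Per-surfel fusion of the CNN predictions is a recursive Bayesian update: multiply the stored class distribution by each new view's prediction and renormalise. A minimal sketch:

```python
import numpy as np

def fuse_semantics(surfel_probs, cnn_probs):
    """Fuse a new class distribution into a surfel's stored one (both are
    (num_classes,) probability vectors)."""
    fused = surfel_probs * cnn_probs     # elementwise Bayesian update
    return fused / fused.sum()           # renormalise

# Two noisy views agreeing on class 1 sharpen the fused distribution:
p = np.array([0.4, 0.6])
p = fuse_semantics(p, np.array([0.3, 0.7]))
p = fuse_semantics(p, np.array([0.35, 0.65]))   # p[1] is now ~0.87
```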
Simultaneous Optical Flow and Intensity Estimation
This video demonstrates our algorithm to simultaneously recover the motion field and brightness image, while the event camera undergoes a generic motion through any scene. Our approach employs minimisation of a cost function that contains the asynchronous event data as well as spatial and temporal regularisation within a sliding window time interval. Our implementation relies on GPU optimisation and runs in near real-time.
Patrick Bardow, Andrew J. Davison, Stefan Leutenegger. Simultaneous Optical Flow and Intensity Estimation from an Event Camera. CVPR, 2016
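Schematically, the windowed cost couples an event data term with smoothness regularisers on the flow and intensity fields. A heavily simplified skeleton, with the asynchronous event term reduced to a linearised brightness-constancy form and invented weights:

```python
import numpy as np

def tv(x):
    """Total-variation-style spatial smoothness penalty on a 2D field."""
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

def window_cost(intensity, flow_u, flow_v, event_rate, dt,
                lam_flow=0.1, lam_int=0.1):
    """Cost over one time window: observed events should be explained by the
    intensity change that the flow induces, plus regularisation."""
    gy, gx = np.gradient(intensity)
    predicted_change = -(gx * flow_u + gy * flow_v) * dt  # brightness constancy
    data = ((predicted_change - event_rate * dt) ** 2).sum()
    return data + lam_flow * (tv(flow_u) + tv(flow_v)) + lam_int * tv(intensity)
```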
Monocular, Real-Time Surface Reconstruction
Real-time depth map fusion capable of obtaining high-detail, close-up reconstructions
This video presents our approach to a scalable, real-time capable method for robust surface reconstruction that explicitly handles multiple scales. We perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method is capable of obtaining high quality, close-up reconstruction, as well as capturing overall scene geometry, while being memory and computationally efficient.
Jacek Zienkiewicz, Akis Tsiotsios, Andrew J. Davison, Stefan Leutenegger. Monocular, Real-Time Surface Reconstruction using Dynamic Level of Detail. 3DV, 2016
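The Dynamic Level of Detail decision can be sketched as: keep subdividing a triangle while it is large on screen (close-up) or spans too much depth. Function names and thresholds below are invented for illustration:

```python
def needs_subdivision(depth_range, projected_area_px,
                      depth_tol=0.01, area_tol=64.0):
    """Refine where the surface is close-up or geometrically detailed."""
    return projected_area_px > area_tol or depth_range > depth_tol

def tessellate(tri, depth_range_fn, area_fn, split_fn, max_depth=8):
    """Recursively split triangles until the LOD criterion is satisfied."""
    if max_depth == 0 or not needs_subdivision(depth_range_fn(tri), area_fn(tri)):
        return [tri]
    return [leaf for child in split_fn(tri)
            for leaf in tessellate(child, depth_range_fn, area_fn, split_fn,
                                   max_depth - 1)]
```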

Real-Time Height Map Fusion using Differentiable Rendering
A real-time method performing dense reconstruction of high-quality height maps from monocular video
This video presents a robust real-time method which performs dense reconstruction of high-quality height maps from monocular video. By representing the height map as a triangular mesh and using an efficient differentiable rendering approach, our method enables rigorous incremental probabilistic fusion of standard locally estimated depth and colour into an immediately usable dense model.
Jacek Zienkiewicz, Andrew J. Davison, Stefan Leutenegger. Real-Time Height Map Fusion using Differentiable Rendering. IROS, 2016
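Per-cell probabilistic fusion amounts to an incremental Gaussian (inverse-variance weighted) update of each height estimate; the differentiable renderer supplies the per-observation measurements and gradients. A minimal sketch of the fusion step alone:

```python
import numpy as np

def fuse_height(mean, var, obs, obs_var):
    """Incrementally fuse one height observation into a per-cell Gaussian."""
    gain = var / (var + obs_var)          # Kalman-style gain
    mean = mean + gain * (obs - mean)     # pull the mean towards the observation
    var = (1.0 - gain) * var              # uncertainty shrinks with each fusion
    return mean, var
```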
Deep Learning for Robot Grasping via Simulation
This video demonstrates our approach to grasping under gripper pose uncertainty. In this approach, we assign a score for every possible grasp pose allowing us to achieve robustness to the gripper’s pose uncertainty by smoothing the grasp function with the pose uncertainty function. Synthetic and real experiments demonstrate that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
Edward Johns, Stefan Leutenegger, Andrew J. Davison. Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty. IROS, 2016
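Smoothing the grasp function with the pose uncertainty can be sketched as convolving the learned score map with the gripper's pose-error distribution before selecting the best grasp. A sketch assuming a Gaussian positional uncertainty (the real grasp function is also over orientation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def robust_grasp(score_map, pose_sigma_px):
    """Convolve the (H, W) grasp score map with the pose uncertainty, then
    pick the location with the highest smoothed score."""
    smoothed = gaussian_filter(score_map, sigma=pose_sigma_px)
    return np.unravel_index(np.argmax(smoothed), smoothed.shape)
```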
ElasticFusion: Dense SLAM Without A Pose Graph
Demonstration of real-time ElasticFusion on the office, hotel and copy datasets
The video above demonstrates the ElasticFusion system, a novel approach to real-time dense visual SLAM. Our approach applies local model-to-model surface loop closure optimisations to stay close to the mode of the map distribution, while utilising global loop optimisations to recover from arbitrary drift and maintain global consistency.
Thomas Whelan, Stefan Leutenegger, Renato F. Salas-Moreno, Ben Glocker, Andrew J. Davison. ElasticFusion: Dense SLAM Without A Pose Graph. RSS, 2015

ElasticFusion: Dense SLAM Without A Pose Graph (extras)
ElasticFusion on the seating area, garden, The Burghers of Calais, stairs, MIT-76-417b and loopback datasets
Thomas Whelan, Stefan Leutenegger, Renato F. Salas-Moreno, Ben Glocker, Andrew J. Davison. ElasticFusion: Dense SLAM Without A Pose Graph. RSS, 2015
Getting Robots In The Future To Truly See
Developing robots that can process visual information in real-time could lead to a new range of handy and helpful robots for around the home and in industry. Professor Andrew Davison and Dr Stefan Leutenegger from the Dyson Robotics Lab at Imperial College London discuss the advances they are making in developing robotic vision.