CodeSLAM - A Compact, Optimisable Representation for Dense Visual SLAM

CodeSLAM

We present a new compact but dense representation of scene geometry, conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: each keyframe with a code can produce a depth map, and the code can be optimised efficiently, jointly with pose variables and with the codes of overlapping keyframes, to attain global consistency. Conditioning the depth map on the image means that the code only needs to represent those aspects of the local geometry which cannot be predicted directly from the image.
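
To make the optimisable-code idea concrete, here is a minimal numerical sketch. It is not our network: the learned decoder is replaced by a toy linear model whose output is an image-conditioned depth prior plus a Jacobian applied to the code, and random arrays stand in for everything image-derived. Only the Gauss-Newton refinement of the code against depth measurements is shown.

    # Toy stand-in for the decoder: depth = prior(image) + J(image) @ code.
    import numpy as np

    rng = np.random.default_rng(0)
    n_pix, code_dim = 1000, 32                    # flattened depth map, compact code
    prior = rng.uniform(1.0, 4.0, n_pix)          # image-conditioned depth prior
    J = rng.normal(0.0, 0.05, (n_pix, code_dim))  # decoder Jacobian w.r.t. the code

    true_code = rng.normal(0.0, 1.0, code_dim)
    observed = prior + J @ true_code + rng.normal(0.0, 0.01, n_pix)

    code = np.zeros(code_dim)                     # the zero code gives the prior depth
    for _ in range(3):                            # Gauss-Newton on the code alone
        residual = (prior + J @ code) - observed
        code -= np.linalg.solve(J.T @ J + 1e-6 * np.eye(code_dim), J.T @ residual)
    print("code recovery error:", np.linalg.norm(code - true_code))

In the full system the same refinement runs jointly over pose variables and the codes of overlapping keyframes.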

Michael Bloesch, Jan Czarnowski, Ronald Clark, Stefan Leutenegger, Andrew J Davison. CodeSLAM: Learning a Compact, Optimisable Representation for Dense Visual SLAM. CVPR, 2018


Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task

End-to-End Visuomotor Control

We show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task analogous to a simple tidying routine, without the system having seen a single real image during training. Our results show that we can successfully accomplish the task in the real world while generalising to novel environments, including ones with novel lighting conditions and distractor objects, and while dealing with moving objects, including the basket itself.

Stephen James, Andrew J Davison, Edward Johns. Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task. CoRL, 2017


Dense RGB-D-Inertial SLAM with Map Deformations

Dense RGB-D-Inertial SLAM with Map Deformations

We present the first tightly-coupled dense RGB-D-inertial SLAM system that runs in real time on a GPU. It jointly optimises the camera pose, velocity, IMU biases and gravity direction while building a globally consistent, fully dense surfel-based 3D reconstruction of the environment. Our dense visual-inertial SLAM system is more robust to fast motions and to periods of low texture and low geometric variation than a comparable RGB-D-only SLAM system.
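
As a rough sketch of what is being jointly optimised, the snippet below evaluates a combined cost with a dense geometric term over point correspondences and an IMU velocity-propagation term over the inertial states. The point-to-point form of the dense term and all variable names are our simplifications for illustration, not the system's exact residuals.

    import numpy as np

    def dense_residual(T_wc, pts_model, pts_frame):
        # Point-to-point stand-in for the dense geometric/photometric term:
        # transform the live frame's points into the world and compare them
        # with their corresponding model points.
        R, t = T_wc[:3, :3], T_wc[:3, 3]
        return (pts_frame @ R.T + t - pts_model).ravel()

    def imu_velocity_residual(v_i, v_j, acc_meas, bias_acc, gravity, dt):
        # v_j should equal v_i plus the bias-corrected, gravity-compensated
        # acceleration integrated over dt (one preintegration step).
        return v_j - (v_i + (acc_meas - bias_acc + gravity) * dt)

    def total_cost(T_wc, v_i, v_j, bias_acc, gravity,
                   pts_model, pts_frame, acc_meas, dt, w_imu=10.0):
        r_d = dense_residual(T_wc, pts_model, pts_frame)
        r_i = imu_velocity_residual(v_i, v_j, acc_meas, bias_acc, gravity, dt)
        return r_d @ r_d + w_imu * (r_i @ r_i)    # both terms shape the optimum

Because the IMU term involves the biases and gravity as well as the velocities, minimising the combined cost recovers all of these states together with the pose.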

Tristan Laidlow, Michael Bloesch, Wenbin Li and Stefan Leutenegger. Dense RGB-D-Inertial SLAM with Map Deformations. IROS, 2017

Semantic Texture for Robust Dense Tracking

Semantic Texture for Robust Dense Tracking

Robust dense SLAM systems can make valuable use of the layers of features from a standard CNN as a pyramid of ‘semantic texture’, which is suitable for dense alignment while being much more robust than raw RGB values to nuisance factors such as lighting. We use a straightforward Lucas-Kanade formulation of image alignment, with a schedule of iterations over the coarse-to-fine levels of a pyramid, and simply replace the usual image pyramid with the hierarchy of convolutional feature maps from a pre-trained CNN. The resulting dense alignment is much more robust to lighting and other variations, and selecting a small subset of the CNN's feature channels gives equally accurate but much more efficient tracking.
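
The sketch below shows the mechanic for the simplest possible case, a 2-D translation, with random arrays standing in for CNN activations; the full tracker aligns under richer warps, but the coarse-to-fine Lucas-Kanade structure described above is the same.

    import numpy as np
    from scipy.ndimage import shift as warp_shift

    def lk_translation(F_ref, F_cur, p, iters=10):
        # Lucas-Kanade over a (C, H, W) feature map; p = (dy, dx) such that
        # shifting F_cur by p best matches F_ref.
        gy, gx = np.gradient(F_ref, axis=(1, 2))        # per-channel gradients
        J = np.stack([gy.ravel(), gx.ravel()], axis=1)  # (N*C) x 2 Jacobian
        H = J.T @ J                                     # Gauss-Newton Hessian
        for _ in range(iters):
            warped = np.stack([warp_shift(c, p, order=1) for c in F_cur])
            r = (warped - F_ref).ravel()
            p = p + np.linalg.solve(H, J.T @ r)         # Gauss-Newton step
        return p

    def align_pyramid(pyr_ref, pyr_cur):
        # coarse-to-fine schedule: pyramids are ordered coarsest level first
        p = np.zeros(2)
        for level, (F_ref, F_cur) in enumerate(zip(pyr_ref, pyr_cur)):
            if level > 0:
                p = p * 2.0                             # rescale to finer level
            p = lk_translation(F_ref, F_cur, p)
        return p

Swapping the intensity pyramid for feature maps changes nothing else in the optimisation; only the residuals and gradients become multi-channel.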

Jan Czarnowski, Stefan Leutenegger, Andrew J Davison. Semantic Texture for Robust Dense Tracking. Geometry Meets Deep Learning Workshop, ICCV, 2017

Dense 3D Semantic Mapping with Convolutional Neural Networks

SemanticFusion: Dense 3D Semantic Mapping with CNNs

We address the challenge of semantic mapping by combining Convolutional Neural Networks (CNNs) with a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple viewpoints to be probabilistically fused into a map. This not only produces a useful semantic 3D map: we also show on the NYUv2 dataset that fusing multiple predictions improves even the 2D semantic labelling over baseline single-frame predictions. For a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single-frame segmentation increases further. Our system is efficient enough to allow real-time interactive use at frame rates of ≈25 Hz.
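
The fusion step itself is simple to sketch: each surfel keeps a class distribution that is multiplied by every new per-pixel CNN softmax associated with it through the SLAM correspondences, then renormalised. The shapes and class count below are illustrative, and the map regularisation applied in the full pipeline is omitted.

    import numpy as np

    def fuse_prediction(surfel_probs, cnn_softmax, eps=1e-12):
        # recursive Bayesian update of one surfel's class distribution
        fused = surfel_probs * cnn_softmax      # elementwise likelihood product
        return fused / (fused.sum() + eps)      # renormalise to a distribution

    n_classes = 13                              # e.g. a 13-class indoor setup
    surfel = np.full(n_classes, 1.0 / n_classes)  # uniform prior
    for _ in range(10):                           # ten views of the same surfel
        pred = np.random.dirichlet(np.ones(n_classes) * 0.5)
        surfel = fuse_prediction(surfel, pred)
    print("most likely class:", surfel.argmax())

Because the update is multiplicative, classes that are consistently supported across viewpoints dominate, which is why multi-view fusion beats single-frame predictions.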

John McCormac, Ankur Handa, Andrew J Davison, Stefan Leutenegger. SemanticFusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. ICRA, 2017

Simultaneously recovering the motion field and brightness image while the event camera undergoes a generic motion through any scene.

Simultaneous Optical Flow and Intensity Estimation

This video demonstrates our algorithm to simultaneously recover the motion field and brightness image while the event camera undergoes a generic motion through any scene. Our approach minimises a cost function that combines the asynchronous event data with spatial and temporal regularisation within a sliding-window time interval. Our implementation relies on GPU optimisation and runs in near real time.
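
Schematically, and in generic notation rather than the paper's exact symbols, the optimisation has the following shape: a robust data term asks the log-intensity L and the flow field u to explain the events in the window, while regularisers keep both fields smooth and couple them through brightness constancy.

    \min_{\mathbf{u},\,L}\;
      \sum_{e \in \mathcal{E}} \rho\big(\Delta L(e;\,\mathbf{u},\,L)\big)
      \;+\; \lambda_1 \lVert \nabla \mathbf{u} \rVert_1
      \;+\; \lambda_2 \lVert \nabla L \rVert_1
      \;+\; \lambda_3 \Big\lVert \frac{\partial L}{\partial t} + \nabla L \cdot \mathbf{u} \Big\rVert_1

Here \mathcal{E} is the set of events inside the sliding window, \Delta L measures how well the reconstructed log-intensity change at each event matches the camera's contrast threshold, and \rho is a robust penalty.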

P Bardow, AJ Davison, S Leutenegger. Simultaneous Optical Flow and Intensity Estimation from an Event Camera. CVPR, 2016, pp. 884-892

Monocular, Real-Time Surface Reconstruction

Monocular, Real-Time Surface Reconstruction

Real-time depth-map fusion capable of obtaining high-detail, close-up reconstructions

This video presents our scalable, real-time method for robust surface reconstruction that explicitly handles multiple scales. We perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method obtains high-quality close-up reconstructions as well as capturing overall scene geometry, while being memory- and computationally efficient.
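
An illustrative sketch of the Dynamic Level of Detail idea (not our actual mesh data structure): a quadtree cell over the surface is split only where an error measure exceeds a threshold, so nearby regions are finely tessellated while distant ones stay coarse. The error function here is an invented stand-in that decays with distance from the viewpoint.

    import numpy as np

    def tessellate(cell, depth, max_depth, error_fn, tau):
        # cell = (x, y, size); return the leaf cells of the adaptive quadtree
        x, y, s = cell
        if depth >= max_depth or error_fn(cell) < tau:
            return [cell]                        # coarse enough here
        h = s / 2.0                              # otherwise split into four children
        leaves = []
        for cx, cy in [(x, y), (x + h, y), (x, y + h), (x + h, y + h)]:
            leaves += tessellate((cx, cy, h), depth + 1, max_depth, error_fn, tau)
        return leaves

    # invented error measure: larger near a viewpoint at the origin, so the
    # mesh is refined close up and left coarse far away
    error = lambda c: 1.0 / (1.0 + np.hypot(c[0] + c[2] / 2.0, c[1] + c[2] / 2.0))
    leaves = tessellate((0.0, 0.0, 8.0), 0, 6, error, 0.1)
    print(len(leaves), "leaf cells")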

Jacek Zienkiewicz, Akis Tsiotsios, Andrew Davison, Stefan Leutenegger. Monocular, Real-Time Surface Reconstruction using Dynamic Level of Detail. International Conference on 3D Vision (3DV), 2016


Real-Time Height Map Fusion using Differentiable Rendering

Real-Time Height Map Fusion using Differentiable Rendering

A real-time method performing dense reconstruction of high-quality height maps from monocular video.

This video presents a robust real-time method which performs dense reconstruction of high-quality height maps from monocular video. By representing the height map as a triangular mesh and using an efficient differentiable rendering approach, our method enables rigorous incremental probabilistic fusion of standard locally estimated depth and colour into an immediately usable dense model.
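
The probabilistic fusion for a single height-map cell can be pictured as a scalar recursive filter: each noisy, locally estimated height sample is folded in by inverse-variance weighting. In the paper the differentiable renderer is what relates mesh vertices to the observed depth and colour; in this sketch the measurements simply arrive pre-associated.

    def fuse_height(mu, var, z, var_z):
        # one recursive update of a cell's height estimate (mu, var),
        # the scalar Kalman-filter measurement update
        k = var / (var + var_z)                  # gain: trust data vs. prior
        return mu + k * (z - mu), (1.0 - k) * var

    mu, var = 0.0, 1e6                           # uninformative prior height
    for z in [0.52, 0.49, 0.51, 0.50]:           # noisy height samples (metres)
        mu, var = fuse_height(mu, var, z, var_z=1e-2)
    print(f"fused height: {mu:.3f} m, std: {var ** 0.5:.3f} m")

Each new frame tightens the estimate, which is what makes the fused model immediately usable rather than requiring a batch reconstruction pass.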

Jacek Zienkiewicz, Andrew J Davison, Stefan Leutenegger. Real-Time Height Map Fusion using Differentiable Rendering. IROS, 2016

Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty, IROS 2016

Deep Learning for Robot Grasping via Simulation

This video demonstrates our approach to grasping under gripper pose uncertainty. In this approach, we assign a score to every possible grasp pose, achieving robustness to the gripper's pose uncertainty by smoothing the grasp function with the pose uncertainty function. Synthetic and real experiments demonstrate that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
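
The core computation can be sketched in one dimension: convolve the learned grasp score over poses with the pose-noise distribution, then execute the argmax of the smoothed function. The score curves below are invented purely to show why a broad peak beats a sharp one once uncertainty is accounted for.

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    poses = np.linspace(-1.0, 1.0, 201)
    score = np.exp(-((poses - 0.3) ** 2) / 0.001)        # narrow, fragile peak
    score += 0.8 * np.exp(-((poses + 0.4) ** 2) / 0.02)  # broad, robust peak

    sigma_bins = 10                                      # pose noise, in bins
    smoothed = gaussian_filter1d(score, sigma_bins)      # expected score under noise

    print("naive best pose: ", poses[score.argmax()])    # picks the fragile peak
    print("robust best pose:", poses[smoothed.argmax()]) # picks the broad peak

The smoothed score is the expected grasp quality after execution noise, so its maximiser is the pose most likely to succeed when the gripper does not land exactly where commanded.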

E Johns, S Leutenegger, AJ Davison. Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty. IROS, 2016

ElasticFusion: Dense SLAM Without A Pose Graph

ElasticFusion: Dense SLAM Without A Pose Graph

Demonstration of real-time ElasticFusion on office, hotel and copy datasets

The video above demonstrates the ElasticFusion system, a novel approach to real-time dense visual SLAM. Our approach applies local model-to-model surface loop closure optimisations to stay close to the mode of the map distribution, while utilising global loop closure optimisations to recover from arbitrary drift and maintain global consistency.
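
To give a flavour of the registration inside a model-to-model loop closure, the sketch below recovers the rigid transform aligning matched surfel positions from two parts of the map, using the classic Kabsch/Umeyama solution. ElasticFusion itself goes further, applying a non-rigid deformation-graph correction to the whole surfel map; only the rigid alignment step is shown here.

    import numpy as np

    def kabsch(P, Q):
        # rigid (R, t) minimising ||R @ P_i + t - Q_i||^2 over matched points
        cp, cq = P.mean(axis=0), Q.mean(axis=0)
        H = (P - cp).T @ (Q - cq)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        return R, cq - R @ cp

    rng = np.random.default_rng(1)
    P = rng.normal(size=(100, 3))                # surfels from one map region
    theta = 0.3
    R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                       [np.sin(theta),  np.cos(theta), 0.0],
                       [0.0, 0.0, 1.0]])
    Q = P @ R_true.T + np.array([0.1, -0.2, 0.05])  # drifted counterparts
    R, t = kabsch(P, Q)
    print("rotation error:", np.linalg.norm(R - R_true))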

T Whelan, S Leutenegger, B Glocker, R F Salas-Moreno, AJ Davison. ElasticFusion: Dense SLAM Without A Pose Graph. Robotics: Science and Systems (RSS), Rome, Italy, July 2015

ElasticFusion: Dense SLAM Without A Pose Graph (extras)

ElasticFusion: Dense SLAM Without A Pose Graph (extras)

ElasticFusion on the seating area, garden, The Burghers of Calais, stairs, MIT-76-417b and loopback datasets

The video above demonstrates the ElasticFusion system, a novel approach to real-time dense visual SLAM. Our approach applies local model-to-model surface loop closure optimisations to stay close to the mode of the map distribution, while utilising global loop closure optimisations to recover from arbitrary drift and maintain global consistency.

T Whelan, S Leutenegger, B Glocker, R F Salas-Moreno, AJ Davison. ElasticFusion: Dense SLAM Without A Pose Graph. Robotics: Science and Systems (RSS), Rome, Italy, July 2015

Getting Robots In The Future To Truly See

Getting Robots In The Future To Truly See

Developing robots that can process visual information in real-time could lead to a new range of handy and helpful robots for around the home and in industry. Professor Andrew Davison and Dr Stefan Leutenegger from the Dyson Robotics Lab at Imperial College London discuss the advances they are making in developing robotic vision.