Imperial College London

Dr Stefan Leutenegger

Faculty of Engineering, Department of Computing

Senior Lecturer
 
 
 

Contact

 

+44 (0)20 7594 7123, s.leutenegger

 
 

Location

 

360, ACE Extension, South Kensington Campus



 

Publications


55 results found

Zhang K, Chermprayong P, Tzoumanikas D, Li W, Grimm M, Smentoch M, Leutenegger S, Kovac M et al., 2019, Bioinspired design of a landing system with soft shock absorbers for autonomous aerial robots, Journal of Field Robotics, Vol: 36, Pages: 230-251, ISSN: 1556-4959

Journal article

Xu B, Li W, Tzoumanikas D, Bloesch M, Davison A, Leutenegger S et al., MID-Fusion: octree-based object-level multi-instance dynamic SLAM, ICRA 2019 - IEEE International Conference on Robotics and Automation, Publisher: IEEE

We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.
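The abstract describes associating segmented instance masks with existing object models before fusing new measurements. Below is a minimal sketch of one plausible association step, matching masks to projected model masks by intersection-over-union; the threshold, helper names and greedy strategy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def associate_masks(new_masks, model_masks, iou_threshold=0.5):
    """Greedily match each segmented mask to the best-overlapping
    projected object-model mask; unmatched masks spawn new models."""
    matches, unmatched, used = {}, [], set()
    for i, mask in enumerate(new_masks):
        scores = [(j, mask_iou(mask, m)) for j, m in enumerate(model_masks) if j not in used]
        if scores:
            j_best, best = max(scores, key=lambda s: s[1])
            if best >= iou_threshold:
                matches[i] = j_best
                used.add(j_best)
                continue
        unmatched.append(i)  # candidate for initialising a new object model
    return matches, unmatched

# Toy usage with two 4x4 masks.
a = np.zeros((4, 4), bool); a[:2, :2] = True
b = np.zeros((4, 4), bool); b[:2, :3] = True
print(associate_masks([a], [b]))  # ({0: 0}, [])
```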

Conference paper

Nicastro A, Clark R, Leutenegger S, 2019, X-Section: cross-section prediction for enhanced RGBD fusion

Detailed 3D reconstruction is an important challenge with application to robotics, augmented and virtual reality, which has seen impressive progress throughout the past years. Advancements were driven by the availability of depth cameras (RGB-D), as well as increased compute power, e.g. in the form of GPUs, but also thanks to inclusion of machine learning in the process. Here, we propose X-Section, an RGB-D 3D reconstruction approach that leverages deep learning to make object-level predictions about thicknesses that can be readily integrated into a volumetric multi-view fusion process, where we propose an extension to the popular KinectFusion approach. In essence, our method allows us to complete shape in general indoor scenes behind what is sensed by the RGB-D camera, which may be crucial, e.g. for robotic manipulation tasks or efficient scene exploration. Predicting object thicknesses rather than volumes allows us to work with comparably high spatial resolution without exploding memory and training data requirements on the employed Convolutional Neural Networks. In a series of qualitative and quantitative evaluations, we demonstrate how we accurately predict object thickness and reconstruct general 3D scenes containing multiple objects.

Working paper

Laidlow T, Czarnowski J, Leutenegger S, DeepFusion: real-time dense 3D reconstruction for monocular SLAM using single-view depth and gradient predictions, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

While the keypoint-based maps created by sparse monocular Simultaneous Localisation and Mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a Convolutional Neural Network (CNN) to produce fully dense depth maps for keyframes that include metric scale. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise for the depth map with each new frame so as to constantly make use of new geometric constraints. Based on its performance on synthetic and real world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
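The probabilistic fusion of stereo and network depth described above can be illustrated with a standard inverse-variance weighted average. This is a minimal generic sketch of that idea, not the paper's actual cost function or optimiser; the per-pixel variances are assumed given.

```python
import numpy as np

def fuse_depths(depth_stereo, var_stereo, depth_cnn, var_cnn):
    """Per-pixel maximum-likelihood fusion of two independent Gaussian
    depth estimates: weight each source by its inverse variance."""
    w_stereo = 1.0 / var_stereo
    w_cnn = 1.0 / var_cnn
    fused_depth = (w_stereo * depth_stereo + w_cnn * depth_cnn) / (w_stereo + w_cnn)
    fused_var = 1.0 / (w_stereo + w_cnn)   # fused uncertainty shrinks
    return fused_depth, fused_var

# Toy example: stereo is confident at 2.0 m, the CNN less confident at 2.4 m.
d, v = fuse_depths(np.array([2.0]), np.array([0.01]),
                   np.array([2.4]), np.array([0.09]))
print(d, v)  # fused estimate lies close to the stereo depth
```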

Conference paper

Tzoumanikas D, Li W, Grimm M, Zhang K, Kovac M, Leutenegger S et al., 2019, Fully autonomous micro air vehicle flight and landing on a moving target using visual–inertial estimation and model-predictive control, Journal of Field Robotics, Vol: 36, Pages: 49-77, ISSN: 1556-4959

The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) held in spring 2017 was a very successful competition well attended by teams from all over the world. One of the challenges (Challenge 1) required an aerial robot to detect, follow, and land on a moving target in a fully autonomous fashion. In this paper, we present the hardware components of the micro air vehicle (MAV) we built with off-the-shelf components, alongside the algorithms developed for the purposes of the competition. We tackle the challenge of landing on a moving target by adopting a generic approach, rather than following one that is tailored to the MBZIRC Challenge 1 setup, enabling easy adaptation to a wider range of applications and targets, even indoors, since we do not rely on the availability of a global positioning system. We evaluate our system in an uncontrolled outdoor environment where our MAV successfully and consistently lands on a target moving at a speed of up to 5.0 m/s.

Journal article

McCormac J, Clark R, Bloesch M, Davison A, Leutenegger S et al., 2018, Fusion++: Volumetric object-level SLAM, International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Publisher: IEEE, Pages: 32-41, ISSN: 2378-3826

We propose an online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera browses a cluttered indoor scene, Mask-RCNN instance segmentations are used to initialise compact per-object Truncated Signed Distance Function (TSDF) reconstructions with object size-dependent resolutions and a novel 3D foreground mask. Reconstructed objects are stored in an optimisable 6DoF pose graph which is our only persistent map representation. Objects are incrementally refined via depth fusion, and are used for tracking, relocalisation and loop closure detection. Loop closures cause adjustments in the relative pose estimates of object instances, but no intra-object warping. Each object also carries semantic information which is refined over time and an existence probability to account for spurious instance predictions. We demonstrate our approach on a hand-held RGB-D sequence from a cluttered office scene with a large number and variety of object instances, highlighting how the system closes loops and makes good use of existing objects on repeated loops. We quantitatively evaluate the trajectory error of our system against a baseline approach on the RGB-D SLAM benchmark, and qualitatively compare reconstruction quality of discovered objects on the YCB video dataset. Performance evaluation shows our approach is highly memory efficient and runs online at 4-8Hz (excluding relocalisation) despite not being optimised at the software level.
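Each object in the system described above is reconstructed as a Truncated Signed Distance Function refined by depth fusion. As an illustration of the generic TSDF update used in KinectFusion-style pipelines (not the paper's exact code; the truncation distance and weight cap are assumed values):

```python
import numpy as np

TRUNC = 0.05   # truncation distance in metres (assumed)
MAX_W = 100.0  # weight cap to keep the map adaptable (assumed)

def fuse_tsdf(tsdf, weight, sdf_measurement):
    """Weighted running-average TSDF update for a block of voxels.

    tsdf, weight    -- current voxel values and fusion weights
    sdf_measurement -- signed distance of each voxel to the newly
                       observed surface along the camera ray
    """
    d = np.clip(sdf_measurement / TRUNC, -1.0, 1.0)   # truncate and normalise
    valid = sdf_measurement > -TRUNC                  # ignore voxels far behind the surface
    w_new = np.where(valid, 1.0, 0.0)
    tsdf = np.where(valid, (tsdf * weight + d * w_new) / (weight + w_new + 1e-9), tsdf)
    weight = np.minimum(weight + w_new, MAX_W)
    return tsdf, weight

# Toy voxel column straddling a surface observed along one camera ray.
tsdf0, w0 = np.zeros(5), np.zeros(5)
meas = np.array([0.10, 0.05, 0.02, -0.02, -0.10])
print(fuse_tsdf(tsdf0, w0, meas))
```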

Conference paper

Clark R, Bloesch M, Czarnowski J, Leutenegger S, Davison AJ et al., 2018, Learning to solve nonlinear least squares for monocular stereo, 15th European Conference on Computer Vision, Publisher: Springer Nature Switzerland AG 2018, Pages: 291-306, ISSN: 0302-9743

Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e. jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.
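For context, the classical baseline that such a learned optimiser replaces is Gauss-Newton on a sum-of-squares cost. A minimal generic sketch follows; the toy residual here is a range-based localisation problem, not the motion-stereo cost from the paper.

```python
import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, iters=10):
    """Classical Gauss-Newton: repeatedly linearise the residuals and solve
    the normal equations (J^T J) dx = -J^T r."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
    return x

# Toy sum-of-squares problem: locate a 2D point from range measurements
# to three known beacons.
beacons = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
true_pos = np.array([2.0, 1.0])
ranges = np.linalg.norm(beacons - true_pos, axis=1)

residual = lambda p: np.linalg.norm(beacons - p, axis=1) - ranges
jacobian = lambda p: (p - beacons) / np.linalg.norm(beacons - p, axis=1, keepdims=True)

print(gauss_newton(residual, jacobian, x0=[1.0, 1.0]))  # ≈ [2.0, 1.0]
```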

Conference paper

Li W, Saeedi Gharahbolagh S, McCormac J, Clark R, Tzoumanikas D, Ye Q, Tang R, Leutenegger S et al., 2018, InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset, British Machine Vision Conference (BMVC), Publisher: BMVC

Datasets have gained an enormous amount of popularity in the computer vision community, from training and evaluation of Deep Learning-based methods to benchmarking Simultaneous Localization and Mapping (SLAM). Without a doubt, synthetic imagery bears a vast potential due to scalability in terms of amounts of data obtainable without tedious manual ground truth annotations or measurements. Here, we present a dataset with the aim of providing a higher degree of photo-realism, larger scale, more variability as well as serving a wider range of purposes compared to existing datasets. Our dataset leverages the availability of millions of professional interior designs and millions of production-level furniture and object assets, all coming with fine geometric details and high-resolution texture. We render high-resolution and high frame-rate video sequences following realistic trajectories while supporting various camera types as well as providing inertial measurements. Together with the release of the dataset, we will make the executable program of our interactive simulator software as well as our renderer available at https://interiornetdataset.github.io. To showcase the usability and uniqueness of our dataset, we show benchmarking results of both sparse and dense SLAM algorithms.

Conference paper

Li M, Songur N, Orlov P, Leutenegger S, Faisal AA et al., 2018, Towards an embodied semantic fovea: Semantic 3D scene reconstruction from ego-centric eye-tracker videos

Incorporating the physical environment is essential for a complete understanding of human behavior in unconstrained every-day tasks. This is especially important in ego-centric tasks where obtaining 3-dimensional information is both limiting and challenging, with the current 2D video analysis methods proving insufficient. Here we demonstrate a proof-of-concept system which provides real-time 3D mapping and semantic labeling of the local environment from an ego-centric RGB-D video-stream with 3D gaze point estimation from head mounted eye tracking glasses. We augment existing work in Semantic Simultaneous Localization And Mapping (Semantic SLAM) with collected gaze vectors. Our system can then find and track objects both inside and outside the user field-of-view in 3D from multiple perspectives with reasonable accuracy. We validate our concept by producing a semantic map from images of the NYUv2 dataset while simultaneously estimating gaze position and gaze classes from recorded gaze data of the dataset images.

Working paper

Clark R, Bloesch M, Czarnowski J, Leutenegger S, Davison A et al., LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo, European Conference on Computer Vision

Conference paper

Vespa E, Nikolov N, Grimm M, Nardi L, Kelly PH, Leutenegger S et al., 2018, Efficient octree-based volumetric SLAM supporting signed-distance and occupancy mapping, IEEE Robotics and Automation Letters, Vol: 3, Pages: 1144-1151, ISSN: 2377-3766

We present a dense volumetric simultaneous localisation and mapping (SLAM) framework that uses an octree representation for efficient fusion and rendering of either a truncated signed distance field (TSDF) or an occupancy map. The primary aim of this letter is to use one single representation of the environment that can be used not only for robot pose tracking and high-resolution mapping, but seamlessly for planning. We show that our highly efficient octree representation of space fits SLAM and planning purposes in a real-time control loop. In a comprehensive evaluation, we demonstrate dense SLAM accuracy and runtime performance on-par with flat hashing approaches when using TSDF-based maps, and considerable speed-ups when using occupancy mapping compared to standard occupancy maps frameworks. Our SLAM system can run at 10-40 Hz on a modern quadcore CPU, without the need for massive parallelization on a GPU. We, furthermore, demonstrate a probabilistic occupancy mapping as an alternative to TSDF mapping in dense SLAM and show its direct applicability to online motion planning, using the example of informed rapidly-exploring random trees (RRT*).
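The occupancy-mapping alternative mentioned in the abstract is commonly implemented as a per-voxel log-odds update. A minimal generic sketch of that update rule follows; the sensor hit/miss probabilities and clamping bounds are assumptions, not values from the paper.

```python
import math

P_HIT, P_MISS = 0.7, 0.4   # assumed inverse sensor model
L_MIN, L_MAX = -2.0, 3.5   # assumed clamping bounds (log-odds)

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def update_voxel(log_odds: float, hit: bool) -> float:
    """Bayesian occupancy update in log-odds form, clamped for adaptability."""
    log_odds += logit(P_HIT) if hit else logit(P_MISS)
    return max(L_MIN, min(L_MAX, log_odds))

def occupancy_probability(log_odds: float) -> float:
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

# A voxel observed occupied three times, then free once.
l = 0.0
for hit in (True, True, True, False):
    l = update_voxel(l, hit)
print(occupancy_probability(l))   # still clearly above 0.5
```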

Journal article

Bloesch M, Czarnowski J, Clark R, Leutenegger S, Davison AJ et al., CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM, IEEE Computer Vision and Pattern Recognition 2018, Publisher: IEEE

The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.
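The key mechanism in the abstract is that a depth map is generated from a small code which can itself be optimised. Below is a minimal sketch with a stand-in linear decoder; the real system uses a learned, image-conditioned network, and the decoder, cost and learning rate here are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
CODE_DIM, NUM_PIXELS = 8, 64

# Stand-in for the learned, image-conditioned decoder: depth = D @ code + mean.
D = rng.normal(size=(NUM_PIXELS, CODE_DIM))
mean_depth = np.full(NUM_PIXELS, 2.0)

def decode(code):
    return mean_depth + D @ code

# Target depth observations (here synthesised from a hidden "true" code).
true_code = rng.normal(size=CODE_DIM)
observed_depth = decode(true_code)

# Optimise the code by gradient descent on the squared depth error,
# analogous to optimising codes jointly with poses in a SLAM back-end.
code = np.zeros(CODE_DIM)
lr = 0.01
for _ in range(200):
    residual = decode(code) - observed_depth
    grad = D.T @ residual          # gradient of 0.5 * ||residual||^2
    code -= lr * grad
print(np.abs(decode(code) - observed_depth).max())  # close to zero
```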

Conference paper

Czarnowski J, Leutenegger S, Davison AJ, 2018, Semantic Texture for Robust Dense Tracking, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 851-859, ISSN: 2473-9936

Conference paper

McCormac J, Handa A, Leutenegger S, Davison AJ et al., 2017, SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation?, International Conference on Computer Vision 2017, Publisher: IEEE, ISSN: 2380-7504

We introduce SceneNet RGB-D, a dataset providing pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection. It also provides perfect camera poses and depth data, allowing investigation into geometric computer vision problems such as optical flow, camera pose estimation, and 3D scene labelling tasks. Random sampling permits virtually unlimited scene configurations, and here we provide 5M rendered RGB-D images from 16K randomly generated 3D trajectories in synthetic layouts, with random but physically simulated object configurations. We compare the semantic segmentation performance of network weights produced from pretraining on RGB images from our dataset against generic VGG-16 ImageNet weights. After fine-tuning on the SUN RGB-D and NYUv2 real-world datasets we find in both cases that the synthetically pre-trained network outperforms the VGG-16 weights. When synthetic pre-training includes a depth channel (something ImageNet cannot natively provide) the performance is greater still. This suggests that large-scale high-quality synthetic RGB datasets with task-specific labels can be more useful for pretraining than real-world generic pre-training such as ImageNet. We host the dataset at http://robotvault.bitbucket.io/scenenet-rgbd.html.

Conference paper

Lukierski R, Leutenegger S, Davison AJ, 2017, Room layout estimation from rapid omnidirectional exploration, IEEE International Conference on Robotics and Automation (ICRA), 2017, Publisher: IEEE

A new generation of practical, low-cost indoor robots is now using wide-angle cameras to aid navigation, but usually this is limited to position estimation via sparse feature-based SLAM. Such robots usually have little global sense of the dimensions, demarcation or identities of the rooms they are in, information which would be very useful to enable behaviour with much more high level intelligence. In this paper we show that we can augment an omni-directional SLAM pipeline with straightforward dense stereo estimation and simple and robust room model fitting to obtain rapid and reliable estimation of the global shape of typical rooms from short robot motions. We have tested our method extensively in real homes, offices and on synthetic data. We also give examples of how our method can extend to making composite maps of larger rooms, and detecting room transitions.

Conference paper

McCormac J, Handa A, Davison AJ, Leutenegger S et al., 2017, SemanticFusion: dense 3D semantic mapping with convolutional neural networks, IEEE International Conference on Robotics and Automation (ICRA), 2017, Publisher: IEEE

Ever more robust, accurate and detailed mapping using visual sensing has proven to be an enabling factor for mobile robots across a wide variety of applications. For the next level of robot intelligence and intuitive user interaction, maps need to extend beyond geometry and appearance — they need to contain semantics. We address this challenge by combining Convolutional Neural Networks (CNNs) and a state-of-the-art dense Simultaneous Localization and Mapping (SLAM) system, ElasticFusion, which provides long-term dense correspondences between frames of indoor RGB-D video even during loopy scanning trajectories. These correspondences allow the CNN's semantic predictions from multiple view points to be probabilistically fused into a map. This not only produces a useful semantic 3D map, but we also show on the NYUv2 dataset that fusing multiple predictions leads to an improvement even in the 2D semantic labelling over baseline single frame predictions. We also show that for a smaller reconstruction dataset with larger variation in prediction viewpoint, the improvement over single frame segmentation increases. Our system is efficient enough to allow real-time interactive use at frame-rates of ≈25Hz.
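The probabilistic fusion of per-frame CNN predictions into the map, as described above, is often realised as a recursive Bayesian update of a class distribution per surface element. A minimal generic sketch of that update follows; the uniform prior and naive independence across views are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_semantic(prior: np.ndarray, cnn_probs: np.ndarray) -> np.ndarray:
    """Fuse a new per-class CNN prediction into a surfel's class distribution
    by elementwise multiplication followed by renormalisation."""
    posterior = prior * cnn_probs
    return posterior / posterior.sum()

# Three classes, starting from a uniform prior for one surfel.
classes = ["chair", "table", "floor"]
belief = np.full(3, 1.0 / 3.0)
for frame_prediction in ([0.6, 0.3, 0.1],   # three views of the same surfel
                         [0.5, 0.4, 0.1],
                         [0.7, 0.2, 0.1]):
    belief = fuse_semantic(belief, np.asarray(frame_prediction))
print(dict(zip(classes, belief.round(3))))   # "chair" dominates
```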

Conference paper


Laidlow T, Bloesch M, Li W, Leutenegger S et al., Dense RGB-D-Inertial SLAM with Map Deformations, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE

Conference paper

Oettershagen P, Melzer A, Mantel T, Rudin K, Stastny TJ, Wawrzacz B, Hinzmann T, Leutenegger S, Alexis K, Siegwart R et al., 2017, Design of small hand-launched solar-powered UAVs: From concept study to a multi-day world endurance record flight, Journal of Field Robotics, Vol: 34, Pages: 1352-1377, ISSN: 1556-4967

We present the development process behind AtlantikSolar, a small 6.9 kg hand-launchable low-altitude solar-powered unmanned aerial vehicle (UAV) that recently completed an 81-hour continuous flight and thereby established a new flight endurance world record for all aircraft below 50 kg mass. The goal of our work is to increase the usability of such solar-powered robotic aircraft by maximizing their perpetual flight robustness to meteorological deteriorations such as clouds or winds. We present energetic system models and a design methodology, implement them in our publicly available conceptual design framework for perpetual flight-capable solar-powered UAVs, and finally apply the framework to the AtlantikSolar UAV. We present the detailed AtlantikSolar characteristics as a practical design example. Airframe, avionics, hardware, state estimation, and control method development for autonomous flight operations are described. Flight data are used to validate the conceptual design framework. Flight results from the continuous 81-hour and 2,338 km covered ground distance flight show that AtlantikSolar achieves 39% minimum state-of-charge, 6.8 h excess time and 6.2 h charge margin. These performance metrics are a significant improvement over previous solar-powered UAVs. A performance outlook shows that AtlantikSolar allows perpetual flight in a 6-month window around June 21 at mid-European latitudes, and that multi-day flights with small optical- or infrared-camera payloads are possible for the first time. The demonstrated performance represents the current state-of-the-art in solar-powered low-altitude perpetual flight performance. We conclude with lessons learned from the three-year AtlantikSolar UAV development process and with a sensitivity analysis that identifies the most promising technological areas for future solar-powered UAV performance improvements.

Journal article

Platinsky L, Davison AJ, Leutenegger S, Monocular visual odometry: sparse joint optimisation or dense alternation?, IEEE International Conference on Robotics and Automation (ICRA), 2017, Publisher: IEEE

Real-time monocular SLAM is increasingly mature and entering commercial products. However, there is a divide between two techniques providing similar performance. Despite the rise of ‘dense’ and ‘semi-dense’ methods which use large proportions of the pixels in a video stream to estimate motion and structure via alternating estimation, they have not eradicated feature-based methods which use a significantly smaller amount of image information from keypoints and retain a more rigorous joint estimation framework. Dense methods provide more complete scene information, but in this paper we focus on how the amount of information and different optimisation methods affect the accuracy of local motion estimation (monocular visual odometry). This topic becomes particularly relevant after the recent results from a direct sparse system. We propose a new method for fairly comparing the accuracy of SLAM frontends in a common setting. We suggest computational cost models for an overall comparison which indicates that there is relative parity between the approaches at the settings allowed by current serial processors when evaluated under equal conditions.

Conference paper

Zienkiewicz J, Tsiotsios C, Davison AJ, Leutenegger S et al., 2016, Monocular, Real-Time Surface Reconstruction using Dynamic Level of Detail, International Conference on 3D Vision, Publisher: IEEE

We present a scalable, real-time capable method for robust surface reconstruction that explicitly handles multiple scales. As a monocular camera browses a scene, our algorithm processes images as they arrive and incrementally builds a detailed surface model. While most of the existing reconstruction approaches rely on volumetric or point-cloud representations of the environment, we perform depth-map and colour fusion directly into a multi-resolution triangular mesh that can be adaptively tessellated using the concept of Dynamic Level of Detail. Our method relies on least-squares optimisation, which enables a probabilistically sound and principled formulation of the fusion algorithm. We demonstrate that our method is capable of obtaining high quality, close-up reconstruction, as well as capturing overall scene geometry, while being memory and computationally efficient.

Conference paper

Johns E, Leutenegger S, Davison AJ, 2016, Pairwise Decomposition of Image Sequences for Active Multi-View Recognition, Computer Vision and Pattern Recognition, Publisher: Computer Vision Foundation (CVF), ISSN: 1063-6919

A multi-view image sequence provides a much richer capacity for object recognition than from a single image. However, most existing solutions to multi-view recognition typically adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition, by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.
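The core aggregation step described above (decompose a sequence into pairs, classify each pair, and weight the contributions) can be sketched generically as a weighted combination of per-pair class probabilities. The pair classifier and weights below are placeholders, not the trained networks from the paper.

```python
import itertools
import numpy as np

def aggregate_pairwise(pair_probs: np.ndarray, pair_weights: np.ndarray) -> np.ndarray:
    """Combine per-pair class probabilities into a sequence-level prediction
    as a normalised weighted sum."""
    weights = pair_weights / pair_weights.sum()
    return weights @ pair_probs

# Four views of an object, three classes. A placeholder "pair classifier"
# just returns a made-up distribution for each unordered pair of views.
rng = np.random.default_rng(1)
num_views = 4
pairs = list(itertools.combinations(range(num_views), 2))
pair_probs = rng.dirichlet(alpha=[2.0, 1.0, 1.0], size=len(pairs))  # biased towards class 0
pair_weights = rng.uniform(0.5, 1.5, size=len(pairs))               # learned in the paper

sequence_probs = aggregate_pairwise(pair_probs, pair_weights)
print(sequence_probs, "-> predicted class", int(np.argmax(sequence_probs)))
```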

Conference paper

Bardow P, Davison AJ, Leutenegger S, 2016, Simultaneous Optical Flow and Intensity Estimation from an Event Camera, Computer Vision and Pattern Recognition 2016, Publisher: Computer Vision Foundation (CVF), ISSN: 1063-6919

Event cameras are bio-inspired vision sensors which mimic retinas to measure per-pixel intensity change rather than outputting an actual intensity image. This proposed paradigm shift away from traditional frame cameras offers significant potential advantages: namely avoiding high data rates, dynamic range limitations and motion blur. Unfortunately, however, established computer vision algorithms may not at all be applied directly to event cameras. Methods proposed so far to reconstruct images, estimate optical flow, track a camera and reconstruct a scene come with severe restrictions on the environment or on the motion of the camera, e.g. allowing only rotation. Here, we propose, to the best of our knowledge, the first algorithm to simultaneously recover the motion field and brightness image, while the camera undergoes a generic motion through any scene. Our approach employs minimisation of a cost function that contains the asynchronous event data as well as spatial and temporal regularisation within a sliding window time interval. Our implementation relies on GPU optimisation and runs in near real-time. In a series of examples, we demonstrate the successful operation of our framework, including in situations where conventional cameras suffer from dynamic range limitations and motion blur.

Conference paper

Johns E, Leutenegger S, Davison AJ, 2016, Deep Learning a Grasp Function for Grasping Under Gripper Pose Uncertainty, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, ISSN: 2153-0866

This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, it is possible to achieve grasping robust to the gripper’s pose uncertainty, by smoothing the grasp function with the pose uncertainty function. Therefore, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen, and instead a pose will be chosen which is surrounded by a region of high grasp quality. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object, and outputs a score for each grasp pose across the image. Training data for this is generated by use of physics simulation and depth image simulation with 3D object meshes, to enable acquisition of sufficient data without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
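The robustness mechanism described above (smooth the predicted grasp-score map with the gripper's pose uncertainty before picking the best pose) amounts to a convolution. A minimal sketch with a Gaussian uncertainty model over image position only; the sigma and the toy score map are assumptions, and the paper's grasp function is predicted by a CNN rather than hand-written.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Toy grasp-score map: a sharp high-scoring pose next to a poor region (top-left),
# and a slightly lower but broad high-quality region (bottom-right).
scores = np.zeros((20, 20))
scores[5, 5] = 1.0                # isolated, fragile peak
scores[12:17, 12:17] = 0.8        # wide plateau

# Smooth with the gripper pose uncertainty (assumed isotropic Gaussian, sigma = 2 px).
expected_scores = gaussian_filter(scores, sigma=2.0)

best_naive = np.unravel_index(np.argmax(scores), scores.shape)
best_robust = np.unravel_index(np.argmax(expected_scores), expected_scores.shape)
print("naive best pose:", best_naive)     # (5, 5): the fragile peak
print("robust best pose:", best_robust)   # inside the broad plateau
```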

Conference paper

Whelan T, Salas Moreno R, Leutenegger S, Davison A, Glocker B et al., 2016, Modelling a Three-Dimensional Space, WO2016189274

Patent

Zienkiewicz J, Davison AJ, Leutenegger S, 2016, Real-Time Height Map Fusion using Differentiable Rendering, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, ISSN: 2153-0866

We present a robust real-time method which performs dense reconstruction of high quality height maps from monocular video. By representing the height map as a triangular mesh, and using an efficient differentiable rendering approach, our method enables rigorous incremental probabilistic fusion of standard locally estimated depth and colour into an immediately usable dense model. We present results for the application of free space and obstacle mapping by a low-cost robot, showing that detailed maps suitable for autonomous navigation can be obtained using only a single forward-looking camera.

Conference paper

Whelan T, Salas-Moreno RF, Glocker B, Davison AJ, Leutenegger S et al., 2016, ElasticFusion: real-time dense SLAM and light source estimation, International Journal of Robotics Research, Vol: 35, Pages: 1697-1716, ISSN: 1741-3176

We present a novel approach to real-time dense visual SLAM. Our system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments and beyond, explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any post-processing steps. This is accomplished by using dense frame-to-model camera tracking and windowed surfel-based fusion coupled with frequent model refinement through non-rigid surface deformations. Our approach applies local model-to-model surface loop closure optimisations as often as possible to stay close to the mode of the map distribution, while utilising global loop closure to recover from arbitrary drift and maintain global consistency. In the spirit of improving map quality as well as tracking accuracy and robustness, we furthermore explore a novel approach to real-time discrete light source detection. This technique is capable of detecting numerous light sources in indoor environments in real-time as a user handheld camera explores the scene. Absolutely no prior information about the scene or number of light sources is required. By making a small set of simple assumptions about the appearance properties of the scene, our method can incrementally estimate both the quantity and location of multiple light sources in the environment in an online fashion. Our results demonstrate that our technique functions well in many different environments and lighting configurations. We show that this enables (a) more realistic augmented reality (AR) rendering; (b) a richer understanding of the scene beyond pure geometry; and (c) more accurate and robust photometric tracking.

Journal article

Kim H, Leutenegger S, Davison AJ, 2016, Real-time 3D reconstruction and 6-DoF tracking with an event camera, ECCV 2016 - European Conference on Computer Vision, Publisher: Springer, Pages: 349-364, ISSN: 0302-9743

We propose a method which can perform real-time 3D reconstruction from a single hand-held event camera with no additional sensing, and works in unstructured scenes of which it has no prior knowledge. It is based on three decoupled probabilistic filters, each estimating 6-DoF camera motion, scene logarithmic (log) intensity gradient and scene inverse depth relative to a keyframe, and we build a real-time graph of these to track and model over an extended local workspace. We also upgrade the gradient estimate for each keyframe into an intensity image, allowing us to recover a real-time video-like intensity sequence with spatial and temporal super-resolution from the low bit-rate input event stream. To the best of our knowledge, this is the first algorithm provably able to track a general 6D motion along with reconstruction of arbitrary structure including its intensity and the reconstruction of grayscale video that exclusively relies on event camera data.

Conference paper

Lukierski R, Leutenegger S, Davison AJ, 2015, Rapid free-space mapping from a single omnidirectional camera, 2015 European Conference on Mobile Robots (ECMR), Publisher: IEEE, Pages: 1-8

Low-cost robots such as floor cleaners generally rely on limited perception and simple algorithms, but some new models now have enough sensing capability and computation power to enable Simultaneous Localisation And Mapping (SLAM) and intelligent guided navigation. In particular, computer vision is now a serious option in low cost robotics, though its use to date has been limited to feature-based mapping for localisation. Dense environment perception such as free space finding has required additional specialised sensors, adding expense and complexity. Here we show that a robot with a single passive omnidirectional camera can perform rapid global free-space reasoning within typical rooms. Upon entering a new room, the robot makes a circular movement to capture a closely-spaced omni image sequence with disparity in all horizontal directions. A feature-based visual SLAM procedure obtains accurate poses for these frames before passing them to a dense matching step, 3D semi-dense reconstruction and visibility reasoning. The result is turned into a 2D occupancy map, which can be improved and extended if necessary through further movement. This rapid, passive technique can capture high quality free space information which gives a robot a global understanding of the space around it. We present results in several scenes, including quantitative comparison with laser-based mapping.

Conference paper

Whelan T, Leutenegger S, Salas-Moreno RF, Glocker B, Davison AJ et al., 2015, ElasticFusion: Dense SLAM without a Pose Graph, Robotics: Science and Systems, Publisher: Robotics: Science and Systems, ISSN: 2330-765X

Conference paper

