Imperial College London

Dr Stefan Leutenegger

Faculty of Engineering, Department of Computing

Reader in Robotics

Contact

s.leutenegger

Location

ACE Extension, South Kensington Campus


Publications


Henning DF, Laidlow T, Leutenegger S, 2022, BodySLAM: Joint Camera Localisation, Mapping, and Human Motion Tracking, Publisher: ArXiv

Estimating human motion from video is an active research area due to its many potential applications. Most state-of-the-art methods predict human shape and posture estimates for individual images and do not leverage the temporal information available in video. Many "in the wild" sequences of human motion are captured by a moving camera, which adds the complication of conflated camera and human motion to the estimation. We therefore present BodySLAM, a monocular SLAM system that jointly estimates the position, shape, and posture of human bodies, as well as the camera trajectory. We also introduce a novel human motion model to constrain sequential body postures and observe the scale of the scene. Through a series of experiments on video sequences of human motion captured by a moving monocular camera, we demonstrate that BodySLAM improves estimates of all human body parameters and camera poses when compared to estimating these separately.

Working paper

Zhi S, Laidlow T, Leutenegger S, Davison AJ et al., 2022, In-place scene labelling and understanding with implicit scene representation, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities with similar shape and appearance are more likely to come from similar classes. Recent implicit neural reconstruction techniques are appealing as they do not require prior training data, but the same fully self-supervised approach is not possible for semantics because labels are human-defined properties. We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry, so that complete and accurate 2D semantic labels can be achieved using a small amount of in-place annotations specific to the scene. The intrinsic multi-view consistency and smoothness of NeRF benefit semantics by enabling sparse labels to efficiently propagate. We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes. We demonstrate its advantageous properties in various interesting applications such as an efficient scene labelling tool, novel semantic view synthesis, label denoising, super-resolution, label interpolation and multi-view semantic label fusion in visual semantic mapping systems.

Conference paper

Landgraf Z, Scona R, Laidlow T, James S, Leutenegger S, Davison AJ et al., 2022, SIMstack: a generative shape and instance model for unordered object stacks, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn’t require setting the maximum number of objects in the scene. At test time, our model can generate 3D shape and instance segmentation from a single depth view, probabilistically sampling proposals for the occluded region from the learned latent space. Our method has practical applications in providing robots some of the ability humans have to make rapid intuitive inferences of partially observed scenes. We demonstrate an application for precise (non-disruptive) object grasping of unknown objects from a single depth view.

Conference paper

Gallego G, Delbruck T, Orchard GM, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison A, Conradt J, Daniilidis K, Scaramuzza D et al., 2022, Event-Based Vision: A Survey, Publisher: IEEE COMPUTER SOC

Working paper

Popovic M, Thomas F, Papatheodorou S, Funk N, Vidal-Calleja T, Leutenegger S et al., 2021, Volumetric occupancy mapping with probabilistic depth completion for robotic navigation, Publisher: arXiv

In robotic applications, a key requirement for safe and efficient motion planning is the ability to map obstacle-free space in unknown, cluttered 3D environments. However, commodity-grade RGB-D cameras commonly used for sensing fail to register valid depth values on shiny, glossy, bright, or distant surfaces, leading to missing data in the map. To address this issue, we propose a framework leveraging probabilistic depth completion as an additional input for spatial mapping. We introduce a deep learning architecture providing uncertainty estimates for the depth completion of RGB-D images. Our pipeline exploits the inferred missing depth values and depth uncertainty to complement raw depth images and improve the speed and quality of free space mapping. Evaluations on synthetic data show that our approach maps significantly more correct free space with relatively low error when compared against using raw data alone in different indoor environments, thereby producing more complete maps that can be directly used for robotic navigation tasks. The performance of our framework is validated using real-world data.

Working paper
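
The core idea above, using completed depth only where the network is confident, can be illustrated with a minimal sketch; the array names, threshold, and gating rule below are illustrative assumptions, not the paper's pipeline:

```python
import numpy as np

def complement_depth(raw_depth, completed_depth, depth_std, max_std=0.3):
    """Fill holes in a raw depth image with network-completed depth, but only
    where the predicted uncertainty (standard deviation, in metres) is low."""
    depth = raw_depth.astype(np.float32).copy()
    std = np.zeros_like(depth)
    missing = raw_depth <= 0.0                    # 0 marks invalid raw depth
    trusted = missing & (depth_std < max_std)     # confident completions only
    depth[trusted] = completed_depth[trusted]
    std[trusted] = depth_std[trusted]
    return depth, std

# toy usage: one hole gets filled, the uncertain one stays empty
raw = np.array([[1.2, 0.0], [0.0, 2.5]])
comp = np.array([[1.2, 1.4], [3.0, 2.5]])
sigma = np.array([[0.05, 0.10], [0.80, 0.05]])
print(complement_depth(raw, comp, sigma))
```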

Wang Y, Funk N, Ramezani M, Papatheodorou S, Popovic M, Camurri M, Leutenegger S, Fallon M et al., 2020, Elastic and efficient LiDAR reconstruction for large-scale exploration tasks, Publisher: arXiv

We present an efficient, elastic 3D LiDAR reconstruction framework which can reconstruct up to maximum LiDAR ranges (60 m) at multiple frames per second, thus enabling robot exploration in large-scale environments. Our approach only requires a CPU. We focus on three main challenges of large-scale reconstruction: integration of long-range LiDAR scans at high frequency, the capacity to deform the reconstruction after loop closures are detected, and scalability for long-duration exploration. Our system extends upon a state-of-the-art efficient RGB-D volumetric reconstruction technique, called supereight, to support LiDAR scans and a newly developed submapping technique to allow for dynamic correction of the 3D reconstruction. We then introduce a novel pose graph sparsification and submap fusion feature to make our system more scalable for large environments. We evaluate the performance using a published dataset captured by a handheld mapping device scanning a set of buildings, and with a mobile robot exploring an underground room network. Experimental results demonstrate that our system can reconstruct at 3 Hz with 60 m sensor range and ~5 cm resolution, while state-of-the-art approaches can only reconstruct to 25 cm resolution or 20 m range at the same frequency.

Working paper

Funk N, Tarrio J, Papatheodorou S, Popovic M, Alcantarilla PF, Leutenegger S et al., 2020, Multi-resolution 3D mapping with explicit free space representation for fast and accurate mobile robot motion planning, Publisher: arXiv

With the aim of bridging the gap between high quality reconstruction and mobile robot motion planning, we propose an efficient system that leverages the concept of adaptive-resolution volumetric mapping, which naturally integrates with the hierarchical decomposition of space in an octree data structure. Instead of a Truncated Signed Distance Function (TSDF), we adopt mapping of occupancy probabilities in log-odds representation, which allows us to represent both surfaces and the entire free, i.e. observed, space, as opposed to unobserved space. We introduce a method for choosing the resolution on the fly, in real time, by means of a multi-scale max-min pooling of the input depth image. The notion of explicit free space mapping, paired with the spatial hierarchy in the data structure as well as the map resolution, allows for collision queries, as needed for robot motion planning, at unprecedented speed. We quantitatively evaluate mapping accuracy, memory, runtime performance, and planning performance, showing improvements over the state of the art, particularly in cases requiring high-resolution maps.

Working paper
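
The log-odds occupancy representation the paper adopts instead of a TSDF follows the standard update rule; a minimal sketch (the increments and clamping bounds are illustrative values, not the paper's):

```python
import numpy as np

# Standard log-odds occupancy update. Hit/miss increments and clamping
# bounds below are illustrative, not the values used in the paper.
L_HIT, L_MISS = 0.85, -0.4    # log-odds increments per observation
L_MIN, L_MAX = -2.0, 3.5      # clamping bounds

def update_logodds(l_prev, hit):
    l_new = l_prev + (L_HIT if hit else L_MISS)
    return float(np.clip(l_new, L_MIN, L_MAX))

def occupancy_probability(l):
    return 1.0 / (1.0 + np.exp(-l))   # inverse of the log-odds transform

l = 0.0                               # unknown voxel: p = 0.5
for observation in [True, True, False, True]:
    l = update_logodds(l, hit=observation)
print(occupancy_probability(l))
```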

Laidlow T, Czarnowski J, Nicastro A, Clark R, Leutenegger S et al., 2020, Towards the probabilistic fusion of learned priors into standard pipelines for 3D reconstruction, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 7373-7379, ISSN: 1050-4729

The best way to combine the results of deep learning with standard 3D reconstruction pipelines remains an open problem. While systems that pass the output of traditional multi-view stereo approaches to a network for regularisation or refinement currently seem to get the best results, it may be preferable to treat deep neural networks as separate components whose results can be probabilistically fused into geometry-based systems. Unfortunately, the error models required to do this type of fusion are not well understood, with many different approaches being put forward. Recently, a few systems have achieved good results by having their networks predict probability distributions rather than single values. We propose using this approach to fuse a learned single-view depth prior into a standard 3D reconstruction system. Our system is capable of incrementally producing dense depth maps for a set of keyframes. We train a deep neural network to predict discrete, nonparametric probability distributions for the depth of each pixel from a single image. We then fuse this "probability volume" with another probability volume based on the photometric consistency between subsequent frames and the keyframe image. We argue that combining the probability volumes from these two sources will result in a volume that is better conditioned. To extract depth maps from the volume, we minimise a cost function that includes a regularisation term based on network predicted surface normals and occlusion boundaries. Through a series of experiments, we demonstrate that each of these components improves the overall performance of the system.

Conference paper
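
One way to picture the fusion of the two probability volumes is an element-wise product of per-pixel depth distributions, assuming the two sources are independent; a minimal sketch (the array names and the simple expectation-based depth extraction are illustrative, whereas the paper extracts depth by minimising a regularised cost over the volume):

```python
import numpy as np

def fuse_probability_volumes(p_net, p_photo, eps=1e-12):
    """Fuse two per-pixel discrete depth distributions of shape (H, W, D),
    one from a single-view network and one from photometric consistency,
    by normalised element-wise multiplication (independence assumption)."""
    fused = p_net * p_photo + eps
    fused /= fused.sum(axis=-1, keepdims=True)
    return fused

def expected_depth(p, depth_bins):
    """Simple per-pixel expectation over the depth hypotheses."""
    return (p * depth_bins).sum(axis=-1)

depth_bins = np.linspace(0.5, 5.0, 64)
p_net = np.random.dirichlet(np.ones(64), size=(4, 4))
p_photo = np.random.dirichlet(np.ones(64), size=(4, 4))
fused = fuse_probability_volumes(p_net, p_photo)
print(expected_depth(fused, depth_bins).shape)   # (4, 4) depth map
```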

Xu B, Davison AJ, Leutenegger S, 2020, Deep probabilistic feature-metric tracking, Publisher: arXiv

Dense image alignment from RGB-D images remains a critical issue for real-world applications, especially under challenging lighting conditions and in a wide baseline setting. In this paper, we propose a new framework to learn a pixel-wise deep feature map and a deep feature-metric uncertainty map predicted by a Convolutional Neural Network (CNN), which together formulate a deep probabilistic feature-metric residual of the two-view constraint that can be minimised using Gauss-Newton in a coarse-to-fine optimisation framework. Furthermore, our network predicts a deep initial pose for faster and more reliable convergence. The optimisation steps are differentiable and unrolled to train in an end-to-end fashion. Due to its probabilistic essence, our approach can easily couple with other residuals, where we show a combination with ICP. Experimental results demonstrate state-of-the-art performance on the TUM RGB-D dataset and 3D rigid object tracking dataset. We further demonstrate our method's robustness and convergence qualitatively.

Working paper
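
The uncertainty-weighted Gauss-Newton step at the heart of such feature-metric alignment can be sketched generically as below; the residuals and Jacobian here are random placeholders, not the learned feature-metric terms:

```python
import numpy as np

def gauss_newton_step(r, J, sigma):
    """One Gauss-Newton update for a weighted least-squares problem.

    r:     (N,)   residual vector (e.g. feature-metric differences)
    J:     (N, 6) Jacobian of the residuals w.r.t. a 6-DoF pose increment
    sigma: (N,)   per-residual standard deviations (predicted uncertainty)

    Minimises sum_i (r_i / sigma_i)^2 by solving the normal equations
    (J^T W J) dx = -J^T W r with W = diag(1 / sigma^2).
    """
    w = 1.0 / sigma**2
    H = J.T @ (J * w[:, None])     # weighted Gauss-Newton Hessian approximation
    b = J.T @ (w * r)
    return np.linalg.solve(H, -b)  # 6-vector pose update

# toy usage with random data
rng = np.random.default_rng(0)
r = rng.normal(size=100)
J = rng.normal(size=(100, 6))
sigma = np.full(100, 0.5)
print(gauss_newton_step(r, J, sigma))
```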

Tzoumanikas D, Graule F, Yan Q, Shah D, Popovic M, Leutenegger S et al., 2020, Aerial manipulation using hybrid force and position NMPC applied to aerial writing, Publisher: arXiv

Aerial manipulation aims at combining the manoeuvrability of aerial vehicles with the manipulation capabilities of robotic arms. This, however, comes at the cost of additional control complexity due to the coupling of the dynamics of the two systems. In this paper we present an NMPC specifically designed for MAVs equipped with a robotic arm. We formulate a hybrid control model for the combined MAV-arm system which incorporates interaction forces acting on the end effector. We explain the practical implementation of our algorithm and show extensive experimental results of our custom built system performing multiple aerial-writing tasks on a whiteboard, revealing accuracy in the order of millimetres.

Working paper

Ortiz J, Pupilli M, Leutenegger S, Davison AJ et al., 2020, Bundle adjustment on a graph processor, Publisher: arXiv

Graph processors such as Graphcore's Intelligence Processing Unit (IPU) are part of the major new wave of novel computer architecture for AI, and have a general design with massively parallel computation, distributed on-chip memory and very high inter-core communication bandwidth which allows breakthrough performance for message passing algorithms on arbitrary graphs. We show for the first time that the classical computer vision problem of bundle adjustment (BA) can be solved extremely fast on a graph processor using Gaussian Belief Propagation. Our simple but fully parallel implementation uses the 1216 cores on a single IPU chip to, for instance, solve a real BA problem with 125 keyframes and 1919 points in under 40 ms, compared to 1450 ms for the Ceres CPU library. Further code optimisation will surely increase this difference on static problems, but we argue that the real promise of graph processing is for flexible in-place optimisation of general, dynamically changing factor graphs representing Spatial AI problems. We give indications of this with experiments showing the ability of GBP to efficiently solve incremental SLAM problems, and deal with robust cost functions and different types of factors.

Working paper
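
The elementary operation behind Gaussian Belief Propagation is combining Gaussian messages in information (canonical) form, which amounts to adding information vectors and precision matrices; a minimal sketch for a single variable node, with toy factors that are purely illustrative:

```python
import numpy as np

def to_info(mean, cov):
    """Convert a Gaussian to information form: eta = Lambda @ mean, Lambda = cov^-1."""
    Lam = np.linalg.inv(cov)
    return Lam @ mean, Lam

def combine_gaussian_messages(messages):
    """Combine Gaussian messages arriving at a variable node.
    The product of Gaussians is the sum of their information vectors and
    precision matrices; this is the per-node update used in GBP."""
    eta = sum(m[0] for m in messages)
    Lam = sum(m[1] for m in messages)
    cov = np.linalg.inv(Lam)
    return cov @ eta, cov            # posterior mean and covariance

# two unary "factors" on a 2D point, e.g. reprojection-style constraints
m1 = to_info(np.array([1.0, 2.0]), np.diag([0.1, 0.1]))
m2 = to_info(np.array([1.2, 1.8]), np.diag([0.4, 0.4]))
print(combine_gaussian_messages([m1, m2]))
```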

Bloesch M, Laidlow T, Clark R, Leutenegger S, Davison A et al., 2020, Learning meshes for dense visual SLAM, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

Estimating motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information theoretic point of view, estimates should get better as more information is included, such as is done in dense SLAM, but this is strongly dependent on the validity of the underlying models. In the present paper, we use triangular meshes as both compact and dense geometry representation. To allow for simple and fast usage, we propose a view-based formulation for which we predict the in-plane vertex coordinates directly from images and then employ the remaining vertex depth components as free variables. Flexible and continuous integration of information is achieved through the use of a residual based inference technique. This so-called factor graph encodes all information as mapping from free variables to residuals, the squared sum of which is minimised during inference. We propose the use of different types of learnable residuals, which are trained end-to-end to increase their suitability as information bearing models and to enable accurate and reliable estimation. Detailed evaluation of all components is provided on both synthetic and real data which confirms the practicability of the presented approach.

Conference paper

Landgraf Z, Falck F, Bloesch M, Leutenegger S, Davison A et al., 2020, Comparing view-based and map-based semantic labelling in real-time SLAM, Publisher: arXiv

Generally capable Spatial AI systems must build persistent scene representations where geometric models are combined with meaningful semantic labels. The many approaches to labelling scenes can be divided into two clear groups: view-based, which estimate labels from the input view-wise data and then incrementally fuse them into the scene model as it is built; and map-based, which label the generated scene model. However, there has so far been no attempt to quantitatively compare view-based and map-based labelling. Here, we present an experimental framework and comparison which uses real-time height map fusion as an accessible platform for a fair comparison, opening up the route to further systematic research in this area.

Working paper

Bonde U, Alcantarilla PF, Leutenegger S, 2020, Towards bounding-box free panoptic segmentation, Publisher: arXiv

In this work we introduce a new bounding-box free network (BBFNet) for panoptic segmentation. Panoptic segmentation is an ideal problem for a bounding-box free approach as it already requires per-pixel semantic class labels. We use this observation to exploit class boundaries from an off-the-shelf semantic segmentation network and refine them to predict instance labels. Towards this goal, BBFNet predicts coarse watershed levels and uses them to detect large instance candidates where boundaries are well defined. For smaller instances, whose boundaries are less reliable, BBFNet also predicts instance centres by means of Hough voting followed by mean-shift to reliably detect small objects. A novel triplet loss network helps merge fragmented instances while refining boundary pixels. Our approach is distinct from previous works in panoptic segmentation that rely on a combination of a semantic segmentation network with a computationally costly instance segmentation network based on bounding boxes, such as Mask R-CNN, to guide the prediction of instance labels using a Mixture-of-Experts (MoE) approach. We benchmark our non-MoE method on the Cityscapes and Microsoft COCO datasets and show competitive performance with other MoE-based approaches while outperforming existing non-proposal-based approaches. We achieve this while being computationally more efficient in terms of the number of parameters and FLOPs. Video results are provided here: https://blog.slamcore.com/reducing-the-cost-of-understanding.

Working paper
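
The Hough-voting-plus-mean-shift step for small instances can be pictured as below: every pixel votes for its instance centre and votes converging to the same mode are grouped together. This is a minimal flat-kernel mean-shift sketch; the bandwidth and data are illustrative, not BBFNet's parameters:

```python
import numpy as np

def mean_shift_modes(votes, bandwidth=5.0, iters=20):
    """Shift every predicted instance-centre vote towards the local mean of
    nearby votes (flat kernel); votes that converge to the same mode belong
    to the same instance. `votes` is an (N, 2) array of pixel coordinates."""
    pts = votes.astype(np.float64).copy()
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None, :] - votes[None, :, :], axis=-1)
        w = (d < bandwidth).astype(np.float64)
        pts = (w @ votes) / w.sum(axis=1, keepdims=True)
    return pts

# toy example: votes scattered around two object centres
rng = np.random.default_rng(1)
votes = np.vstack([rng.normal([20, 30], 1.0, size=(50, 2)),
                   rng.normal([80, 60], 1.0, size=(50, 2))])
modes = mean_shift_modes(votes)
print(np.round(modes[[0, -1]]))   # first and last vote end up near their centres
```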

Tzoumanikas D, Yan Q, Leutenegger S, 2020, Nonlinear MPC with motor failure identification and recovery for safe and aggressive multicopter flight, Publisher: arXiv

Safe and precise reference tracking is a crucial characteristic of MAVs that have to operate under the influence of external disturbances in cluttered environments. In this paper, we present an NMPC that exploits the fully physics-based non-linear dynamics of the system. We furthermore show how the moment and thrust control inputs can be transformed into feasible actuator commands. In order to guarantee safe operation despite the potential loss of a motor, under which we show our system keeps operating safely, we developed an EKF-based motor failure identification algorithm. We verify the effectiveness of the developed pipeline in flight experiments with and without motor failures.

Working paper
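
An EKF-based identification module is built around the standard EKF measurement update; a generic sketch follows, in which the one-dimensional motor-effectiveness state and the measurement model are placeholders, not the paper's formulation:

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """Generic EKF measurement update, the building block of an EKF-based
    estimator such as a motor-failure identifier.

    x: state mean, P: state covariance
    z: measurement, h: predicted measurement h(x), H: measurement Jacobian
    R: measurement noise covariance
    """
    y = z - h                                  # innovation
    S = H @ P @ H.T + R                        # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
    x_new = x + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

# toy usage: estimate a single motor-effectiveness factor from a thrust error
x = np.array([1.0])                            # 1.0 = fully effective motor
P = np.eye(1) * 0.1
H = np.array([[9.81]])                         # d(measured accel)/d(effectiveness)
z = np.array([8.0])                            # measured vertical acceleration
h = H @ x                                      # predicted acceleration
print(ekf_update(x, P, z, h, H, np.array([[0.5]])))
```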

Dai A, Papatheodorou S, Funk N, Tzoumanikas D, Leutenegger S et al., 2020, Fast frontier-based information-driven autonomous exploration with an MAV, Publisher: arXiv

Exploration and collision-free navigation through an unknown environment is a fundamental task for autonomous robots. In this paper, a novel exploration strategy for Micro Aerial Vehicles (MAVs) is presented. The goal of the exploration strategy is the reduction of map entropy regarding occupancy probabilities, which is reflected in a utility function to be maximised. We achieve fast and efficient exploration performance with tight integration between our octree-based occupancy mapping approach, frontier extraction, and motion planning, as a hybrid between frontier-based and sampling-based exploration methods. The computationally expensive frontier clustering employed in classic frontier-based exploration is avoided by exploiting the implicit grouping of frontier voxels in the underlying octree map representation. Candidate next-views are sampled from the map frontiers and are evaluated using a utility function combining map entropy and travel time, where the former is computed efficiently using sparse raycasting. These optimisations, along with the targeted exploration of frontier-based methods, result in a fast and computationally efficient exploration planner. The proposed method is evaluated using both simulated and real-world experiments, demonstrating clear advantages over state-of-the-art approaches.

Working paper
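
Scoring candidate next-views by map entropy against travel time can be sketched as below; the specific utility form used here, entropy observed per second of travel, is an assumption for illustration rather than the paper's exact function:

```python
import numpy as np

def entropy_bits(p_occ):
    """Shannon entropy of the occupancy probabilities of the voxels a
    candidate view would observe; unknown voxels (p = 0.5) contribute most."""
    p = np.clip(p_occ, 1e-6, 1 - 1e-6)
    return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)))

def best_candidate(candidates):
    """candidates: list of (name, occupancy probs of visible voxels, travel time s).
    Assumed utility: entropy (bits) observed per second of travel."""
    scores = [entropy_bits(p) / max(t, 1e-3) for _, p, t in candidates]
    return candidates[int(np.argmax(scores))][0]

cands = [
    ("view_A", np.full(200, 0.5), 4.0),    # lots of unknown space, longer trip
    ("view_B", np.full(200, 0.95), 2.0),   # mostly well-observed space, close by
]
print(best_candidate(cands))               # -> view_A
```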

Vespa E, Funk N, Kelly PHJ, Leutenegger S et al., 2019, Adaptive-resolution octree-based volumetric SLAM, 7th International Conference on 3D Vision (3DV), Publisher: IEEE COMPUTER SOC, Pages: 654-662, ISSN: 2378-3826

We introduce a novel volumetric SLAM pipeline for the integration and rendering of depth images at an adaptive level of detail. Our core contribution is a fusion algorithm which dynamically selects the appropriate integration scale based on the effective sensor resolution given the distance from the observed scene, addressing aliasing issues, reconstruction quality, and efficiency simultaneously. We implement our approach using an efficient octree structure which supports multi-resolution rendering allowing for online frame-to-model alignment. Our qualitative and quantitative experiments demonstrate significantly improved reconstruction quality and up to six-fold execution time speed-ups compared to single resolution grids.

Conference paper
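
Choosing the integration scale from the effective sensor resolution at the observed distance can be pictured as a pixel-footprint test against voxel size; the formula and constants below are illustrative, not the paper's selection rule:

```python
import numpy as np

def integration_scale(depth_m, focal_px, finest_voxel_m, max_scale=3):
    """Pick the octree integration scale for a depth measurement.

    A pixel at depth d covers roughly d / f metres on the surface; we choose
    the coarsest scale whose voxel size (finest_voxel_m * 2**scale) does not
    exceed that footprint, so distant measurements are fused coarsely and
    nearby ones finely.
    """
    footprint = depth_m / focal_px
    scale = int(np.floor(np.log2(max(footprint / finest_voxel_m, 1.0))))
    return int(np.clip(scale, 0, max_scale))

for d in [0.5, 4.0, 16.0]:
    print(d, "m ->", integration_scale(d, focal_px=525.0, finest_voxel_m=0.002))
```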

Houscago C, Bloesch M, Leutenegger S, 2019, KO-Fusion: dense visual SLAM with tightly-coupled kinematic and odometric tracking, International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 4054-4060, ISSN: 1050-4729

Dense visual SLAM methods are able to estimate the 3D structure of an environment and locate the observer within it. They estimate the motion of a camera by matching visual information between consecutive frames, and are thus prone to failure under extreme motion conditions or when observing texture-poor regions. The integration of additional sensor modalities has shown great promise in improving the robustness and accuracy of such SLAM systems. In contrast to the popular use of inertial measurements, we propose to tightly couple a dense RGB-D SLAM system with kinematic and odometry measurements from a wheeled robot equipped with a manipulator. The system has real-time capability while running on a GPU. It optimizes the camera pose by considering the geometric alignment of the map as well as kinematic and odometric data from the robot. Through experimentation in the real world, we show that the system is more robust to challenging trajectories featuring fast and loopy motion than the equivalent system without the additional kinematic and odometric knowledge, whilst retaining comparable performance to the equivalent RGB-D-only system on easy trajectories.

Conference paper

Laidlow T, Czarnowski J, Leutenegger S, 2019, DeepFusion: real-time dense 3D reconstruction for monocular SLAM using single-view depth and gradient predictions, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 4068-4074, ISSN: 2577-087X

While the keypoint-based maps created by sparse monocular Simultaneous Localisation and Mapping (SLAM) systems are useful for camera tracking, dense 3D reconstructions may be desired for many robotic tasks. Solutions involving depth cameras are limited in range and to indoor spaces, and dense reconstruction systems based on minimising the photometric error between frames are typically poorly constrained and suffer from scale ambiguity. To address these issues, we propose a 3D reconstruction system that leverages the output of a Convolutional Neural Network (CNN) to produce fully dense depth maps for keyframes that include metric scale. Our system, DeepFusion, is capable of producing real-time dense reconstructions on a GPU. It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion, using learned uncertainties produced by the network. While the network only needs to be run once per keyframe, we are able to optimise for the depth map with each new frame so as to constantly make use of new geometric constraints. Based on its performance on synthetic and real world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.

Conference paper
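
The probabilistic fusion of stereo and CNN depth with learned uncertainties reduces, per pixel, to textbook inverse-variance weighting; a minimal sketch (array names are illustrative, and DeepFusion additionally uses gradient predictions and per-frame optimisation on top of this):

```python
import numpy as np

def fuse_depths(d_stereo, var_stereo, d_cnn, var_cnn):
    """Per-pixel inverse-variance (Gaussian) fusion of a multi-view stereo
    depth estimate with a CNN depth prediction, each with its own variance."""
    w_stereo = 1.0 / var_stereo
    w_cnn = 1.0 / var_cnn
    fused = (w_stereo * d_stereo + w_cnn * d_cnn) / (w_stereo + w_cnn)
    fused_var = 1.0 / (w_stereo + w_cnn)
    return fused, fused_var

# toy usage: first pixel trusts stereo, second pixel trusts the CNN prior
d_s = np.array([2.0, 3.5]); v_s = np.array([0.04, 1.00])
d_c = np.array([2.4, 3.0]); v_c = np.array([0.25, 0.09])
print(fuse_depths(d_s, v_s, d_c, v_c))
```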

Xu B, Li W, Tzoumanikas D, Bloesch M, Davison A, Leutenegger S et al., 2019, MID-fusion: octree-based object-level multi-instance dynamic SLAM, ICRA 2019 - IEEE International Conference on Robotics and Automation, Publisher: IEEE

We propose a new multi-instance dynamic RGB-D SLAM system using an object-level octree-based volumetric representation. It can provide robust camera tracking in dynamic environments and, at the same time, continuously estimate geometric, semantic, and motion properties for arbitrary objects in the scene. For each incoming frame, we perform instance segmentation to detect objects and refine mask boundaries using geometric and motion information. Meanwhile, we estimate the pose of each existing moving object using an object-oriented tracking method and robustly track the camera pose against the static scene. Based on the estimated camera pose and object poses, we associate segmented masks with existing models and incrementally fuse corresponding colour, depth, semantic, and foreground object probabilities into each object model. In contrast to existing approaches, our system is the first system to generate an object-level dynamic volumetric map from a single RGB-D camera, which can be used directly for robotic tasks. Our method can run at 2-3 Hz on a CPU, excluding the instance segmentation part. We demonstrate its effectiveness by quantitatively and qualitatively testing it on both synthetic and real-world sequences.

Conference paper
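
Incrementally fusing colour and foreground probability into an object model can be pictured as a running weighted average per voxel; a minimal sketch, where the weighting scheme is a common volumetric-fusion convention rather than necessarily MID-Fusion's exact rule:

```python
import numpy as np

class ObjectVoxel:
    """A single voxel of an object model that accumulates foreground
    probability and colour with a simple running weighted average."""
    def __init__(self):
        self.weight = 0.0
        self.fg_prob = 0.5          # probability the voxel belongs to the object
        self.colour = np.zeros(3)

    def integrate(self, fg_obs, colour_obs, w=1.0, w_max=50.0):
        total = self.weight + w
        self.fg_prob = (self.weight * self.fg_prob + w * fg_obs) / total
        self.colour = (self.weight * self.colour + w * colour_obs) / total
        self.weight = min(total, w_max)   # cap so the model can keep adapting

# toy usage: three observations of a reddish, foreground voxel
v = ObjectVoxel()
for fg, rgb in [(0.9, [200, 30, 30]), (0.8, [210, 40, 20]), (0.95, [190, 25, 35])]:
    v.integrate(fg, np.array(rgb, dtype=float))
print(round(v.fg_prob, 3), v.colour.round(1))
```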

Saeedi S, Carvalho EDC, Li W, Tzoumanikas D, Leutenegger S, Kelly PHJ, Davison AJ et al., 2019, Characterizing visual localization and mapping datasets, 2019 International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1050-4729

Benchmarking mapping and motion estimation algorithms is established practice in robotics and computer vision. As the diversity of datasets increases, in terms of the trajectories, models, and scenes, it becomes a challenge to select datasets for a given benchmarking purpose. Inspired by the Wasserstein distance, this paper addresses this concern by developing novel metrics to evaluate trajectories and the environments without relying on any SLAM or motion estimation algorithm. The metrics, which so far have been missing in the research community, can be applied to the plethora of datasets that exist. Additionally, to improve the robotics SLAM benchmarking, the paper presents a new dataset for visual localization and mapping algorithms. A broad range of real-world trajectories is used in very high-quality scenes and a rendering framework to create a set of synthetic datasets with ground-truth trajectory and dense map which are representative of key SLAM applications such as virtual reality (VR), micro aerial vehicle (MAV) flight, and ground robotics.

Conference paper

Nicastro A, Clark R, Leutenegger S, 2019, X-Section: cross-section prediction for enhanced RGBD fusion

Detailed 3D reconstruction is an important challenge with application to robotics, augmented and virtual reality, which has seen impressive progress throughout the past years. Advancements were driven by the availability of depth cameras (RGB-D), as well as increased compute power, e.g. in the form of GPUs, but also thanks to the inclusion of machine learning in the process. Here, we propose X-Section, an RGB-D 3D reconstruction approach that leverages deep learning to make object-level predictions about thicknesses that can be readily integrated into a volumetric multi-view fusion process, where we propose an extension to the popular KinectFusion approach. In essence, our method allows us to complete shape in general indoor scenes behind what is sensed by the RGB-D camera, which may be crucial, e.g. for robotic manipulation tasks or efficient scene exploration. Predicting object thicknesses rather than volumes allows us to work with comparably high spatial resolution without exploding memory and training data requirements on the employed Convolutional Neural Networks. In a series of qualitative and quantitative evaluations, we demonstrate how we accurately predict object thickness and reconstruct general 3D scenes containing multiple objects.

Working paper

Tzoumanikas D, Li W, Grimm M, Zhang K, Kovac M, Leutenegger S et al., 2019, Fully autonomous micro air vehicle flight and landing on a moving target using visual–inertial estimation and model-predictive control, Journal of Field Robotics, Vol: 36, Pages: 49-77, ISSN: 1556-4959

The Mohamed Bin Zayed International Robotics Challenge (MBZIRC) held in spring 2017 was a very successful competition well attended by teams from all over the world. One of the challenges (Challenge 1) required an aerial robot to detect, follow, and land on a moving target in a fully autonomous fashion. In this paper, we present the hardware components of the micro air vehicle (MAV) we built from off-the-shelf components, alongside the algorithms that were developed for the purposes of the competition. We tackle the challenge of landing on a moving target by adopting a generic approach, rather than one that is tailored to the MBZIRC Challenge 1 setup, enabling easy adaptation to a wider range of applications and targets, even indoors, since we do not rely on the availability of a global positioning system. We evaluate our system in an uncontrolled outdoor environment where our MAV successfully and consistently lands on a target moving at a speed of up to 5.0 m/s.

Journal article

Zhang K, Chermprayong P, Tzoumanikas D, Li W, Grimm M, Smentoch M, Leutenegger S, Kovac M et al., 2019, Bioinspired design of a landing system with soft shock absorbers for autonomous aerial robots, Journal of Field Robotics, Vol: 36, Pages: 230-251, ISSN: 1556-4959

One of the main challenges for autonomous aerial robots is to land safely on a target position on varied surface structures in real-world applications. Most current aerial robots (especially multirotors) use only rigid landing gears, which limit the adaptability to environments and can cause damage to the sensitive cameras and other electronics onboard. This paper presents a bioinspired landing system for autonomous aerial robots, built on the inspire-abstract-implement design paradigm and an additive manufacturing process for soft thermoplastic materials. This novel landing system consists of 3D printable Sarrus shock absorbers and soft landing pads which are integrated with a one-degree-of-freedom actuation mechanism. Both designs of the Sarrus shock absorber and the soft landing pad are analyzed via finite element analysis and are characterized with dynamic mechanical measurements. The landing system with 3D printed soft components is characterized by completing landing tests on flat, convex, and concave steel structures and a grassy field, a total of 60 times, at different speeds between 1 and 2 m/s. The adaptability and shock absorption capacity of the proposed landing system is then evaluated and benchmarked against rigid legs. It reveals that the system is able to adapt to varied surface structures and reduce the impact force by up to 540 N. The bioinspired landing strategy presented in this paper opens a promising avenue in Aerial Biorobotics, where a cross-disciplinary approach in vehicle control and navigation is combined with soft technologies, enabled with adaptive morphology.

Journal article

Zhi S, Bloesch M, Leutenegger S, Davison AJ et al., 2019, SceneCode: Monocular Dense Semantic Reconstruction using Learned Encoded Scene Representations, Publisher: IEEE

Working paper

Bloesch M, Czarnowski J, Clark R, Leutenegger S, Davison AJ et al., 2018, CodeSLAM - Learning a compact, optimisable representation for dense visual SLAM, IEEE Computer Vision and Pattern Recognition 2018, Publisher: IEEE, Pages: 2560-2568

The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are mainly useful for localisation only. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images, and auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: While each keyframe with a code can produce a depth map, the code can be optimised efficiently jointly with pose variables and together with the codes of overlapping keyframes to attain global consistency. Conditioning the depth map on the image allows the code to only represent aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.

Conference paper

McCormac J, Clark R, Bloesch M, Davison A, Leutenegger S et al., 2018, Fusion++: Volumetric object-level SLAM, International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT), Publisher: IEEE, Pages: 32-41, ISSN: 2378-3826

We propose an online object-level SLAM system which builds a persistent and accurate 3D graph map of arbitrary reconstructed objects. As an RGB-D camera browses a cluttered indoor scene, Mask-RCNN instance segmentations are used to initialise compact per-object Truncated Signed Distance Function (TSDF) reconstructions with object size-dependent resolutions and a novel 3D foreground mask. Reconstructed objects are stored in an optimisable 6DoF pose graph which is our only persistent map representation. Objects are incrementally refined via depth fusion, and are used for tracking, relocalisation and loop closure detection. Loop closures cause adjustments in the relative pose estimates of object instances, but no intra-object warping. Each object also carries semantic information which is refined over time and an existence probability to account for spurious instance predictions. We demonstrate our approach on a hand-held RGB-D sequence from a cluttered office scene with a large number and variety of object instances, highlighting how the system closes loops and makes good use of existing objects on repeated loops. We quantitatively evaluate the trajectory error of our system against a baseline approach on the RGB-D SLAM benchmark, and qualitatively compare reconstruction quality of discovered objects on the YCB video dataset. Performance evaluation shows our approach is highly memory efficient and runs online at 4-8 Hz (excluding relocalisation) despite not being optimised at the software level.

Conference paper
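
The per-object TSDF reconstructions are built with the classic weighted running-average TSDF update; a minimal single-voxel sketch follows, with an illustrative truncation distance and weights:

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.04, w_obs=1.0, w_max=100.0):
    """Weighted running-average TSDF update for one voxel (the classic
    KinectFusion-style rule used to build per-object TSDF volumes).

    sdf_obs: signed distance of the voxel to the observed surface along the
             ray (positive in front of the surface), in metres.
    """
    d = float(np.clip(sdf_obs, -trunc, trunc)) / trunc     # normalised to [-1, 1]
    new_tsdf = (tsdf * weight + d * w_obs) / (weight + w_obs)
    new_weight = min(weight + w_obs, w_max)
    return new_tsdf, new_weight

# toy usage: three observations of a voxel ~1 cm in front of the surface
tsdf, w = 0.0, 0.0
for obs in [0.01, 0.015, 0.005]:
    tsdf, w = tsdf_update(tsdf, w, obs)
print(round(tsdf, 3), w)
```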

Clark R, Bloesch M, Czarnowski J, Leutenegger S, Davison AJ et al., 2018, Learning to solve nonlinear least squares for monocular stereo, 15th European Conference on Computer Vision, Publisher: Springer Nature Switzerland AG 2018, Pages: 291-306, ISSN: 0302-9743

Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e. jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.

Conference paper

Clark R, Bloesch M, Czarnowski J, Leutenegger S, Davison AJ et al., 2018, LS-Net: Learning to Solve Nonlinear Least Squares for Monocular Stereo

Sum-of-squares objective functions are very popular in computer vision algorithms. However, these objective functions are not always easy to optimize. The underlying assumptions made by solvers are often not satisfied and many problems are inherently ill-posed. In this paper, we propose LS-Net, a neural nonlinear least squares optimization algorithm which learns to effectively optimize these cost functions even in the presence of adversities. Unlike traditional approaches, the proposed solver requires no hand-crafted regularizers or priors as these are implicitly learned from the data. We apply our method to the problem of motion stereo, i.e. jointly estimating the motion and scene geometry from pairs of images of a monocular sequence. We show that our learned optimizer is able to efficiently and effectively solve this challenging optimization problem.

Conference paper

Li W, Saeedi S, McCormac J, Clark R, Tzoumanikas D, Ye Q, Huang Y, Tang R, Leutenegger S et al., 2018, InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset

Datasets have gained an enormous amount of popularity in the computer vision community, from training and evaluation of Deep Learning-based methods to benchmarking Simultaneous Localization and Mapping (SLAM). Without a doubt, synthetic imagery bears a vast potential due to scalability in terms of the amounts of data obtainable without tedious manual ground truth annotations or measurements. Here, we present a dataset with the aim of providing a higher degree of photo-realism, larger scale, more variability, as well as serving a wider range of purposes compared to existing datasets. Our dataset leverages the availability of millions of professional interior designs and millions of production-level furniture and object assets, all coming with fine geometric details and high-resolution texture. We render high-resolution and high frame-rate video sequences following realistic trajectories while supporting various camera types as well as providing inertial measurements. Together with the release of the dataset, we will make the executable program of our interactive simulator software as well as our renderer available at https://interiornetdataset.github.io. To showcase the usability and uniqueness of our dataset, we show benchmarking results of both sparse and dense SLAM algorithms.

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
