Publications

Murai R, Alzugaray I, Kelly PHJ, Davison AJet al., 2024, Distributed simultaneous localisation and auto-calibration using Gaussian belief propagation, IEEE Robotics and Automation Letters, Vol: 9, Pages: 2136-2143, ISSN: 2377-3766

We present a novel scalable, fully distributed, and online method for simultaneous localisation and extrinsic calibration for multi-robot setups. Individual a priori unknown robot poses are probabilistically inferred as robots sense each other while simultaneously calibrating their sensors and markers extrinsic using Gaussian Belief Propagation. In the presented experiments, we show how our method not only yields accurate robot localisation and auto-calibration but also is able to perform under challenging circumstances such as highly noisy measurements, significant communication failures or limited communication range.

Journal article

Murai R, Ortiz J, Saeedi S, Kelly PHJ, Davison AJet al., 2024, A robot web for distributed many-device localization, IEEE Transactions on Robotics, Vol: 40, Pages: 121-138, ISSN: 1552-3098

We show that a distributed network of robots or other devices which make measurements of each other can collaborate to globally localize via efficient ad hoc peer-to-peer communication. Our Robot Web solution is based on Gaussian belief propagation (GBP) on the fundamental nonlinear factor graph describing the probabilistic structure of all of the observations robots make internally or of each other, and is flexible for any type of robot, motion or sensor. We define a simple and efficient communication protocol which can be implemented by the publishing and reading of web pages or other asynchronous communication technologies. We show in simulations with up to 1000 robots interacting in arbitrary patterns that our solution convergently achieves global accuracy as accurate as a centralized nonlinear factor graph solver while operating with high distributed efficiency of computation and communication. Via the use of robust factors in GBP, our method is tolerant to a high percentage of faulty sensor measurements or dropped communication packets. Furthermore, we showcase that the system operates on real robots with limited onboard computational resources.

Journal article

Mazur K, Sucar E, Davison AJ, 2023, Feature-Realistic Neural Fusion for Real-Time, Open Set Scene Understanding, 2023 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 8201-8207

General scene understanding for robotics requires flexible semantic representation, so that novel objects and structures which may not have been known at training time can be identified, segmented and grouped. We present an algorithm which fuses general learned features from a standard pre-trained network into a highly efficient 3D geometric neural field representation during real-time SLAM. The fused 3D feature maps inherit the coherence of the neural field's geometry representation. This means that tiny amounts of human labelling interacting at runtime enable objects or even parts of objects to be robustly and accurately segmented in an open set manner. Project page: https://makezur.github.io/FeatureRealisticFusion/

Conference paper

Patwardhan A, Murai R, Davison AJ, 2023, Distributing collaborative multi-robot planning with Gaussian belief propagation, IEEE Robotics and Automation Letters, Vol: 8, Pages: 552-559, ISSN: 2377-3766

Precise coordinated planning over a forward time window enables safe and highly efficient motion when many robots must work together in tight spaces, but this would normally require centralised control of all devices which is difficult to scale. We demonstrate GBP Planning, a new purely distributed technique based on Gaussian Belief Propagation for multi-robot planning problems, formulated by a generic factor graph defining dynamics and collision constraints over a forward time window. In simulations, we show that our method allows high performance collaborative planning where robots are able to cross each other in busy, intricate scenarios. They maintain shorter, quicker and smoother trajectories than alternative distributed planning techniques even in cases of communication failure. We encourage the reader to view the accompanying video demonstration.

Journal article

Zhi S, Sucar E, Mouton A, Haughton I, Laidlow T, Davison AJet al., 2023, iLabel: revealing objects in neural fields, IEEE ROBOTICS AND AUTOMATION LETTERS, Vol: 8, Pages: 832-839, ISSN: 2377-3766

A neural field trained with self-supervision to efficiently represent the geometry and colour of a 3D scene tends to automatically decompose it into coherent and accurate object-like regions, which can be revealed with sparse labelling interactions to produce a 3D semantic scene segmentation. Our real-time iLabel system takes input from a hand-held RGB-D camera, requires zero prior training data, and works in an ‘open set’ manner, with semantic classes defined on the fly by the user. iLabel's underlying model is a simple multilayer perceptron (MLP), trained from scratch to learn a neural representation of a single 3D scene. The model is updated continually and visualised in real-time, allowing the user to focus interactions to achieve extremely efficient semantic segmentation. A room-scale scene can be accurately labelled into 10+ semantic categories with around 100 clicks, taking less than 5 minutes. Quantitative labelling accuracy scales powerfully with the number of clicks, and rapidly surpasses standard pre-trained semantic segmentation methods. We also demonstrate a hierarchical labelling variant of iLabel and a ‘hands-free’ mode where the user only needs to supply label names for automatically-generated locations.

Journal article

Dexheimer E, Davison AJ, 2023, Learning a Depth Covariance Function, Pages: 13122-13131, ISSN: 1063-6919

We propose learning a depth covariance function with applications to geometric vision tasks. Given RGB images as input, the covariance function can be flexibly used to define priors over depth functions, predictive distributions given observations, and methods for active point selection. We leverage these techniques for a selection of downstream tasks: depth completion, bundle adjustment, and monocular dense visual odometry.

Abstract
Cite

Conference paper

Kong X, Liu S, Taher M, Davison AJet al., 2023, vMAP: Vectorised Object Mapping for Neural Field SLAM, Pages: 952-961, ISSN: 1063-6919

We present vMAP, an object-level dense SLAM system using neural field representations. Each object is repre-sented by a small MLP, enabling efficient, watertight object modelling without the needfor 3D priors. As an RGB-D camera browses a scene with no prior in-formation, vMAP detects object instances on-the-fly, and dynamically adds them to its map. Specifically, thanks to the power of vectorised training, vMAP can optimise as many as 50 individual objects in a single scene, with an extremely efficient training speed of 5Hz map update. We experimentally demonstrate significantly improved scene-level and object-level reconstruction quality compared to prior neural field SLAM systems. Project page: https://kxhit.github.io/vMAP.

Abstract
Cite
Citations: 6

Conference paper

Haughton I, Sucar E, Mouton A, Johns E, Davison AJet al., 2023, Real-time mapping of physical scene properties with an autonomous robot experimenter, 6th Conference on Robot Learning, Pages: 118-127

Neural fields can be trained from scratch to represent the shape and appearance of 3D scenes efficiently. It has also been shown that they can densely map correlated properties such as semantics, via sparse interactions from a human labeller. In this work, we show that a robot can densely annotate a scene with arbitrary discrete or continuous physical properties via its own fully-autonomous experimental interactions, as it simultaneously scans and maps it with an RGB-D camera. A variety of scene interactions are possible, including poking with force sensing to determine rigidity, measuring local material type with single-pixel spectroscopy or predicting force distributions by pushing. Sparse experimental interactions are guided by entropy to enable high efficiency, with tabletop scene properties densely mapped from scratch in a few minutes from a few tens of interactions.

Conference paper

Matsuki H, Sucar E, Laidow T, Wada K, Scona R, Davison AJet al., 2023, IMODE:Real-Time Incremental Monocular Dense Mapping Using Neural Field, Pages: 4171-4177, ISSN: 1050-4729

We present a novel real-time dense and semantic neural field mapping system that uses only monocular images as input. Our scene representation is a dense continuous radiance field represented by a Multi-Layer Perceptron (MLP), trained from scratch in real-time. We build on high-performance sparse visual SLAM and use camera poses and sparse keypoint depths as supervision alongside RGB keyframes. Since no prior training is required, our system flexibly fits to arbitrary scale and structure at runtime, and works even with strong specular reflections. We demonstrate reconstruction over a range of scenes from small indoor to large outdoor spaces. We also show that the method can straightforwardly benefit from additional inputs such as learned depth priors or semantic labels for more precise and advanced mapping.

Abstract
Cite
Citations: 1

Conference paper

Xu B, Davison AJ, Leutenegger S, 2022, Learning to complete object shapes for object-level mapping in dynamic scenes, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 2257-2264

In this paper, we propose a novel object-level mapping system that can simultaneously segment, track, and reconstruct objects in dynamic scenes. It can further predict and complete their full geometries by conditioning on reconstructions from depth inputs and a category-level shape prior with the aim that completed object geometry leads to better object reconstruction and tracking accuracy. For each incoming RGB-D frame, we perform instance segmentation to detect objects and build data associations between the detection and the existing object maps. A new object map will be created for each unmatched detection. For each matched object, we jointly optimise its pose and latent geometry representations using geometric residual and differential rendering residual towards its shape prior and completed geometry. Our approach shows better tracking and reconstruction performance compared to methods using traditional volumetric mapping or learned shape prior approaches. We evaluate its effectiveness by quantitatively and qualitatively testing it in both synthetic and real-world sequences.

Conference paper

James S, Wada K, Laidlow T, Davison AJet al., 2022, Coarse-to-fine Q-attention: efficient learning for visual robotic manipulation via discretisation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE COMPUTER SOC, Pages: 13729-13738, ISSN: 1063-6919

We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actorcritic methods in continuous robotics domains. This approach builds on the recently released ARM algorithm, which replaces the continuous next-best pose agent with a discrete one, with coarse-to-fine Q-attention. Given a voxelised scene, coarse-to-fine Q-attention learns what part of the scene to ‘zoom’ into. When this ‘zooming’ behaviour is applied iteratively, it results in a near-lossless discretisation of the translation space, and allows the use of a discrete action, deep Q-learning method. We show that our new coarse-to-fine algorithm achieves state-of-the-art performance on several difficult sparsely rewarded RLBench vision-based robotics tasks, and can train real-world policies, tabula rasa, in a matter of minutes, with as little as 3 demonstrations.

Conference paper

Ortiz J, Evans T, Sucar E, Davison AJet al., 2022, Incremental abstraction in distributed probabilistic SLAM graphs, 2022 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

Scene graphs represent the key components of a scene in a compact and semantically rich way, but are difficult to build during incremental SLAM operation because of the challenges of robustly identifying abstract scene elements and optimising continually changing, complex graphs. We present a distributed, graph-based SLAM framework for incrementally building scene graphs based on two novel components. First, we propose an incremental abstraction framework in which a neural network proposes abstract scene elements that are incorporated into the factor graph of a feature-based monocular SLAM system. Scene elements are confirmed or rejected through optimisation and incrementally replace the points yielding a more dense, semantic and compact representation. Second, enabled by our novel routing procedure, we use Gaussian Belief Propagation (GBP) for distributed inference on a graph processor. The time per iteration of GBP is structure-agnostic and we demonstrate the speed advantages over direct methods for inference of heterogeneous factor graphs. We run our system on real indoor datasets using planar abstractions and recover the major planes with significant compression.

Conference paper

Wada K, James S, Davison AJ, 2022, SafePicking: learning safe object extraction via object-level mapping, 2022 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

Robots need object-level scene understanding to manipulate objects while reasoning about contact, support, and occlusion among objects. Given a pile of objects, object recognition and reconstruction can identify the boundary of object instances, giving important cues as to how the objects form and support the pile. In this work, we present a system, SafePicking, that integrates object-level mapping and learning-based motion planning to generate a motion that safely extracts occluded target objects from a pile. Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap to output a motion trajectory, trained to maximize a safety metric reward. Our results show that the observation fusion of poses and depth-sensing gives both better performance and robustness to the model. We evaluate our methods using the YCB objects in both simulation and the real world, achieving safe object extraction from piles.

Conference paper

Wada K, James S, Davison AJ, 2022, ReorientBot: learning object reorientation for specific-posed placement, 2022 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

Robots need the capability of placing objects in arbitrary, specific poses to rearrange the world and achieve various valuable tasks. Object reorientation plays a crucial role in this as objects may not initially be oriented such that the robot can grasp and then immediately place them in a specific goal pose. In this work, we present a vision-based manipulation system, ReorientBot, which consists of 1) visual scene understanding with pose estimation and volumetric reconstruction using an onboard RGB-D camera; 2) learned waypoint selection for successful and efficient motion generation for reorientation; 3) traditional motion planning to generate a collision-free trajectory from the selected waypoints. We evaluate our method using the YCB objects in both simulation and the real world, achieving 93% overall success, 81% improvement in success rate, and 22% improvement in execution time compared to a heuristic approach. We demonstrate extended multi-object rearrangement showing the general capability of the system.

Conference paper

James S, Davison AJ, 2022, Q-Attention: Enabling Efficient Learning for Vision-Based Robotic Manipulation, IEEE ROBOTICS AND AUTOMATION LETTERS, Vol: 7, Pages: 1612-1619, ISSN: 2377-3766

Author Web Link
Cite
Citations: 8

Journal article

Scona R, Matsuki H, Davison A, 2022, From scene flow to visual odometry through local and global regularisation in markov random fields, IEEE Robotics and Automation Letters, Vol: 7, Pages: 4299-4306, ISSN: 2377-3766

We revisit pairwise Markov Random Field (MRF) formulations for RGB-D scene flow and leverage novel advances in processor design for real-time implementations. We consider scene flow approaches which consist of data terms enforcing intensity consistency between consecutive images, together with regularisation terms which impose smoothness over the flow field. To achieve real-time operation, previous systems leveraged GPUs and implemented regularisation only between variables corresponding to neighbouring pixels. Such systems could estimate continuously deforming flow fields but the lack of global regularisation over the whole field made them ineffective for visual odometry. We leverage the GraphCore Intelligence Processing Unit (IPU) graph processor chip, which consists of 1216 independent cores called tiles, each with 256 kB local memory. The tiles are connected to an ultrafast all-to-all communication fabric which enables efficient data transmission between the tiles in an arbitrary communication pattern. We propose a distributed formulation for dense RGB-D scene flow based on Gaussian Belief Propagation which leverages the architecture of this processor to implement both local and global regularisation. Local regularisation is enforced for pairs of flow estimates whose corresponding pixels are neighbours, while global regularisation is defined for flow estimate pairs whose corresponding pixels are far from each other on the image plane. Using both types of regularisation allows our algorithm to handle a variety of in-scene motion and makes it suitable for estimating deforming scene flow, piece-wise rigid scene flow and visual odometry within the same system.

Journal article

Zhi S, Laidlow T, Leutenegger S, Davison AJet al., 2022, In-place scene labelling and understanding with implicit scene representation, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

Semantic labelling is highly correlated with geometry and radiance reconstruction, as scene entities with similar shape and appearance are more likely to come from similar classes. Recent implicit neural reconstruction techniques are appealing as they do not require prior training data, but the same fully self-supervised approach is not possible for semantics because labels are human-defined properties.We extend neural radiance fields (NeRF) to jointly encode semantics with appearance and geometry, so that complete and accurate 2D semantic labels can be achieved using a small amount of in-place annotations specific to the scene. The intrinsic multi-view consistency and smoothness of NeRF benefit semantics by enabling sparse labels to efficiently propagate. We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes. We demonstrate its advantageous properties in various interesting applications such as an efficient scene labelling tool, novel semantic view synthesis, label denoising, super-resolution, label interpolation and multi-view semantic label fusion in visual semantic mapping systems.

Conference paper

Landgraf Z, Scona R, Laidlow T, James S, Leutenegger S, Davison AJet al., 2022, SIMstack: a generative shape and instance model for unordered object stacks, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn’t require setting the maximum number of objects in the scene. At test time, our model can generate 3D shape and instance segmentation from a single depth view, probabilistically sampling proposals for the occluded region from the learned latent space. Our method has practical applications in providing robots some of the ability humans have to make rapid intuitive inferences of partially observed scenes. We demonstrate an application for precise (non-disruptive) object grasping of unknown objects from a single depth view.

Conference paper

Sucar E, Liu S, Ortiz J, Davison AJet al., 2022, iMAP: implicit mapping and positioning in real-time, 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 6209-6218

We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live operation without prior data, building a dense, scene-specific implicit 3D model of occupancy and colour which is also immediately used for tracking.Achieving real-time SLAM via continual training of a neural network against a live image stream requires significant innovation. Our iMAP algorithm uses a keyframe structure and multi-processing computation flow, with dynamic information-guided pixel sampling for speed, with tracking at 10 Hz and global map updating at 2 Hz. The advantages of an implicit MLP over standard dense SLAM techniques include efficient geometry representation with automatic detail control and smooth, plausible filling-in of unobserved regions such as the back surfaces of objects.

Conference paper

Gallego G, Delbruck T, Orchard GM, Bartolozzi C, Taba B, Censi A, Leutenegger S, Davison A, Conradt J, Daniilidis K, Scaramuzza Det al., 2022, Event-Based Vision: A Survey, Publisher: IEEE COMPUTER SOC

Author Web Link
Cite
Citations: 538

Working paper

Liu S, Zhi S, Johns E, Davison AJet al., 2022, BOOTSTRAPPING SEMANTIC SEGMENTATION WITH REGIONAL CONTRAST

We present ReCo, a contrastive learning framework designed at a regional level to assist learning in semantic segmentation. ReCo performs pixel-level contrastive learning on a sparse set of hard negative pixels, with minimal additional memory footprint. ReCo is easy to implement, being built on top of off-the-shelf segmentation networks, and consistently improves performance, achieving more accurate segmentation boundaries and faster convergence. The strongest effect is in semi-supervised learning with very few labels. With ReCo, we achieve high quality semantic segmentation model, requiring only 5 examples of each semantic class.

Abstract
Cite
Citations: 29

Conference paper

Laidlow T, Davison AJ, 2022, Simultaneous Localisation and Mapping With Quadric Surfaces, International Conference on 3D Vision (3DV), Publisher: IEEE, Pages: 252-260, ISSN: 2378-3826

Conference paper

Matsuki H, Scona R, Czarnowski J, Davison AJet al., 2021, CodeMapping: real-time dense mapping for sparse SLAM using compact scene representations, IEEE Robotics and Automation Letters, Vol: 6, Pages: 7105-7112, ISSN: 2377-3766

We propose a novel dense mapping framework for sparse visual SLAM systems which leverages a compact scene representation. State-of-the-art sparse visual SLAM systems provide accurate and reliable estimates of the camera trajectory and locations of landmarks. While these sparse maps are useful for localization, they cannot be used for other tasks such as obstacle avoidance or scene understanding. In this letter we propose a dense mapping framework to complement sparse visual SLAM systems which takes as input the camera poses, keyframes and sparse points produced by the SLAM system and predicts a dense depth image for every keyframe. We build on CodeSLAM [1] and use a variational autoencoder (VAE) which is conditioned on intensity, sparse depth and reprojection error images from sparse SLAM to predict an uncertainty-aware dense depth map. The use of a VAE then enables us to refine the dense depth images through multi-view optimization which improves the consistency of overlapping frames. Our mapper runs in a separate thread in parallel to the SLAM system in a loosely coupled manner. This flexible design allows for integration with arbitrary metric sparse SLAM systems without delaying the main SLAM process. Our dense mapper can be used not only for local mapping but also globally consistent dense 3D reconstruction through TSDF fusion. We demonstrate our system running with ORB-SLAM3 and show accurate dense depth estimation which could enable applications such as robotics and augmented reality.

Journal article

Lenton D, James S, Clark R, Davison AJet al., 2021, END-TO-END EGOSPHERIC SPATIAL MEMORY

Spatial memory, or the ability to remember and recall specific locations and objects, is central to autonomous agents' ability to carry out tasks in real environments. However, most existing artificial memory modules are not very adept at storing spatial information. We propose a parameter-free module, Egospheric Spatial Memory (ESM), which encodes the memory in an ego-sphere around the agent, enabling expressive 3D representations. ESM can be trained end-to-end via either imitation or reinforcement learning, and improves both training efficiency and final performance against other memory baselines on both drone and manipulator visuomotor control tasks. The explicit egocentric geometry also enables us to seamlessly combine the learned controller with other non-learned modalities, such as local obstacle avoidance. We further show applications to semantic segmentation on the ScanNet dataset, where ESM naturally combines image-level and map-level inference modalities. Through our broad set of experiments, we show that ESM provides a general computation graph for embodied spatial reasoning, and the module forms a bridge between real-time mapping systems and differentiable memory architectures. Implementation at: https://github.com/ivy-dl/memory.

Abstract
Cite
Citations: 2

Conference paper

Xu B, Davison AJ, Leutenegger S, 2020, Deep probabilistic feature-metric tracking, Publisher: arXiv

Dense image alignment from RGB-D images remains a critical issue forreal-world applications, especially under challenging lighting conditions andin a wide baseline setting. In this paper, we propose a new framework to learna pixel-wise deep feature map and a deep feature-metric uncertainty mappredicted by a Convolutional Neural Network (CNN), which together formulate adeep probabilistic feature-metric residual of the two-view constraint that canbe minimised using Gauss-Newton in a coarse-to-fine optimisation framework.Furthermore, our network predicts a deep initial pose for faster and morereliable convergence. The optimisation steps are differentiable and unrolled totrain in an end-to-end fashion. Due to its probabilistic essence, our approachcan easily couple with other residuals, where we show a combination with ICP.Experimental results demonstrate state-of-the-art performance on the TUM RGB-Ddataset and 3D rigid object tracking dataset. We further demonstrate ourmethod's robustness and convergence qualitatively.

Working paper

Wada K, Sucar E, James S, Lenton D, Davison AJet al., 2020, MoreFusion: multi-object reasoning for 6D pose estimation from volumetric fusion, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE

Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on 2 object datasets: YCB-Video, and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application where a robot arm precisely and orderly disassembles complicated piles of objects, using only on-board RGB-D vision.

Conference paper

Bonardi A, James S, Davison AJ, 2020, Learning one-shot imitation from humans without humans, IEEE Robotics and Automation Letters, Vol: 5, Pages: 3533-3539, ISSN: 2377-3766

Humans can naturally learn to execute a new task by seeing it performed by other individuals once, and then reproduce it in a variety of configurations. Endowing robots with this ability of imitating humans from third person is a very immediate and natural way of teaching new tasks. Only recently, through meta-learning, there have been successful attempts to one-shot imitation learning from humans; however, these approaches require a lot of human resources to collect the data in the real world to train the robot. But is there a way to remove the need for real world human demonstrations during training? We show that with Task-Embedded Control Networks, we can infer control polices by embedding human demonstrations that can condition a control policy and achieve one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Upon evaluating our approach on pushing and placing tasks in both simulation and in the real world, we show that in comparison to a system that was trained on real-world data we are able to achieve similar results by utilising only simulation data. Videos can be found here: https://sites.google.com/view/tecnets-humans .

Journal article

James S, Ma Z, Arrojo DR, Davison AJet al., 2020, RLBench: The robot learning benchmark & learning environment, IEEE Robotics and Automation Letters, Vol: 5, Pages: 3019-3026, ISSN: 2377-3766

We present a challenging new benchmark and learning-environment for robot learning: RLBench. The benchmark features 100 completely unique, hand-designed tasks, ranging in difficulty from simple target reaching and door opening to longer multi-stage tasks, such as opening an oven and placing a tray in it. We provide an array of both proprioceptive observations and visual observations, which include rgb, depth, and segmentation masks from an over-the-shoulder stereo camera and an eye-in-hand monocular camera. Uniquely, each task comes with an infinite supply of demos through the use of motion planners operating on a series of waypoints given during task creation time; enabling an exciting flurry of demonstration-based learning possibilities. RLBench has been designed with scalability in mind; new tasks, along with their motion-planned demos, can be easily created and then verified by a series of tools, allowing users to submit their own tasks to the RLBench task repository. This large-scale benchmark aims to accelerate progress in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning. With the benchmark's breadth of tasks and demonstrations, we propose the first large-scale few-shot challenge in robotics. We hope that the scale and diversity of RLBench offers unparalleled research opportunities in the robot learning community and beyond. Benchmarking code and videos can be found at https://sites.google.com/view/rlbench .

Journal article

Ortiz J, Pupilli M, Leutenegger S, Davison AJet al., 2020, Bundle adjustment on a graph processor, Publisher: arXiv

Graph processors such as Graphcore's Intelligence Processing Unit (IPU) arepart of the major new wave of novel computer architecture for AI, and have ageneral design with massively parallel computation, distributed on-chip memoryand very high inter-core communication bandwidth which allows breakthroughperformance for message passing algorithms on arbitrary graphs. We show for thefirst time that the classical computer vision problem of bundle adjustment (BA)can be solved extremely fast on a graph processor using Gaussian BeliefPropagation. Our simple but fully parallel implementation uses the 1216 coreson a single IPU chip to, for instance, solve a real BA problem with 125keyframes and 1919 points in under 40ms, compared to 1450ms for the Ceres CPUlibrary. Further code optimisation will surely increase this difference onstatic problems, but we argue that the real promise of graph processing is forflexible in-place optimisation of general, dynamically changing factor graphsrepresenting Spatial AI problems. We give indications of this with experimentsshowing the ability of GBP to efficiently solve incremental SLAM problems, anddeal with robust cost functions and different types of factors.

Working paper

Bloesch M, Laidlow T, Clark R, Leutenegger S, Davison Aet al., 2020, Learning meshes for dense visual SLAM, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE

Estimating motion and surrounding geometry of a moving camera remains a challenging inference problem. From an information theoretic point of view, estimates should get better as more information is included, such as is done in dense SLAM, but this is strongly dependent on the validity of the underlying models. In the present paper, we use triangular meshes as both compact and dense geometry representation. To allow for simple and fast usage, we propose a view-based formulation for which we predict the in-plane vertex coordinates directly from images and then employ the remaining vertex depth components as free variables. Flexible and continuous integration of information is achieved through the use of a residual based inference technique. This so-called factor graph encodes all information as mapping from free variables to residuals, the squared sum of which is minimised during inference. We propose the use of different types of learnable residuals, which are trained end-to-end to increase their suitability as information bearing models and to enable accurate and reliable estimation. Detailed evaluation of all components is provided on both synthetic and real data which confirms the practicability of the presented approach.

Conference paper

ProfessorAndrewDavison

Contact

Assistant

Location

Summary