Kapelyukh I, Vosylius V, Johns E, 2023, DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics, IEEE Robotics and Automation Letters, Vol: 8, Pages: 3956-3963, ISSN: 2377-3766
Valassakis E, Papagiannis G, Di Palo N, et al., 2022, Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 8614-8621, ISSN: 2153-0858
Johns E, 2021, Back to reality for imitation learning, Conference on Robot Learning (CoRL) 2021, Publisher: OpenReview, Pages: 1-5
Imitation learning, and robot learning in general, emerged due to breakthroughs in machine learning, rather than breakthroughs in robotics. As such, evaluation metrics for robot learning are deeply rooted in those for machine learning, and focus primarily on data efficiency. We believe that a better metric for real-world robot learning is time efficiency, which better models the true cost to humans. This is a call to arms to the robot learning community to develop our own evaluation metrics, tailored towards the long-term goals of real-world robotics.
Johns E, Di Palo N, 2021, Learning multi-stage tasks with one demonstration via self-replay, Conference on Robot Learning (CoRL) 2021, Publisher: OpenReview, Pages: 1-10
In this work, we introduce a novel method to learn everyday-like multi-stage tasks from a single human demonstration, without requiring any prior object knowledge. Inspired by the recent Coarse-to-Fine Imitation Learning method, we model imitation learning as a learned object reaching phase followed by an open-loop replay of the demonstrator’s actions. We build upon this for multi-stage tasks where, following the human demonstration, the robot can autonomously collect image data for the entire multi-stage task, by reaching the next object in the sequence and then replaying the demonstration, and then repeating in a loop for all stages of the task. We evaluate with real-world experiments on a set of everyday-like multi-stage tasks, which we show that our method can solve from a single demonstration. Videos and supplementary material can be found at this webpage.
Alghonaim R, Johns E, 2021, Benchmarking domain randomisation for visual sim-to-real transfer, 2021 International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers, Pages: 12802-12808, ISSN: 1050-4729
Domain randomisation is a very popular method for visual sim-to-real transfer in robotics, due to its simplicity and ability to achieve transfer without any real-world images at all. Nonetheless, a number of design choices must be made to achieve optimal transfer. In this paper, we perform a comprehensive benchmarking study on these different choices, with two key experiments evaluated on a real-world object pose estimation task. First, we study the rendering quality, and find that a small number of high-quality images is superior to a large number of low-quality images. Second, we study the type of randomisation, and find that both distractors and textures are important for generalisation to novel environment
Johns E, Kapelyukh I, 2021, My House, My Rules: Learning Tidying Preferences with Graph Neural Networks, Conference on Robot Learning (CoRL) 2021
Du L, Ye X, Tan X, et al., 2021, AGO-Net: Association-Guided 3D Point Cloud Object Detection Network, IEEE Transactions on Pattern Analysis and Machine Intelligence, Pages: 1-1, ISSN: 0162-8828
Dreczkowski K, Johns E, 2021, Hybrid ICP, 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)
Tsai Y-Y, Xu H, Ding Z, et al., 2021, DROID: minimizing the reality gap using single-shot human demonstration, IEEE Robotics and Automation Letters, Vol: 6, Pages: 3168-3175, ISSN: 2377-3766
Reinforcement learning (RL) has demonstrated great success in the past several years. However, most of the scenarios focus on simulated environments. One of the main challenges of transferring the policy learned in a simulated environment to real world, is the discrepancy between the dynamics of the two environments. In prior works, Domain Randomization (DR) has been used to address the reality gap for both robotic locomotion and manipulation tasks. In this letter, we propose Domain Randomization Optimization IDentification (DROID), a novel framework to exploit single-shot human demonstration for identifying the simulator's distribution of dynamics parameters, and apply it to training a policy on a door opening task. Our results show that the proposed framework can identify the difference in dynamics between the simulated and the real worlds, and thus improve policy transfer by optimizing the simulator's randomization ranges. We further illustrate that based on these same identified parameters, our method can generalize the learned policy to different but related tasks.
Johns E, 2021, Coarse-to-fine imitation learning: robot manipulation from a single demonstration, 2021 International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1050-4729
We introduce a simple new method for visual imitation learning, which allows a novel robot manipulation task to be learned from a single human demonstration, without requiring any prior knowledge of the object being interacted with. Our method models imitation learning as a state estimation problem, with the state defined as the end-effector's pose at the point where object interaction begins, as observed from the demonstration. By then modelling a manipulation task as a coarse, approach trajectory followed by a fine, interaction trajectory, this state estimator can be trained in a self-supervised manner, by automatically moving the end-effector's camera around the object. At test time, the end-effector moves to the estimated state through a linear path, at which point the original demonstration's end-effector velocities are simply replayed. This enables convenient acquisition of a complex interaction trajectory, without actually needing to explicitly learn a policy. Real-world experiments on 8 everyday tasks show that our method can learn a diverse range of skills from a single human demonstration, whilst also yielding a stable and interpretable controller.
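The two-phase controller described in this abstract (a linear approach to the estimated pose, followed by open-loop replay of the demonstrated velocities) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `coarse_to_fine_path`, the parameters `dt` and `coarse_steps`, and the restriction to 3D positions (orientation is ignored) are assumptions made purely for illustration, and the pose estimate is taken as given rather than predicted from images.

```python
import numpy as np


def coarse_to_fine_path(start, estimated_pose, demo_velocities, dt=0.1, coarse_steps=20):
    """Hypothetical sketch: positions visited by the end-effector.

    Coarse phase: straight line from `start` to `estimated_pose`.
    Fine phase: open-loop replay of the demonstration's velocities.
    Positions only; orientation is deliberately left out of this sketch.
    """
    start = np.asarray(start, dtype=float)
    estimated_pose = np.asarray(estimated_pose, dtype=float)
    demo_velocities = np.asarray(demo_velocities, dtype=float)

    # Coarse phase: linear interpolation between the two positions.
    alphas = np.linspace(0.0, 1.0, coarse_steps + 1)[:, None]
    coarse = (1.0 - alphas) * start + alphas * estimated_pose

    # Fine phase: integrate the recorded velocities with no feedback.
    fine = estimated_pose + np.cumsum(demo_velocities * dt, axis=0)

    return np.vstack([coarse, fine])


if __name__ == "__main__":
    path = coarse_to_fine_path(
        start=[0.0, 0.0, 0.0],
        estimated_pose=[1.0, 0.0, 0.0],
        demo_velocities=[[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]],
        dt=0.5,
        coarse_steps=4,
    )
    print(path)
```

The sketch only makes the structure concrete: no policy is learned for the interaction itself, so the quality of the result rests on how accurately the pose at the start of the interaction is estimated.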
Johns E, Garcia-Hernando G, Kim T-K, 2021, Physics-based dexterous manipulations with estimated hand poses and residual reinforcement learning, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 9561-9568
Dexterous manipulation of objects in virtual environments with our bare hands, by using only a depth sensor and a state-of-the-art 3D hand pose estimator (HPE), is challenging. While virtual environments are ruled by physics, e.g. object weights and surface frictions, the absence of force feedback makes the task challenging, as even slight inaccuracies on finger tips or contact points from HPE may make the interactions fail. Prior arts simply generate contact forces in the direction of the fingers' closures, when finger joints penetrate virtual objects. Although useful for simple grasping scenarios, they cannot be applied to dexterous manipulations such as in-hand manipulation. Existing reinforcement learning (RL) and imitation learning (IL) approaches train agents that learn skills by using task-specific rewards, without considering any online user input. In this work, we propose to learn a model that maps noisy input hand poses to target virtual poses, which introduces the needed contacts to accomplish the tasks on a physics simulator. The agent is trained in a residual setting by using a model-free hybrid RL+IL approach. A 3D hand pose estimation reward is introduced leading to an improvement on HPE accuracy when the physics-guided corrected target poses are remapped to the input space. As the model corrects HPE errors by applying minor but crucial joint displacements for contacts, this helps to keep the generated motion visually close to the user input. Since HPE sequences performing successful virtual interactions do not exist, a data generation scheme to train and evaluate the system is proposed. We test our framework in two applications that use hand pose estimates for dexterous manipulations: hand-object interactions in VR and hand-object motion reconstruction in-the-wild. Experiments show that the proposed method outperforms various RL/IL baselines and the simple prior art of enforcing hand closure, both in task success and hand pose accuracy.
Valassakis P, Ding Z, Johns E, 2021, Crossing the gap: a deep dive into zero-shot sim-to-real transfer for dynamics, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE
Zero-shot sim-to-real transfer of tasks with complex dynamics is a highly challenging and unsolved problem. A number of solutions have been proposed in recent years, but we have found that many works do not present a thorough evaluation in the real world, or underplay the significant engineering effort and task-specific fine tuning that is required to achieve the published results. In this paper, we dive deeper into the sim-to-real transfer challenge, investigate why this is such a difficult problem, and present objective evaluations of a number of transfer methods across a range of real-world tasks. Surprisingly, we found that a method which simply injects random forces into the simulation performs just as well as more complex methods, such as those which randomise the simulator's dynamics parameters
Valassakis E, Di Palo N, Johns E, 2021, Coarse-to-Fine for Sim-to-Real: Sub-Millimetre Precision Across Wide Task Spaces, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 5989-5996, ISSN: 2153-0858
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. Whilst traditional resizing layers have fixed and deterministic reshaping factors, our module allows for a learnable reshaping factor. Our implementation enables shape adaptors to be trained end-to-end without any additional supervision, through which network architectures can be optimised for each individual task, in a fully automated way. We performed experiments across seven image classification datasets, and results show that by simply using a set of our shape adaptors instead of the original resizing layers, performance increases consistently over human-designed networks, across all datasets. Additionally, we show the effectiveness of shape adaptors on two other applications: network compression and transfer learning.
Ding Z, Lepora N, Johns E, 2020, Sim-to-real transfer for optical tactile sensing, IEEE International Conference on Robotics and Automation, Publisher: IEEE, Pages: 1639-1645, ISSN: 2152-4092
Deep learning and reinforcement learning methods have been shown to enable learning of flexible and complex robot controllers. However, the reliance on large amounts of training data often requires data collection to be carried out in simulation, with a number of sim-to-real transfer methods being developed in recent years. In this paper, we study these techniques for tactile sensing using the TacTip optical tactile sensor, which consists of a deformable tip with a camera observing the positions of pins inside this tip. We designed a model for soft body simulation which was implemented using the Unity physics engine, and trained a neural network to predict the locations and angles of edges when in contact with the sensor. Using domain randomisation techniques for sim-to-real transfer, we show how this framework can be used to accurately predict edges with less than 1 mm prediction error in real-world testing, without any real-world data at all.
Tsai Y-Y, Xiao B, Johns E, et al., 2020, Constrained-space optimization and reinforcement learning for complex tasks, IEEE Robotics and Automation Letters, Vol: 5, Pages: 683-690, ISSN: 2377-3766
Learning from demonstration is increasingly used for transferring operator manipulation skills to robots. In practice, it is important to cater for limited data and imperfect human demonstrations, as well as underlying safety constraints. This article presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks. Through interactions within the constrained space, the reinforcement learning agent is trained to optimize the manipulation skills according to a defined reward function. After learning, the optimal policy is derived from the well-trained reinforcement learning agent, which is then implemented to guide the robot to conduct tasks that are similar to the experts' demonstrations. The effectiveness of the proposed method is verified with a robotic suturing task, demonstrating that the learned policy outperformed the experts' demonstrations in terms of the smoothness of the joint motion and end-effector trajectories, as well as the overall task completion time.
Johns E, Liu S, Davison A, 2020, End-to-end multi-task learning with attention, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, Publisher: IEEE
We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules allow for learning of task-specific features from the global features, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be trained end-to-end and can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. We evaluate our approach on a variety of datasets, across both image-to-image predictions and image classification tasks. We show that our architecture is state-of-the-art in multi-task learning compared to existing methods, and is also less sensitive to various weighting schemes in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan.
Liu S, Davison A, Johns E, 2019, Self-supervised generalisation with meta auxiliary learning, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Publisher: Neural Information Processing Systems Foundation, Inc.
Learning with auxiliary tasks can improve the ability of a primary task to generalise. However, this comes at the cost of manually labelling auxiliary data. We propose a new method which automatically learns appropriate labels for an auxiliary task, such that any supervised learning task can be improved without requiring access to any further data. The approach is to train two neural networks: a label-generation network to predict the auxiliary labels, and a multi-task network to train the primary task alongside the auxiliary task. The loss for the label-generation network incorporates the loss of the multi-task network, and so this interaction between the two networks can be seen as a form of meta learning with a double gradient. We show that our proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task learning on 7 image datasets, without requiring any additional data. We also show that MAXL outperforms several other baselines for generating auxiliary labels, and is even competitive when compared with human-defined auxiliary labels. The self-supervised nature of our method leads to a promising new direction towards automated generalisation. Source code can be found at https://github.com/lorenmt/maxl.
James S, Davison A, Johns E, 2017, Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task, Conference on Robot Learning, Publisher: PMLR, Pages: 334-343
End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to either be slow to train, exhibit little or no generalisability, or lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image. This involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube inside. To achieve this, robot trajectories are computed in a simulator, to collect a series of control velocities which accomplish the task. Then, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real-world images. Results show that we are able to successfully accomplish the task in the real world with the ability to generalise to novel environments, including those with dynamic lighting conditions, distractor objects, and moving objects, including the basket itself. We believe our approach to be simple, highly scalable, and capable of learning long-horizon tasks that have until now not been shown with the state-of-the-art in end-to-end robot control.
Saeedi Gharahbolagh S, Nardi L, Johns E, et al., 2017, Application-oriented design space exploration for SLAM algorithms, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE
In visual SLAM, there are many software and hardware parameters, such as algorithmic thresholds and GPU frequency, that need to be tuned; however, this tuning should also take into account the structure and motion of the camera. In this paper, we determine the complexity of the structure and motion with a few parameters calculated using information theory. Depending on this complexity and the desired performance metrics, suitable parameters are explored and determined. Additionally, based on the proposed structure and motion parameters, several applications are presented, including a novel active SLAM approach which guides the camera in such a way that the SLAM algorithm achieves the desired performance metrics. Real-world and simulated experimental results demonstrate the effectiveness of the proposed design space and its applications.
Ye M, Johns E, Walter B, et al., 2017, An image retrieval framework for real-time endoscopic image retargeting, International Journal of Computer Assisted Radiology and Surgery, Vol: 12, Pages: 1281-1292, ISSN: 1861-6429
Purpose: Serial endoscopic examinations of a patient are important for early diagnosis of malignancies in the gastrointestinal tract. However, retargeting for optical biopsy is challenging due to extensive tissue variations between examinations, requiring the method to be tolerant to these changes whilst enabling real-time retargeting. Method: This work presents an image retrieval framework for inter-examination retargeting. We propose both a novel image descriptor tolerant of long-term tissue changes and a novel descriptor matching method in real time. The descriptor is based on histograms generated from regional intensity comparisons over multiple scales, offering stability over long-term appearance changes at the higher levels, whilst remaining discriminative at the lower levels. The matching method then learns a hashing function using random forests, to compress the string and allow for fast image comparison by a simple Hamming distance metric. Results: A dataset that contains 13 in vivo gastrointestinal videos was collected from six patients, representing serial examinations of each patient, which includes videos captured with significant time intervals. Precision-recall for retargeting shows that our new descriptor outperforms a number of alternative descriptors, whilst our hashing method outperforms a number of alternative hashing approaches. Conclusion: We have proposed a novel framework for optical biopsy in serial endoscopic examinations. A new descriptor, combined with a novel hashing method, achieves state-of-the-art retargeting, with validation on in vivo videos from six patients. Real-time performance also allows for practical integration without disturbing the existing clinical workflow.
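The final matching step mentioned in this abstract, comparing compressed binary codes by Hamming distance, can be sketched as follows. This is only an illustration under stated assumptions: the binary codes are taken as already computed (the random-forest hashing itself is not shown), and the function name `hamming_nearest` and the toy codes are hypothetical.

```python
import numpy as np


def hamming_nearest(query_code, database_codes):
    """Hypothetical sketch: find the database code closest to the query.

    Codes are arrays of 0/1 bits. The Hamming distance is the number of
    bit positions in which two codes differ.
    """
    query_code = np.asarray(query_code, dtype=np.uint8)
    database_codes = np.asarray(database_codes, dtype=np.uint8)

    # Count differing bits between the query and every database row.
    distances = np.count_nonzero(database_codes != query_code, axis=1)

    # argmin returns the first index when several rows tie.
    best_index = int(np.argmin(distances))
    return best_index, distances


if __name__ == "__main__":
    database = [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
    index, distances = hamming_nearest([1, 1, 0, 1], database)
    print(index, distances)
```

Because the comparison reduces to counting differing bits, it stays cheap even for large databases, which is consistent with the real-time requirement stated in the abstract.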
Johns E, Leutenegger S, Davison AJ, 2016, Pairwise Decomposition of Image Sequences for Active Multi-View Recognition, Computer Vision and Pattern Recognition, Publisher: Computer Vision Foundation (CVF), ISSN: 1063-6919
A multi-view image sequence provides a much richer capacity for object recognition than from a single image. However, most existing solutions to multi-view recognition typically adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition, by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.
Johns E, Leutenegger S, Davison AJ, 2016, Deep learning a grasp function for grasping under gripper pose uncertainty, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 4461-4468, ISSN: 2153-0866
This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, it is possible to achieve grasping robust to the gripper’s pose uncertainty, by smoothing the grasp function with the pose uncertainty function. Therefore, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen, and instead a pose will be chosen which is surrounded by a region of high grasp quality. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object, and outputs a score for each grasp pose across the image. Training data for this is generated by use of physics simulation and depth image simulation with 3D object meshes, to enable acquisition of sufficient data without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
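The idea of smoothing the grasp function with the pose uncertainty function can be illustrated with a small sketch. Several simplifications are assumed here and are not taken from the paper: the grasp function is a 2D grid of scores over image positions only (no gripper orientation), the uncertainty is modelled as an isotropic Gaussian, and the names `gaussian_kernel` and `robust_grasp` are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """1D Gaussian weights, normalised to sum to one."""
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return weights / weights.sum()


def robust_grasp(scores, sigma=1.0, radius=2):
    """Hypothetical sketch: smooth a grid of grasp scores, then pick the best cell.

    The Gaussian stands in for the pose uncertainty function (an assumption).
    Smoothing is done separably: along rows, then along columns, with
    zero padding at the borders.
    """
    scores = np.asarray(scores, dtype=float)
    kernel = gaussian_kernel(sigma, radius)
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, scores
    )
    smoothed = np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, smoothed
    )
    best = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return (int(best[0]), int(best[1])), smoothed


if __name__ == "__main__":
    grid = np.zeros((7, 7))
    grid[1, 1] = 1.0        # isolated peak surrounded by poor grasps
    grid[3:6, 3:6] = 0.8    # slightly lower scores, but a whole good region
    print(robust_grasp(grid)[0])
```

In the toy grid, the raw maximum sits on the isolated peak, while the smoothed maximum moves to the centre of the broad region, which mirrors the behaviour the abstract describes: a pose adjacent to poor grasp quality is no longer chosen once uncertainty is accounted for.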
Ye M, Johns E, Walter B, et al., 2016, Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy, International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Publisher: Springer, Pages: 448-456, ISSN: 0302-9743
For early diagnosis of malignancies in the gastrointestinal tract, surveillance endoscopy is increasingly used to monitor abnormal tissue changes in serial examinations of the same patient. Despite successes with optical biopsy for in vivo and in situ tissue characterisation, biopsy retargeting for serial examinations is challenging because tissue may change in appearance between examinations. In this paper, we propose an inter-examination retargeting framework for optical biopsy, based on an image descriptor designed for matching between endoscopic scenes over significant time intervals. Each scene is described by a hierarchy of regional intensity comparisons at various scales, offering tolerance to long-term change in tissue appearance whilst remaining discriminative. Binary coding is then used to compress the descriptor via a novel random forests approach, providing fast comparisons in Hamming space and real-time retargeting. Extensive validation conducted on 13 in vivo gastrointestinal videos, collected from six patients, shows that our approach outperforms state-of-the-art methods.
Johns E, Mac Aodha O, Brostow G, 2015, Becoming the expert - interactive multi-class machine teaching, Conference on Computer Vision and Pattern Recognition 2015, Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1063-6919
Compared to machines, humans are extremely good at classifying images into categories, especially when they possess prior knowledge of the categories at hand. If this prior information is not available, supervision in the form of teaching images is required. To learn categories more quickly, people should see important and representative images first, followed by less important images later – or not at all. However, image-importance is individual-specific, i.e. a teaching image is important to a student if it changes their overall ability to discriminate between classes. Further, students keep learning, so while image-importance depends on their current knowledge, it also varies with time. In this work we propose an Interactive Machine Teaching algorithm that enables a computer to teach challenging visual concepts to a human. Our adaptive algorithm chooses, online, which labeled images from a teaching set should be shown to the student as they learn. We show that a teaching strategy that probabilistically models the student’s ability and progress, based on their correct and incorrect answers, produces better ‘experts’. We present results using real human participants across several varied and challenging real-world datasets.
Ye M, Johns E, Giannarou S, et al., 2014, Online Scene Association for Endoscopic Navigation, 17th International Conference MICCAI 2014, Publisher: Springer International Publishing, Pages: 316-323, ISSN: 0302-9743
Endoscopic surveillance is a widely used method for monitoring abnormal changes in the gastrointestinal tract such as Barrett's esophagus. Direct visual assessment, however, is both time consuming and error prone, as it involves manual labelling of abnormalities on a large set of images. To assist surveillance, this paper proposes an online scene association scheme to summarise an endoscopic video into scenes, on-the-fly. This provides scene clustering based on visual contents, and also facilitates topological localisation during navigation. The proposed method is based on tracking and detection of visual landmarks on the tissue surface. A generative model is proposed for online learning of pairwise geometrical relationships between landmarks. This enables robust detection of landmarks and scene association under tissue deformation. Detailed experimental comparison and validation have been conducted on in vivo endoscopic videos to demonstrate the practical value of our approach.
Johns E, Yang G-Z, 2014, Generative Methods for Long-Term Place Recognition in Dynamic Scenes, International Journal of Computer Vision, Vol: 106, Pages: 297-314, ISSN: 0920-5691
Johns E, Yang G-Z, 2014, Pairwise Probabilistic Voting: Fast Place Recognition without RANSAC, 13th European Conference on Computer Vision (ECCV), Publisher: Springer-Verlag Berlin, Pages: 504-519, ISSN: 0302-9743