Johns E, 2021, Coarse-to-fine imitation learning: robot manipulation from a single demonstration, 2021 International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1050-4729
We introduce a simple new method for visual imitation learning, which allows a novel robot manipulation task to be learned from a single human demonstration, without requiring any prior knowledge of the object being interacted with. Our method models imitation learning as a state estimation problem, with the state defined as the end-effector's pose at the point where object interaction begins, as observed from the demonstration. By then modelling a manipulation task as a coarse, approach trajectory followed by a fine, interaction trajectory, this state estimator can be trained in a self-supervised manner, by automatically moving the end-effector's camera around the object. At test time, the end-effector moves to the estimated state through a linear path, at which point the original demonstration's end-effector velocities are simply replayed. This enables convenient acquisition of a complex interaction trajectory, without actually needing to explicitly learn a policy. Real-world experiments on 8 everyday tasks show that our method can learn a diverse range of skills from a single human demonstration, whilst also yielding a stable and interpretable controller.
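The two-stage controller described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes position-only poses, a fixed time step, and velocities already expressed in the world frame, and the helper names `linear_path` and `replay_velocities` are hypothetical.

```python
def linear_path(start, goal, steps):
    """Coarse stage: straight-line waypoints from start to the estimated pose."""
    return [
        tuple(s + (g - s) * t / steps for s, g in zip(start, goal))
        for t in range(1, steps + 1)
    ]


def replay_velocities(start, velocities, dt):
    """Fine stage: integrate the demonstration's recorded velocities from start."""
    pose = start
    path = []
    for v in velocities:
        pose = tuple(p + vi * dt for p, vi in zip(pose, v))
        path.append(pose)
    return path


# Hypothetical values: current position, estimated interaction-start position,
# and two recorded velocity commands from the single demonstration.
current = (0.0, 0.0, 0.5)
estimated_start = (0.4, 0.2, 0.1)
demo_velocities = [(0.0, 0.0, -0.1), (0.0, 0.0, -0.1)]

coarse = linear_path(current, estimated_start, steps=4)
fine = replay_velocities(coarse[-1], demo_velocities, dt=0.5)
```

In this sketch the only estimated quantity is `estimated_start`; everything after it is a replay, which mirrors the abstract's point that no policy is explicitly learned.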
Alghonaim R, Johns E, 2021, Benchmarking Domain Randomisation for Visual Sim-to-Real Transfer, 2021 International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1050-4729
Tsai Y-Y, Xu H, Ding Z, et al., 2021, DROID: Minimizing the Reality Gap Using Single-Shot Human Demonstration, IEEE ROBOTICS AND AUTOMATION LETTERS, Vol: 6, Pages: 3168-3175, ISSN: 2377-3766
Du L, Ye X, Tan X, et al., 2021, AGO-Net: Association-Guided 3D Point Cloud Object Detection Network, IEEE Transactions on Pattern Analysis and Machine Intelligence, Pages: 1-1, ISSN: 0162-8828
We present a novel resizing module for neural networks: shape adaptor, a drop-in enhancement built on top of traditional resizing layers, such as pooling, bilinear sampling, and strided convolution. Whilst traditional resizing layers have fixed and deterministic reshaping factors, our module allows for a learnable reshaping factor. Our implementation enables shape adaptors to be trained end-to-end without any additional supervision, through which network architectures can be optimised for each individual task, in a fully automated way. We performed experiments across seven image classification datasets, and results show that by simply using a set of our shape adaptors instead of the original resizing layers, performance increases consistently over human-designed networks, across all datasets. Additionally, we show the effectiveness of shape adaptors on two other applications: network compression and transfer learning.
Ding Z, Lepora N, Johns E, 2020, Sim-to-real transfer for optical tactile sensing, IEEE International Conference on Robotics and Automation, Publisher: IEEE, Pages: 1-7, ISSN: 2152-4092
Deep learning and reinforcement learning methods have been shown to enable learning of flexible and complex robot controllers. However, the reliance on large amounts of training data often requires data collection to be carried out in simulation, with a number of sim-to-real transfer methods being developed in recent years. In this paper, we study these techniques for tactile sensing using the TacTip optical tactile sensor, which consists of a deformable tip with a camera observing the positions of pins inside this tip. We designed a model for soft body simulation which was implemented using the Unity physics engine, and trained a neural network to predict the locations and angles of edges when in contact with the sensor. Using domain randomisation techniques for sim-to-real transfer, we show how this framework can be used to accurately predict edges with less than 1 mm prediction error in real-world testing, without any real-world data at all.
Johns E, Garcia-Hernando G, Kim T-K, 2020, Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems
Valassakis P, Ding Z, Johns E, 2020, Crossing the gap: a deep dive into zero-shot sim-to-real transfer for dynamics, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE
Zero-shot sim-to-real transfer of tasks with complex dynamics is a highly challenging and unsolved problem. A number of solutions have been proposed in recent years, but we have found that many works do not present a thorough evaluation in the real world, or underplay the significant engineering effort and task-specific fine tuning that is required to achieve the published results. In this paper, we dive deeper into the sim-to-real transfer challenge, investigate why this is such a difficult problem, and present objective evaluations of a number of transfer methods across a range of real-world tasks. Surprisingly, we found that a method which simply injects random forces into the simulation performs just as well as more complex methods, such as those which randomise the simulator's dynamics parameters.
Tsai Y-Y, Xiao B, Johns E, et al., 2020, Constrained-Space Optimization and Reinforcement Learning for Complex Tasks, IEEE ROBOTICS AND AUTOMATION LETTERS, Vol: 5, Pages: 683-690, ISSN: 2377-3766
Johns E, Liu S, Davison A, 2020, End-to-end multi-task learning with attention, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, Publisher: IEEE
We propose a novel multi-task learning architecture, which allows learning of task-specific feature-level attention. Our design, the Multi-Task Attention Network (MTAN), consists of a single shared network containing a global feature pool, together with a soft-attention module for each task. These modules allow for learning of task-specific features from the global features, whilst simultaneously allowing for features to be shared across different tasks. The architecture can be trained end-to-end and can be built upon any feed-forward neural network, is simple to implement, and is parameter efficient. We evaluate our approach on a variety of datasets, across both image-to-image predictions and image classification tasks. We show that our architecture is state-of-the-art in multi-task learning compared to existing methods, and is also less sensitive to various weighting schemes in the multi-task loss function. Code is available at https://github.com/lorenmt/mtan.
Liu S, Davison A, Johns E, 2019, Self-supervised generalisation with meta auxiliary learning, 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Publisher: Neural Information Processing Systems Foundation, Inc.
Learning with auxiliary tasks can improve the ability of a primary task to generalise. However, this comes at the cost of manually labelling auxiliary data. We propose a new method which automatically learns appropriate labels for an auxiliary task, such that any supervised learning task can be improved without requiring access to any further data. The approach is to train two neural networks: a label-generation network to predict the auxiliary labels, and a multi-task network to train the primary task alongside the auxiliary task. The loss for the label-generation network incorporates the loss of the multi-task network, and so this interaction between the two networks can be seen as a form of meta learning with a double gradient. We show that our proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task learning on 7 image datasets, without requiring any additional data. We also show that MAXL outperforms several other baselines for generating auxiliary labels, and is even competitive when compared with human-defined auxiliary labels. The self-supervised nature of our method leads to a promising new direction towards automated generalisation. Source code can be found at https://github.com/lorenmt/maxl.
James S, Davison A, Johns E, 2017, Transferring end-to-end visuomotor control from simulation to real world for a multi-stage task, Conference on Robot Learning, Publisher: PMLR, Pages: 334-343
End-to-end control for robot manipulation and grasping is emerging as an attractive alternative to traditional pipelined approaches. However, end-to-end methods tend to either be slow to train, exhibit little or no generalisability, or lack the ability to accomplish long-horizon or multi-stage tasks. In this paper, we show how two simple techniques can lead to end-to-end (image to velocity) execution of a multi-stage task, which is analogous to a simple tidying routine, without having seen a single real image. This involves locating, reaching for, and grasping a cube, then locating a basket and dropping the cube inside. To achieve this, robot trajectories are computed in a simulator, to collect a series of control velocities which accomplish the task. Then, a CNN is trained to map observed images to velocities, using domain randomisation to enable generalisation to real-world images. Results show that we are able to successfully accomplish the task in the real world with the ability to generalise to novel environments, including those with dynamic lighting conditions, distractor objects, and moving objects, including the basket itself. We believe our approach to be simple, highly scalable, and capable of learning long-horizon tasks that have until now not been shown with the state-of-the-art in end-to-end robot control.
Saeedi Gharahbolagh S, Nardi L, Johns E, et al., 2017, Application-oriented design space exploration for SLAM algorithms, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE
In visual SLAM, there are many software and hardware parameters, such as algorithmic thresholds and GPU frequency, that need to be tuned; however, this tuning should also take into account the structure and motion of the camera. In this paper, we determine the complexity of the structure and motion with a few parameters calculated using information theory. Depending on this complexity and the desired performance metrics, suitable parameters are explored and determined. Additionally, based on the proposed structure and motion parameters, several applications are presented, including a novel active SLAM approach which guides the camera in such a way that the SLAM algorithm achieves the desired performance metrics. Real-world and simulated experimental results demonstrate the effectiveness of the proposed design space and its applications.
Ye M, Johns E, Walter B, et al., 2017, An image retrieval framework for real-time endoscopic image retargeting, International Journal of Computer Assisted Radiology and Surgery, Vol: 12, Pages: 1281-1292, ISSN: 1861-6429
Purpose: Serial endoscopic examinations of a patient are important for early diagnosis of malignancies in the gastrointestinal tract. However, retargeting for optical biopsy is challenging due to extensive tissue variations between examinations, requiring the method to be tolerant to these changes whilst enabling real-time retargeting. Method: This work presents an image retrieval framework for inter-examination retargeting. We propose both a novel image descriptor tolerant of long-term tissue changes and a novel descriptor matching method in real time. The descriptor is based on histograms generated from regional intensity comparisons over multiple scales, offering stability over long-term appearance changes at the higher levels, whilst remaining discriminative at the lower levels. The matching method then learns a hashing function using random forests, to compress the string and allow for fast image comparison by a simple Hamming distance metric. Results: A dataset that contains 13 in vivo gastrointestinal videos was collected from six patients, representing serial examinations of each patient, which includes videos captured with significant time intervals. Precision-recall for retargeting shows that our new descriptor outperforms a number of alternative descriptors, whilst our hashing method outperforms a number of alternative hashing approaches. Conclusion: We have proposed a novel framework for optical biopsy in serial endoscopic examinations. A new descriptor, combined with a novel hashing method, achieves state-of-the-art retargeting, with validation on in vivo videos from six patients. Real-time performance also allows for practical integration without disturbing the existing clinical workflow.
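The final comparison step mentioned in this abstract, matching compressed binary codes by Hamming distance, can be sketched as follows. This covers only the distance computation and nearest-code lookup; the learned random-forest hashing is not reproduced, and the 8-bit codes and scene names are made up for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def nearest(query: int, database: dict) -> str:
    """Return the key of the stored code closest to the query in Hamming space."""
    return min(database, key=lambda name: hamming(query, database[name]))


# Hypothetical 8-bit codes standing in for compressed image descriptors.
database = {
    "scene_a": 0b10110010,
    "scene_b": 0b01001101,
    "scene_c": 0b10110110,
}
query = 0b10100010

best_match = nearest(query, database)
```

Because the distance is a single XOR followed by a bit count, comparing one query against many stored codes stays cheap, which is what makes this kind of metric attractive for real-time retrieval.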
Johns E, Leutenegger S, Davison AJ, 2016, Pairwise Decomposition of Image Sequences for Active Multi-View Recognition, Computer Vision and Pattern Recognition, Publisher: Computer Vision Foundation (CVF), ISSN: 1063-6919
A multi-view image sequence provides a much richer capacity for object recognition than from a single image. However, most existing solutions to multi-view recognition typically adopt hand-crafted, model-based geometric methods, which do not readily embrace recent trends in deep learning. We propose to bring Convolutional Neural Networks to generic multi-view recognition, by decomposing an image sequence into a set of image pairs, classifying each pair independently, and then learning an object classifier by weighting the contribution of each pair. This allows for recognition over arbitrary camera trajectories, without requiring explicit training over the potentially infinite number of camera paths and lengths. Building these pairwise relationships then naturally extends to the next-best-view problem in an active recognition framework. To achieve this, we train a second Convolutional Neural Network to map directly from an observed image to next viewpoint. Finally, we incorporate this into a trajectory optimisation task, whereby the best recognition confidence is sought for a given trajectory length. We present state-of-the-art results in both guided and unguided multi-view recognition on the ModelNet dataset, and show how our method can be used with depth images, greyscale images, or both.
Johns E, Leutenegger S, Davison AJ, 2016, Deep learning a grasp function for grasping under gripper pose uncertainty, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 4461-4468, ISSN: 2153-0866
This paper presents a new method for parallel-jaw grasping of isolated objects from depth images, under large gripper pose uncertainty. Whilst most approaches aim to predict the single best grasp pose from an image, our method first predicts a score for every possible grasp pose, which we denote the grasp function. With this, it is possible to achieve grasping robust to the gripper’s pose uncertainty, by smoothing the grasp function with the pose uncertainty function. Therefore, if the single best pose is adjacent to a region of poor grasp quality, that pose will no longer be chosen, and instead a pose will be chosen which is surrounded by a region of high grasp quality. To learn this function, we train a Convolutional Neural Network which takes as input a single depth image of an object, and outputs a score for each grasp pose across the image. Training data for this is generated by use of physics simulation and depth image simulation with 3D object meshes, to enable acquisition of sufficient data without requiring exhaustive real-world experiments. We evaluate with both synthetic and real experiments, and show that the learned grasp score is more robust to gripper pose uncertainty than when this uncertainty is not accounted for.
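The smoothing idea in this abstract can be illustrated with a small sketch. It assumes a one-dimensional pose axis, a Gaussian as the pose uncertainty function, and hand-picked scores; the paper's grasp function is predicted by a network over image poses, so this is only a toy instance of the principle.

```python
import math


def gaussian_kernel(sigma, radius):
    """Discrete, normalised Gaussian standing in for the pose uncertainty function."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(scores, kernel):
    """Convolve grasp scores with the kernel; poses outside the range count as 0."""
    radius = len(kernel) // 2
    smoothed = []
    for i in range(len(scores)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(scores):
                acc += w * scores[j]
        smoothed.append(acc)
    return smoothed


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical grasp function: an isolated peak at index 1 surrounded by poor
# grasps, and a broad plateau of good grasps around index 5.
scores = [0.0, 1.0, 0.0, 0.0, 0.7, 0.9, 0.8, 0.7, 0.0]
kernel = gaussian_kernel(sigma=1.0, radius=2)

best_raw = argmax(scores)                     # isolated peak
best_robust = argmax(smooth(scores, kernel))  # centre of the plateau
```

Without smoothing the isolated peak wins; after smoothing, the pose inside the plateau scores higher because its neighbours are also good grasps, which is the behaviour the abstract describes.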
Ye M, Johns E, Walter B, et al., 2016, Robust Image Descriptors for Real-Time Inter-Examination Retargeting in Gastrointestinal Endoscopy, International Conference on Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, Publisher: Springer, Pages: 448-456, ISSN: 0302-9743
For early diagnosis of malignancies in the gastrointestinal tract, surveillance endoscopy is increasingly used to monitor abnormal tissue changes in serial examinations of the same patient. Despite successes with optical biopsy for in vivo and in situ tissue characterisation, biopsy retargeting for serial examinations is challenging because tissue may change in appearance between examinations. In this paper, we propose an inter-examination retargeting framework for optical biopsy, based on an image descriptor designed for matching between endoscopic scenes over significant time intervals. Each scene is described by a hierarchy of regional intensity comparisons at various scales, offering tolerance to long-term change in tissue appearance whilst remaining discriminative. Binary coding is then used to compress the descriptor via a novel random forests approach, providing fast comparisons in Hamming space and real-time retargeting. Extensive validation conducted on 13 in vivo gastrointestinal videos, collected from six patients, shows that our approach outperforms state-of-the-art methods.
Johns E, Mac Aodha O, Brostow G, 2015, Becoming the expert - interactive multi-class machine teaching, Conference on Computer Vision and Pattern Recognition 2015, Publisher: Institute of Electrical and Electronics Engineers, ISSN: 1063-6919
Compared to machines, humans are extremely good at classifying images into categories, especially when they possess prior knowledge of the categories at hand. If this prior information is not available, supervision in the form of teaching images is required. To learn categories more quickly, people should see important and representative images first, followed by less important images later – or not at all. However, image-importance is individual-specific, i.e. a teaching image is important to a student if it changes their overall ability to discriminate between classes. Further, students keep learning, so while image-importance depends on their current knowledge, it also varies with time. In this work we propose an Interactive Machine Teaching algorithm that enables a computer to teach challenging visual concepts to a human. Our adaptive algorithm chooses, online, which labeled images from a teaching set should be shown to the student as they learn. We show that a teaching strategy that probabilistically models the student’s ability and progress, based on their correct and incorrect answers, produces better ‘experts’. We present results using real human participants across several varied and challenging real-world datasets.
Ye M, Johns E, Giannarou S, et al., 2014, Online Scene Association for Endoscopic Navigation, 17th International Conference MICCAI 2014, Publisher: Springer International Publishing, Pages: 316-323, ISSN: 0302-9743
Endoscopic surveillance is a widely used method for monitoring abnormal changes in the gastrointestinal tract such as Barrett's esophagus. Direct visual assessment, however, is both time consuming and error prone, as it involves manual labelling of abnormalities on a large set of images. To assist surveillance, this paper proposes an online scene association scheme to summarise an endoscopic video into scenes, on-the-fly. This provides scene clustering based on visual contents, and also facilitates topological localisation during navigation. The proposed method is based on tracking and detection of visual landmarks on the tissue surface. A generative model is proposed for online learning of pairwise geometrical relationships between landmarks. This enables robust detection of landmarks and scene association under tissue deformation. Detailed experimental comparison and validation have been conducted on in vivo endoscopic videos to demonstrate the practical value of our approach.
Johns E, Yang G-Z, 2014, Generative Methods for Long-Term Place Recognition in Dynamic Scenes, INTERNATIONAL JOURNAL OF COMPUTER VISION, Vol: 106, Pages: 297-314, ISSN: 0920-5691
Ye M, Johns E, Giannarou S, et al., 2014, Online scene association for endoscopic navigation, Pages: 316-323
Endoscopic surveillance is a widely used method for monitoring abnormal changes in the gastrointestinal tract such as Barrett's esophagus. Direct visual assessment, however, is both time consuming and error prone, as it involves manual labelling of abnormalities on a large set of images. To assist surveillance, this paper proposes an online scene association scheme to summarise an endoscopic video into scenes, on-the-fly. This provides scene clustering based on visual contents, and also facilitates topological localisation during navigation. The proposed method is based on tracking and detection of visual landmarks on the tissue surface. A generative model is proposed for online learning of pairwise geometrical relationships between landmarks. This enables robust detection of landmarks and scene association under tissue deformation. Detailed experimental comparison and validation have been conducted on in vivo endoscopic videos to demonstrate the practical value of our approach.
Johns E, Yang G-Z, 2014, Pairwise Probabilistic Voting: Fast Place Recognition without RANSAC, 13th European Conference on Computer Vision (ECCV), Publisher: SPRINGER-VERLAG BERLIN, Pages: 504-519, ISSN: 0302-9743
Johns E, Yang G-Z, 2013, Dynamic Scene Models for Incremental, Long-Term, Appearance-Based Localisation, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 2731-2736, ISSN: 1050-4729
Johns E, Yang G-Z, 2013, Feature Co-occurrence Maps: Appearance-based Localisation Throughout the Day, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 3212-3218, ISSN: 1050-4729
Liu J, Johns E, Atallah L, et al., 2012, An intelligent food-intake monitoring system using wearable sensors, Pages: 154-160
Johns E, Yang G-Z, 2011, Global Localization in a Dense Continuous Topological Map, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 1032-1037, ISSN: 1050-4729
Johns E, Yang G-Z, 2011, Place Recognition and Online Learning in Dynamic Scenes with Spatio-Temporal Landmarks, 22nd British Machine Vision Conference, Publisher: B M V A PRESS
Johns E, Yang G-Z, 2011, From Images to Scenes: Compressing an Image Cluster into a Single Scene Model for Place Recognition, IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 874-881, ISSN: 1550-5499
Liu J, Johns E, Yang G-Z, 2011, A scene-associated training method for mobile robot speech recognition in multisource reverberated environments, Pages: 542-549
Johns E, Yang G-Z, 2010, Scene Association for Mobile Robot Navigation, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, ISSN: 2153-0858