Videos

Toward Shared Control for Bi-manual Robotic Wheelchairs

Link to paper

This video supplements the paper "Toward Shared Control for Mobile Bimanual Manipulation on a Robotic Wheelchair", presented at IEEE RO-MAN 2025. Assistance through wheelchair-mounted manipulators has the potential to enhance the independence of individuals with disabilities. However, existing approaches primarily focus on single-arm systems or require extensive user input and demonstrations to infer intentions. In this study, we present a shared control framework for intuitive dual-arm operation through a standard 2D joystick, focusing on pick-and-place tasks. Our approach infers user intent in real-time, eliminating the need for specifying prior goals or beliefs. To address the challenges of controlling two 7-DoF (Kinova Gen3) manipulators, we propose two distinct control methods: one for pre-grasp positioning and one for grasp execution. The first method employs a shared control policy to optimize the pre-grasp positioning of the wheelchair base, ensuring ergonomic alignment for front-grasping tasks while incorporating mobile manipulation to reduce task completion time. The second method allows users to maintain high-level goal control and fine-tuning through task-specific arbitration. Experimental results demonstrate high grasp quality and task efficiency across pick-and-place scenarios, establishing the feasibility of shared control for bimanual manipulation and wheelchair navigation. We believe this is the first unified framework for mobile bimanual manipulation using standard wheelchair controls.

1st vs 3rd-Person Perspective Interfaces for Robot BC

Link to paper

Link to code

This is a complementary video of our ICRA 2025 paper titled: "Interface Matters: Comparing First and Third-Person Perspective Interfaces for Bi-Manual Robot Behavioural Cloning" The video introduces two bimanual teleoperation interfaces we developed with the first and third-person perspectives. Datasets collected from the two interfaces were used to train a policy for shoelace insertion. Experiment clips and result analysis comparing the two interfaces are presented.

3D Eye-Gaze Tracking for Assessing Trust in HRI

Link to paper

This video supplements the paper "An Integrated 3D Eye-Gaze Tracking Framework for Assessing Trust in Human-Robot Interaction," accepted at ACM THRI in March 2025. It presents a novel 3D eye-gaze tracking system using head-mounted displays and spatial tracking to analyse trust in human-robot interactions. A Bayesian model was developed to assess treatment effects and identify gaze features linked to trust. In a study with 25 participants completing inspection tasks with a robot under high vs. low reliability conditions, lower trust was linked to longer fixations, higher fixation/saccade amplitudes, and more fixation transitions. While findings aligned with 2D studies, differences in scan-path length and fixation count were observed. The study highlights the value of integrating multiple gaze metrics for deeper insights into trust.

Benchmarking and Simulating Bimanual Robot Shoe Lacing

Videos for our paper Benchmarking and Simulating Bimanual Robot Shoe Lacing (RA-L 2024)

Link to paper

Link to code

Manipulation of deformable objects is a challenging domain in robotics. Although it has been gaining attention in recent years, long-horizon deformable object manipulation remains largely unexplored. In this letter, we propose a benchmark for the bi-manual Shoe Lacing (SL) task for evaluating and comparing long-horizon deformable object manipulation algorithms. SL is a difficult sensorimotor task in everyday life as well as the shoe manufacturing sector. Due to the complexity of the shoe structure, SL naturally requires sophisticated long-term planning. We provide a rigorous definition of the task and protocols to ensure the repeatability of SL experiments. We present 6 benchmark metrics for quantitatively measuring the ecological validity of approaches towards bi-manual SL. We further provide an open-source simulation environment for training and testing SL algorithms, as well as details of the construction and usage of the environment. We evaluate a baseline solution according to the proposed metrics in both reality and simulation.

Effect of AR Multi-UI and Shared Mental Models on Trust

This is a complementary video of our paper presented at IEEE-ROMAN 2024

This is a complementary video of our paper titled: "On the Effect of Augmented-Reality Multi-User Interfaces and Shared Mental Models on Human-Robot Trust " by Rodrigo Chacon, Fernando E. Casado and Yiannis Demiris {r.chacon-quesada17, f.estevez-casado20, y.demiris}@imperial.ac.uk. The video introduces an Augmented Reality multi-user interface developed at the Personal Robotics Laboratory (www.imperial.ac.uk/PersonalRobotics)to perform a user study (N=37) comparing non-dyadic human-robot interactions with a quadruped robot exhibiting low reliability (Group 3), against dyadic interactions while the robot exhibited high-reliability (Group 1) or low-reliability (Group 2). We made this comparison using validated trust questionnaires relevant to HRI. Our results, obtained via Bayesian data analysis methods, show differences in the distribution of answers between groups 1 and 2. Notably, this difference is smaller between groups 1 and 3, which suggests that the combination of shared mental models and multi-user interfaces holds promise as an effective way to manage and calibrate human-robot trust.

Learning Confidence for Trust in Human-Robot Interaction

Supplementary video for the (Goubard, 2024) ICRA paper.

Full paper here

In Human-Robot Interaction (HRI) scenarios, human factors like trust can greatly impact task performance and interaction quality. Recent research has confirmed that perceived robot proficiency is a major antecedent of trust. By making robots aware of their capabilities, we can allow them to choose when to perform low-confidence actions, thus actively controlling the risk of trust reduction. In this paper, we propose Self-Confidence through Observed Novel Experiences (SCONE), a policy to learn self-confidence from experience using semantic action embeddings. Using an assistive cooking setting, we show that the semantic aspect allows SCONE to learn self-confidence faster than existing approaches, while also achieving promising performance in simple instructions-following. Finally, we share results from a pilot study with 31 participants, showing that such a self-confidence-aware policy increases capability-based human trust.

A Visuo-Tactile Learning Approach for Robot-Assisted Bathing

Supplementary video for the Gu et al. RAL 2024 paper

Robot-assisted bed bathing holds the potential to enhance the quality of life for older adults and individuals with mobility impairments. Yet, accurately sensing the human body in a contact-rich manipulation task remains challenging. To address this challenge, we propose a multimodal sensing approach that perceives the 3D contour of body parts using the visual modality while capturing local contact details using the tactile modality. We employ a Transformer-based imitation learning model to utilize the multimodal information and learn to focus on crucial visuo-tactile task features for action prediction. We demonstrate our approach using a Baxter robot and a medical manikin to simulate the robot-assisted bed bathing scenario with bedridden individuals. The robot adeptly follows the contours of the manikin's body parts and cleans the surface based on its curve. Experimental results show that our method can adapt to nonlinear surface curves and generalize across multiple surface geometries, and to human subjects. Overall, our research presents a promising approach for robots to accurately sense the human body through multimodal sensing and perform safe interaction during assistive bed bathing.

An AR HMD UI for Controlling Legged Manipulators

An AR HMD UI for Legged Manipulators (ACM THRI 2024)

This is a supplementary video for the paper published at ACM Transactions on HRI

The video introduces a user study comparing an AR HMD UI we developed for controlling a legged manipulator against off-the-shelf control methods for such robots.
This user study involved 27 participants and 135 trials, from which we gathered over 405 completed questionnaires.
These trials involved multiple navigation and manipulation tasks with varying difficulty levels using a Boston Dynamics Spot, a 7 DoF Kinova robot arm, and a Robotiq 2F-85 gripper that we integrated into a legged manipulator.
We made the comparison between UIs across multiple dimensions relevant to a successful human-robot interaction.
These dimensions include cognitive workload, technology acceptance, fluency, system usability, immersion and trust.
Our study employed a factorial experimental design with participants undergoing five different conditions, generating longitudinal data.
Due to potential unknown distributions and outliers in such data, using parametric methods for its analysis is questionable, and while non-parametric alternatives exist, they may lead to reduced statistical power.
Therefore, to analyse the data that resulted from our experiment, we chose Bayesian data analysis as an effective alternative to address these limitations.
Our AR HMD UI for the control of legged manipulators was found to improve human-robot interaction across several relevant dimensions, underscoring the critical role of UI design in the effective and trustworthy utilisation of robotic systems.

[IROS '23] Bimanual Robot Shoe Lacing

Complementary video for the paper presented at IROS 2023.

Shoe lacing is a challenging sensorimotor task in daily life and a complex engineering problem in the shoe-making industry. In this paper, we propose a system for autonomous SL. It contains a mathematical definition of the SL task and searches for the best lacing pattern corresponding to the shoe configuration and the user preferences. We propose a set of action primitives and generate plans of action sequences according to the designed pattern. Our system plans the trajectories based on the perceived position of the eyelets and aglets with an active perception strategy, and deploys the trajectories on a bi-manual robot. Experiments demonstrate that the proposed system can successfully lace 3 different shoes in different configurations, with a completion rate of 92.0%, 91.6% and 77.5% for 6, 8 and 10-eyelet patterns respectively. To the best of our knowledge, this is the first demonstration of autonomous SL using a bi-manual robot.

Design and Evaluation of an AR HMD UI for Controlling Legged

Supplementary Video for the paper published in the IEEE ICRA 2023.

Designing an intuitive User Interface (UI) for controlling assistive robots remains challenging. Most existing UIs leverage traditional control interfaces such as joysticks, hand-held controllers, and 2D UIs. Thus, users have limited availability to use their hands for other tasks. Furthermore, although there is extensive research regarding legged manipulators, comparatively little is on their UIs. Towards extending the state-of-art in this domain, we provide a user study comparing an Augmented Reality (AR) Head-Mounted Display (HMD) UI we developed for controlling a legged manipulator against off-the-shelf control methods for such robots. We made this comparison baseline across multiple factors relevant to a successful interaction. The results from our user study (N = 17) show that although the AR UI increases immersion, off-the-shelf control methods outperformed the AR UI in terms of time performance and cognitive workload. Nonetheless, a follow-up pilot study incorporating the lessons learned shows that AR UIs can outpace hand-held-based control methods and reduce the cognitive requirements when designers include hands-free interactions and cognitive offloading principles into the UI.

Holo-SpoK: Affordance-Aware AR Control of Leggged Robots

Complementary video for the paper presented during IROS 2022

Abstract: Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state-of-art in this domain, in this work, we integrate a Boston Dynamics (BD) Spot® with a light-weight 7 DoF Kinova® robot arm and a Robotiq® 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtained via vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.

R. Chacon Quesada and Y. Demiris, "Holo-SpoK: Affordance-Aware Augmented Reality Control of Legged Manipulators," 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 2022, pp. 856-862, doi: 10.1109/IROS47612.2022.9981989.

Learning manipulation policies for robot-assisted dressing

Supplementary Video for the paper published in Science Robotics, April 2022

Abstract: Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user’s arms to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to real-world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioural uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameter inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.

F. Zhang and Y. Demiris, "Learning garment manipulation policies toward robot-assisted dressing," in Science Robotics, vol. 7, no. 65, April 2022, doi: 10.1126/scirobotics.abm6010

Proactive Robot Assistance Through Affordance-Aware Augmente

Proactive Robot Assistance Affordance-Aware AR UIs

Supplementary Video for the paper published in the IEEE Robotics and Automation Magazine March 2022.

Abstract: This is a complementary video of our paper titled: "Proactive Robot Assistance Affordance-Aware Augmented Reality User Interfaces" by Rodrigo Chacon and Yiannis Demiris {r.chacon-quesada17, y.demiris}@imperial.ac.uk. The video introduces an Affordance-Aware Object-Oriented Proactive Planning architecture for assistive robotics developed at the Personal Robotics Laboratory (www.imperial.ac.uk/PersonalRobotics). Given sensor information about objects in the environment (the current state), and the available robot capabilities, the architecture proactively generates a number of plans that will achieve potential goals that this environment affords. To ensure that a rich repertoire of plans is considered, we integrated as a source of plans a publicly available large linguistic dataset of goals and high-level instructions of how to achieve them. A language-driven algorithm provides the most relevant and feasible goals given the current context, reducing the time required to generate and present plans, while ensuring than no suitable plans are missed. We validate the proposed architecture in an augmented reality (AR) head-mounted display user interface scenario, with the mobile robot manipulator platform perceiving, analysing and presenting feasible goals as options for its control through AR.

R. Chacon Quesada and Y. Demiris, "Proactive Robot Assistance: Affordance-Aware Augmented Reality User Interfaces," in IEEE Robotics & Automation Magazine, vol. 29, no. 1, pp. 22-34, March 2022, doi: 10.1109/MRA.2021.3136789.

IROS2020 Augmented Reality User Interfaces for Heterogeneous

Supplementary video for the Chacon and Demiris IROS 2020 paper

Abstract: Recent advances in the design of head-mounted augmented reality interfaces for assistive human-robot interaction have allowed untrained users to rapidly and fluently control single-robot platforms. In this paper, we investigate how such interfaces transfer onto multirobot architectures, as several assistive robotics applications need to be distributed among robots that are different both physically and in terms of software. As part of this investigation, we introduce a novel head-mounted AR interface for heterogeneous multirobot control. This interface generates and displays dynamic joint-affordance signifiers, i.e. signifiers that combine and show multiple actions from different robots that can be applied simultaneously to an object. We present a user study with 15 participants analysing the effects of our approach on their perceived fluency. Participants were given the task of filling-out a cup with water making use of a multirobot platform. Our results show a clear improvement in standard HRI fluency metrics when users applied dynamic joint-affordance signifiers, as opposed to a sequence of independent actions.

Conference: IROS 2020

Authors: Rodrigo Chacon and Yiannis Demiris

EMin Framework for Event-based Vision Model Estimation

Supplementary video for the Nunes and Demiris ECCV 2020 paper

Abstract: We propose a novel Entropy Minimisation (EMin) framework for event-based vision model estimation. The framework extends previous event-based motion compensation algorithms to handle models whose outputs have arbitrary dimensions. The main motivation comes from estimating motion from events directly in 3D space (e.g. events augmented with depth), without projecting them onto an image plane. This is achieved by modelling the event alignment according to candidate parameters and minimising the resultant dispersion. We provide a family of suitable entropy loss functions and an efficient approximation whose complexity is only linear with the number of events (e.g. the complexity does not depend on the number of image pixels). The framework is evaluated on several motion estimation problems, including optical flow and rotational motion. As proof of concept, we also test our framework on 6-DOF estimation by performing the optimisation directly in 3D space.

Conference: European Conference on Computer Vision 2020

Authors: Urbano Miguel Nunes and Yiannis Demiris

Learning Grasping Points for Garment Manipulation

Presentation video for the Zhang et al. ICRA 2020 paper

Abstract: Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a pre-defined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method, with a grasping and manipulation strategy which takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper-body.

Conference: ICRA 2020

Authors: F. Zhang and Y. Demiris

Learning Grasping Points for Garment Manipulation

Supplementary video for the Zhang et al. ICRA 2020 paper

Abstract: Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a pre-defined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method, with a grasping and manipulation strategy which takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper-body.

Conference: ICRA 2020

Authors: F. Zhang and Y. Demiris

RT-BENE: Real-Time Blink Estimation in Natural Environments

Supplementary video for the Cortacero, Fischer, and Demiris ICCV2019 workshop proceedings paper

In recent years gaze estimation methods have made substantial progress, driven by the numerous application areas including human-robot interaction, visual attention estimation and foveated rendering for virtual reality headsets. However, many gaze estimation methods typically assume that the subject's eyes are open; for closed eyes, these methods provide irregular gaze estimates. Here, we address this assumption by first introducing a new open-sourced dataset with annotations of the eye-openness of more than 200,000 eye images, including more than 10,000 images where the eyes are closed. We further present baseline methods that allow for blink detection using convolutional neural networks. In extensive experiments, we show that the proposed baselines perform favourably in terms of precision and recall. We further incorporate our proposed RT-BENE baselines in the recently presented RT-GENE gaze estimation framework where it provides a real-time inference of the openness of the eyes. We argue that our work will benefit both gaze estimation and blink estimation methods, and we take steps towards unifying these methods.

Conference: ICCV 2019 Workshop on Gaze Estimation and Prediction in the Wild (full proceedings paper)

Authors: K. Cortacero, T. Fischer, and Y. Demiris

Towards Explainable Shared Control using Augmented Reality

Supplementary video for the Zolotas & Demiris IROS2019 paper

Abstract: Shared control plays a pivotal role in establishing effective human-robot interactions. Traditional control-sharing methods strive to complement a human’s capabilities at safely completing a task, and thereby rely on users forming a mental model of the expected robot behaviour. However, these methods can often bewilder or frustrate users whenever their actions do not elicit the intended system response, forming a misalignment between the respective internal models of the robot and human. To resolve this model misalignment, we introduce Explainable Shared Control as a paradigm in which assistance and information feedback are jointly considered. Augmented reality is presented as an integral component of this paradigm, by visually unveiling the robot’s inner workings to human operators. Explainable Shared Control is instantiated and tested for assistive navigation in a setup involving a robotic wheelchair and a Microsoft HoloLens with add-on eye tracking. Experimental results indicate that the introduced paradigm facilitates transparent assistance by improving recovery times from adverse events associated with model misalignment.

Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

Authors: M. Zolotas and Y. Demiris

Multimodal representation models for prediction and control

Supplementary video for the Zambelli et al. RAS paper

Abstract: Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such asvision, touch, sound. However, learning from different sensor modalities is difficult, because the learning model must be able tohandle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper,a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotorcapabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predictthe sensorimotor state of self and the visual trajectories of other agents actions, and (3) control the agent to imitate an observedvisual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motionthrough the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity givenby the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improvedperformance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoidrobot, showing high performance in multiple reconstruction, prediction and imitation tasks.

Journal: Robotics and Autonomous Systems

Authors: M. Zambelli, A. Cully and Y. Demiris

Online Unsupervised 3D 3D Kinematic Structure Learning

Online Unsupervised 3D Kinematic Structure Learning

Supplementary video for the Nunes and Demiris ICCV paper

Abstract: This work addresses the problem of 3D kinematic structure learning of arbitrary articulated rigid bodies from RGB-D data sequences. Typically, this problem is addressed by offline methods that process a batch of frames, assuming that complete point trajectories are available. However, this approach is not feasible when considering scenarios that require continuity and fluidity, for instance, human-robot interaction. In contrast, we propose to tackle this problem in an online unsupervised fashion, by recursively maintaining the metric distance of the scene's 3D structure, while achieving real-time performance. The influence of noise is mitigated by building a similarity measure based on a linear embedding representation and incorporating this representation into the original metric distance. The kinematic structure is then estimated based on a combination of implicit motion and spatial properties. The proposed approach achieves competitive performance both quantitatively and qualitatively in terms of estimation accuracy, even compared to offline methods.

Conference: IEEE International Conference on Computer Vision 2019

Authors: Urbano Miguel Nunes and Yiannis Demiris

Probabilistic Real-Time User Posture Tracking for Dressing

Supplementary video for the Zhang et al. TRO paper

Abstract: Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. We propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable camera-less and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian Process Latent Variable Model, taking the user's movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multi-task control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.

Journal: IEEE Transactions on Robotics

Authors: F. Zhang, A. Cully and Y. Demiris

LibRob - Human-Centered Robotics 2018

LibRob is a librarian assistant that guides you through the library and helps you find the book you are looking for. It has been developed as a part of the Human-Centered Robotics course at Imperial College London.

Team: Costanza Di Veroli, Cao An Le, Thibaud Lemaire, Eliot Makabu, Abdullahi Nur, Vincent Ooi, Jee Yong Park, Federico Sanna

Supervisor: Professor Yiannis Demiris

instruMentor - Human-Centered Robotics 2018

Introducing the musical instrument tutor robot - instruMentor! Another brilliant project of our Human-Centered Robotics course (2018-19).

Imperial College Personal Robotics Lab Christmas Video

Personalised Santa robotic dressing & sharedcontrol robotic wheelchairs with AugmentedReality reindeer predictive path visualisations? The Imperial_PRL Christmas video has it all! Merry Christmas everyone!

Head-Mounted Augmented Reality for Wheelchairs

Supplementary video for the Zolotas et al. IROS2018 paper

Abstract: Robotic wheelchairs with built-in assistive features, such as shared control, are an emerging means of providing independent mobility to severely disabled individuals. However, patients often struggle to build a mental model of their wheelchair's behaviour under different environmental conditions. Motivated by the desire to help users bridge this gap in perception, we propose a novel augmented reality system using a Microsoft Hololens as a head-mounted aid for wheelchair navigation. The system displays visual feedback to the wearer as a way of explaining the underlying dynamics of the wheelchair's shared controller and its predicted future states. To investigate the influence of different interface design options, a pilot study was also conducted. We evaluated the acceptance rate and learning curve of an immersive wheelchair training regime, revealing preliminary insights into the potential beneficial and adverse nature of different augmented reality cues for assistive navigation. In particular, we demonstrate that care should be taken in the presentation of information, with effort-reducing cues for augmented information acquisition (for example, a rear-view display) being the most appreciated.

Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

Authors: M. Zolotas, J. Elsdon and Y. Demiris

RT-GENE: Real-Time Gaze Estimation in Natural Environments

Supplementary video for the Fischer, Chang and Demiris ECCV2018 paper

In this work, we consider the problem of robust gaze estima- tion in natural environments. Large camera-to-subject distances and high variations in head pose and eye gaze angles are common in such environ- ments. This leads to two main shortfalls in state-of-the-art methods for gaze estimation: hindered ground truth gaze annotation and diminished gaze estimation accuracy as image resolution decreases with distance. We first record a novel dataset of varied gaze and head pose images in a natural environment, addressing the issue of ground truth annotation by measuring head pose using a motion capture system and eye gaze using mobile eyetracking glasses. We apply semantic image inpainting to the area covered by the glasses to bridge the gap between training and testing images by removing the obtrusiveness of the glasses. We also present a new real-time algorithm involving appearance-based deep convolutional neural networks with increased capacity to cope with the diverse images in the new dataset. Experiments with this network architecture are con- ducted on a number of diverse eye-gaze datasets including our own, and in cross dataset evaluations. We demonstrate state-of-the-art performance in terms of estimation accuracy in all experiments, and the architecture performs well even on lower resolution images.

Conference: European Conference on Computer Vision (ECCV2018)

Authors: T. Fischer, H. J. Chang, and Y. Demiris

Transferring Visuomotor Learning from Simulation

Supplementary video for the Nguyen et al. IROS2018 paper

Hand-eye coordination is a requirement for many manipulation tasks including grasping and reaching. However, accurate hand-eye coordination has shown to be especially difficult to achieve in complex robots like the iCub humanoid. In this work, we solve the hand-eye coordination task using a visuomotor deep neural network predictor that estimates the arm’s joint configuration given a stereo image pair of the arm and the underlying head configuration. As there are various unavoidable sources of sensing error on the physical robot, we train the predictor on images obtained from simulation. The images from simulation were modified to look realistic using an image-to-image translation approach. In various experiments, we first show that the visuomotor predictor provides accurate joint estimates of the iCub’s hand in simulation. We then show that the predictor can be used to obtain the systematic error of the robot’s joint measurements on the physical iCub robot. We demonstrate that a calibrator can be designed to automatically compensate this error. Finally, we validate that this enables accurate reaching of objects while circumventing manual fine-calibration of the robot.

Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018)

Authors: P. Nguyen, T. Fischer, H. J. Chang, U. Pattacini, G. Metta, and Y. Demiris

User Modelling Using Multimodal Information for Dressing

Supplementary video for the Gao, Chang, and Demiris

Human-Robot Interaction with DAC-H3 cognitive architecture

Supplementary video for the Moulin-Frier, Fischer et al. TCDS2017 paper

The robot is self-regulating two drives for knowledge acquisition and expression. Acquired information is about labeling the perceived objects, agents and body parts, as well as associating body part touch and motor information. In addition, a number of goal-oriented behaviors are executed through human requests: passing objects, showing the learned kinematic structure, recognizing actions, pointing to the human body parts. A complex narrative dialog about the robot's past experiences is also demonstrated at the end of the video.

Journal: IEEE Transactions on Cognitive and Developmental Systems, 2017

Authors: C. Moulin-Frier*, T. Fischer*, M. Petit, G. Pointeau, J.-Y. Puigbo, U. Pattacini, S. C. Low, D. Camilleri, P. Nguyen, M. Hoffmann, H. J. Chang, M. Zambelli, A.-L. Mealier, A. Damianou, G. Metta, T. J. Prescott, Y. Demiris, P. F. Dominey, and P. F. M. J. Verschure (*: equal contributions)

URL: http://hdl.handle.net/10044/1/50801

Personalized Dressing using User Modeling in Latent Spaces

Supplementary video for the Zhang, Cully, and Demiris IROS 2017 paper

Abstract: Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance usually view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to the failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.

Conference: IROS 2017
Authors: Fan Zhang, Antoine Cully, Yiannis Demiris

Attentional Network for Adaptive Visual Tracking

Supplementary video for the Choi et al. CVPR2017 paper

Title: Attentional Correlation Filter Network for Adaptive Visual Tracking

We propose a new tracking framework with an attentional mechanism that chooses a subset of the associated correlation filters for increased robustness and computational efficiency. The subset of filters is adaptively selected by a deep attentional network according to the dynamic properties of the tracking target. Our contributions are manifold, and are summarised as follows: (i) Introducing the Attentional Correlation Filter Network which allows adaptive tracking of dynamic targets. (ii) Utilising an attentional network which shifts the attention to the best candidate modules, as well as predicting the estimated accuracy of currently inactive modules. (iii) Enlarging the variety of correlation filters which cover target drift, blurriness, occlusion, scale changes, and flexible aspect ratio. (iv) Validating the robustness and efficiency of the attentional mechanism for visual tracking through a number of experiments. Our method achieves similar performance to non real-time trackers, and state-of-the-art performance amongst real-time trackers.

Conference: CVPR2017
Authors: Jongwon Choi, Hyung Jin Chang, Sangdoo Yun, Tobias Fischer, Yiannis Demiris, and Jin Young Choi

Adaptive User Model in Car Racing Games

This video shows our framework for Adaptive User Modelling in Car Racing Games. It shows the sequent

This video shows our framework for Adaptive User Modelling in Car Racing Games. It shows the sequential steps of the model, the simulator as well as the steps carried out to implement the User Model.

Action-Conditioned Generation of Bimanual Sequences

Additional video results for the AAAI 2023 paper

We present a novel method that receives an action label to generate a motion sequence of a person performing bimanual object manipulation. Published at AAAI 2023.

Assisted Painting of 3D Structures Using Shared Control

Assisted Painting of 3D Structures Using Shared Control with Under-actuated Robots

"Assisted Painting of 3D Structures Using Shared Control with Under-actuated Robots", ICRA 2017.

Authors: J. Elsdon and Y. Demiris.

Personalised Track Design in Car Racing Games

Video shows a short demo of the track changing algorithm that creates a personalised track according

Real-time adaptation of computer games’ content to the users’ skills and abilities can enhance the player’s engagement and immersion. Understanding of the user’s potential while playing is of high importance in order to allow the successful procedural generation of user-tailored content. We investigate how player models can be created in car racing games. Our user model uses a combination of data from unobtrusive sensors, while the user is playing a car racing simulator. It extracts features through machine learning techniques, which are then used to comprehend the user’s gameplay, by utilising the educational theoretical frameworks of the Concept of Flow and Zone of Proximal Development. The end result is to provide at a next stage a new track that fits to the user needs, which aids both the training of the driver and their engagement in the game. In order to validate that the system is designing personalised tracks, we associated the average performance from 41 users that played the game, with the difficulty factor of the generated track. In addition, the variation in paths of the implemented tracks between users provides a good indicator for the suitability of the system.

Conference: CIG 2016
Title: Personalised Track Design in Car Racing Games
Authors: Theodosis Georgiou and Yiannis Demiris

Supporting Article : https://spiral.imperial.ac.uk/handle/10044/1/39560

Multimodal Imitation Using Self-Learned Sensorimotor Repr.

Supplementary video for the Zambelli and Demiris IROS2016 paper

Although many tasks intrinsically involve multiple modalities, often only data from a single modality are used to improve complex robots acquisition of new skills. We present a method to equip robots with multimodal learning skills to achieve multimodal imitation on-the-fly on multiple concurrent task spaces, including vision, touch and proprioception, only using self-learned multimodal sensorimotor relations, without the need of solving inverse kinematic problems or explicit analytical models formulation. We evaluate the proposed method on a humanoid iCub robot learning to interact with a piano keyboard and imitating a human demonstration. Since no assumptions are made on the kinematic structure of the robot, the method can be also applied to different robotic platforms.

Conference: IROS2016
Authors: Martina Zambelli and Yiannis Demiris

Iterative Path Optimisation for Dressing Assistance

Supplementary video for the Gao, Chang, and Demiris IROS2016 paper

We propose an online iterative path optimisation method to enable a Baxter humanoid robot to assist human users to dress. The robot searches for the optimal personalised dressing path using vision and force sensor information: vision information is used to recognise the human pose and model the movement space of upper-body joints; force sensor information is used for the robot to detect external force resistance and to locally adjust its motion. We propose a new stochastic path optimisation method based on adaptive moment estimation. We first compare the proposed method with other path optimisation algorithms on synthetic data. Experimental results show that the performance of the method achieves the smallest error with fewer iterations and less computation time. We also evaluate real-world data by enabling the Baxter robot to assist real human users with their dressing.

Conference: IROS2016
Authors: Yixing Gao, Hyung Jin Chang, Yiannis Demiris

Kinematic Structure Correspondences via Hypergraph Matching

Supplementary video for the Chang, Fischer, Petit, Zambelli and Demiris CVPR2016 paper

In this paper, we present a novel framework for finding the kinematic structure correspondence between two objects in videos via hypergraph matching. In contrast to prior appearance and graph alignment based matching methods which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos.
Our main contributions can be summarised as follows:
(i) casting the kinematic structure correspondence problem into a hypergraph matching problem, incorporating multi-order similarities with normalising weights
(ii) structural topology similarity measure by a new topology constrained subgraph isomorphism aggregation
(iii) kinematic correlation measure between pairwise nodes
(iv) combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold.
We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, showing that various other methods are outperformed.

Conference: CVPR2016
Authors: Hyung Jin Chang, Tobias Fischer, Maxime Petit, Martina Zambelli, Yiannis Demiris

Visual Tracking Using Attention-Modulated Disintegration and

Supplementary video for the Choi et al. CVPR2016 paper

In this paper, we present a novel attention-modulated visual tracking algorithm that decomposes an object into multiple cognitive units, and trains multiple elementary trackers in order to modulate the distribution of attention according to various feature and kernel types. In the integration stage it recombines the units to memorize and recognize the target object effectively. With respect to the elementary trackers, we present a novel attentional feature-based correlation filter (AtCF) that focuses on distinctive attentional features. The effectiveness of the proposed algorithm is validated through experimental comparison with state-of-theart methods on widely-used tracking benchmark datasets.

Conference: CVPR2016

Authors: J. Choi, H. J. Chang, J. Jeong, Y. Demiris, and J. Y. Choi

Markerless Perspective Taking for Humanoid Robots

Supplementary video for the Fischer and Demiris ICRA2016 paper

Perspective taking enables humans to imagine the world from another viewpoint. This allows reasoning about the state of other agents, which in turn is used to more accurately predict their behavior. In this paper, we equip an iCub humanoid robot with the ability to perform visuospatial perspective taking (PT) using a single depth camera mounted above the robot. Our approach has the distinct benefit that the robot can be used in unconstrained environments, as opposed to previous works which employ marker-based motion capture systems. Prior to and during the PT, the iCub learns the environment, recognizes objects within the environment, and estimates the gaze of surrounding humans. We propose a new head pose estimation algorithm which shows a performance boost by normalizing the depth data to be aligned with the human head. Inspired by psychological studies, we employ two separate mechanisms for the two different types of PT. We implement line of sight tracing to determine whether an object is visible to the humans (level 1 PT). For more complex PT tasks (level 2 PT), the acquired point cloud is mentally rotated, which allows algorithms to reason as if the input data was acquired from an egocentric perspective. We show that this can be used to better judge where object are in relation to the humans. The multifaceted improvements to the PT pipeline advance the state of the art, and move PT in robots to markerless, unconstrained environments.

Hierarchical Action Learning by Instruction

Supplementary video for the Petit and Demiris ICRA2016 paper

This video accompanies the paper titled "Hierarchical Action Learning by Instruction Through Interactive Grounding of Body Parts and Proto-actions" presented at IEEE International Conference on Robotics and Automation 2016.

One-shot Learning of Assistance by Demonstration

Supplementary video for our ROMAN 2015 paper

Supplementary video for Kucukyilmaz A, Demiris Y, 2015, "One-shot assistance estimation from expert demonstrations for a shared control wheelchair system", International Symposium on Robot and Human Interactive Communication (RO-MAN). More information can be found in the paper.

Personalised Dressing Assistance by Humanoid Robots

Supplementary video for our IROS 2015 paper

Supplementary video for Gao Y, Chang HJ, Demiris Y, 2015, "User Modelling for Personalised Dressing Assistance by Humanoid Robots", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). More information can be found in the paper.

Lifelong Augmentation of Multi Modal Streaming Memories

Supplementary video for the Petit, Fischer and Demiris TCDS2016 paper

Many robotics algorithms can benefit from storing and recalling large amounts of accumulated sensorimotor and interaction data. We provide a principled framework for the cumulative organisation of streaming autobiographical data so that data can be continuously processed and augmented as the processing and reasoning abilities of the agent develops and further interactions with humans take place. As an example, we show how a kinematic structure learning algorithm reasons a-posteriori about the skeleton of a human hand. A partner can be asked to provide feedback about the augmented memories, which can in turn be supplied to the reasoning processes in order to adapt their parameters. We employ active, multi- modal remembering, so the robot as well as humans can gain insights of both the original and augmented memories. Our framework is capable of storing discrete and continuous data in real-time, and thus creates a full memory. The data can cover multiple modalities and several layers of abstraction (e.g. from raw sound signals over sentences to extracted meanings). We show a typical interaction with a human partner using an iCub humanoid robot. The framework is implemented in a platform-independent manner. In particular, we validate multi platform capabilities using the iCub, Baxter and NAO robots. We also provide an interface to cloud based services, which allow automatic annotation of episodes. Our framework is geared towards the developmental robotics community, as it 1) provides a variety of interfaces for other modules, 2) unifies previous works on autobiographical memory, and 3) is licensed as open source software.

Journal: Transactions on Cognitive and Developmental Systems

Authors: M. Petit*, T. Fischer* and Y. Demiris (*: equal contributions)

Unsupervised Complex Kinematic Structure Learning

Supplementary video of our CVPR 2015 paper

Supplementary video of Chang HJ, Demiris Y, 2015, "Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information", IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Find more information in the paper.

Online Heterogeneous Ensemble Learning

Online Heterogeneous Ensemble Learning of Sensorimotor Contingencies from Motor Babbling

Forward models play a key role in cognitive agents by providing predictions of the sensory consequences of motor commands, also known as sensorimotor contingencies (SMCs). In continuously evolving environments, the ability to anticipate is fundamental in distinguishing cognitive from reactive agents, and it is particularly relevant for autonomous robots, that must be able to adapt their models in an online manner. Online learning skills, high accuracy of the forward models and multiple-step-ahead predictions are needed to enhance the robots’ anticipation capabilities. We propose an online heterogeneous ensemble learning method for building accurate forward models of SMCs relating motor commands to effects in robots’ sensorimotor system, in particular considering proprioception and vision. Our method achieves up to 98% higher accuracy both in short and long term predictions, compared to single predictors and other online and offline homogeneous ensembles. This method is validated on two different humanoid robots, namely the iCub and the Baxter.

Musical Human-Robot Collaboration with Baxter

This video shows our framework for adaptive musical human-robot collaboration

This video shows our framework for adaptive musical human-robot collaboration. Baxter is in charge of the drum accompaniment and is learning the preferences of the user, who is in charge of the melody. For more information read:Sarabia M, Lee K, Demiris Y, 2015, "Towards a Synchronised Grammars Framework for Adaptive Musical Human-Robot Collaboration", IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN), Publisher: IEEE, Pages: 715-721.

Assistive Robotic Technology for Hospital Patients

Junior spent a week keeping company many patients at the Chelsea & Westminster hospital

A NAO humanoid robot, Junior, spent a week keeping company many patients at the Chelsea & Westminster Hospital in one of the largest trials of its kind in the world. Our results show that patients really enjoyed interacting with the robot.

The Online Echo State Gaussian Process (OESGP)

A video demonstrating the Online Echo State Gaussian Process (OESGP) for temporal learning

A video demonstrating the Online Echo State Gaussian Process (OESGP) for temporal learning and prediction. Find out more at: http://haroldsoh.com/otl-library/
Code available at: https://bitbucket.org/haroldsoh/otl/

ARTY Nao Sidekick Imperial Festival

Here the ARTY wheelchair integrated with NAO is presented at the annual Imperial Festival, where children used the system.

ARTY NAO Experiment

A Humanoid Robot Companion for Wheelchair Users

This video shows the ARTY wheelchair integrated with a humanoid robot (NAO). The humanoid companion acts as a driving aid by pointing out obstacles and giving directions to the wheelchair user. More information at: Sarabia M, Demiris Y, 2013, "A Humanoid Robot Companion for Wheelchair Users", International Conference on Social Robotics (ICSR), Publisher: Springer, Pages: 432-441

HAMMER on iCub: Towards Contextual Action Recognition

"Towards Contextual Action Recognition and Target Localization with Active Allocation of Attention"

Dimitri Ognibene, Eris Chinellato, Miguel Sarabia and Yiannis Demiris, "Towards Contextual Action Recognition and Target Localization with Active Allocation of Attention", Conference on Biomimetic and Biohybrid Systems, 2012

iCub Learning and Playing the Towers of Hanoi Puzzle

"Learning Reusable Task Representations using Hierarchical Activity Grammars with Uncertainties"

Kyuhwa Lee, Tae-Kyun Kim and Yiannis Demiris, "Learning Reusable Task Representations using Hierarchical Activity Grammars with Uncertainties", IEEE International Conference on Robotics and Automation (ICRA), St. Paul, USA, 2012.

iCub Learning Human Dance Structures for Imitation

The iCub shows off its dance moves

Kyuhwa Lee, Tae-Kyun Kim and Yiannis Demiris, "Learning Reusable Task Representations using Hierarchical Activity Grammars with Uncertainties", IEEE International Conference on Robotics and Automation (ICRA), St. Paul, USA, 2012

iCub Grasping Demonstration

A demonstration of the iCub grasping mechanism

Yanyu Su, Yan Wu, Kyuhwa Lee, Zhijiang Du, Yiannis Demiris, "Robust Grasping Mechanism for an Under-actuated Anthropomorphic Hand under Object Position Uncertainty", IEEE-RAS International Conference on Humanoid Robots, Osaka, Japan, 2012.

iCub playing the Theremin

The iCub humanoid robot plays one of the most difficult musical instruments

The iCub humanoid robot plays the Theremin, one of the most difficult musical instrument, in real-time.

ARTY Smart Wheelchair

Helping young children safely use a wheelchair

The Assistive Robotic Transport for Youngsters (ARTY) is a smart wheelchair designed to help young children with disabilities who are unable to safely use a regular powered wheelchair. It is our hope that ARTY will give users an opportunity to independently explore, learn and play.

Videos from the lab

Toward Shared Control for Bi-manual Robotic Wheelchairs

Toward Shared Control for Bi-manual Robotic Wheelchairs

1st vs 3rd-Person Perspective Interfaces for Robot BC

3D Eye-Gaze Tracking for Assessing Trust in HRI

Benchmarking and Simulating Bimanual Robot Shoe Lacing

Effect of AR Multi-UI and Shared Mental Models on Trust

Learning Confidence for Trust in Human-Robot Interaction

A Visuo-Tactile Learning Approach for Robot-Assisted Bathing

An AR HMD UI for Legged Manipulators (ACM THRI 2024)

[IROS '23] Bimanual Robot Shoe Lacing

Design and Evaluation of an AR HMD UI for Controlling Legged

Holo-SpoK: Affordance-Aware AR Control of Leggged Robots

Learning manipulation policies for robot-assisted dressing

Proactive Robot Assistance Affordance-Aware AR UIs

IROS2020 Augmented Reality User Interfaces for Heterogeneous

EMin Framework for Event-based Vision Model Estimation

Learning Grasping Points for Garment Manipulation

Learning Grasping Points for Garment Manipulation

RT-BENE: Real-Time Blink Estimation in Natural Environments

Towards Explainable Shared Control using Augmented Reality

Multimodal representation models for prediction and control

Online Unsupervised 3D Kinematic Structure Learning

Probabilistic Real-Time User Posture Tracking for Dressing

LibRob - Human-Centered Robotics 2018

instruMentor - Human-Centered Robotics 2018

Imperial College Personal Robotics Lab Christmas Video

Head-Mounted Augmented Reality for Wheelchairs

RT-GENE: Real-Time Gaze Estimation in Natural Environments

Transferring Visuomotor Learning from Simulation

User Modelling Using Multimodal Information for Dressing

Human-Robot Interaction with DAC-H3 cognitive architecture

Personalized Dressing using User Modeling in Latent Spaces

Attentional Network for Adaptive Visual Tracking

Adaptive User Model in Car Racing Games

Action-Conditioned Generation of Bimanual Sequences

Assisted Painting of 3D Structures Using Shared Control

Personalised Track Design in Car Racing Games

Multimodal Imitation Using Self-Learned Sensorimotor Repr.

Iterative Path Optimisation for Dressing Assistance

Kinematic Structure Correspondences via Hypergraph Matching

Visual Tracking Using Attention-Modulated Disintegration and

Markerless Perspective Taking for Humanoid Robots

Hierarchical Action Learning by Instruction

One-shot Learning of Assistance by Demonstration

Personalised Dressing Assistance by Humanoid Robots

Lifelong Augmentation of Multi Modal Streaming Memories

Unsupervised Complex Kinematic Structure Learning

Online Heterogeneous Ensemble Learning

Musical Human-Robot Collaboration with Baxter

Assistive Robotic Technology for Hospital Patients

The Online Echo State Gaussian Process (OESGP)

ARTY Nao Sidekick Imperial Festival

ARTY NAO Experiment

HAMMER on iCub: Towards Contextual Action Recognition

iCub Learning and Playing the Towers of Hanoi Puzzle

iCub Learning Human Dance Structures for Imitation

iCub Grasping Demonstration

iCub playing the Theremin

ARTY Smart Wheelchair

­