Publications

Pimentel J, Moujahid M, Diab M, Demiris Y, McKenna PE, Romeo Met al., 2024, The Role of Visual Perspective Taking in Human-Robot Collaboration, Pages: 1290-1292

We present a video demonstration of our human-robot collaboration experiment investigating visual perspective taking and trust. In the experiment, participants interact with a Vector robot to help it navigate out of a maze. Both robot and participant have a unique view of the maze, with participants viewing it from above and Vector from a point-of-view perspective. The maze itself includes a series of junctions with several route options. After announcing the task objectives, the robot navigates autonomously and uses verbal utterances to communicate route options and route suggestions at each junction. Junction decisions are made after Vector announces the route options, listens to participants' selection, offers a route suggestion, and processes participants' final decision. So, to efficiently navigate to the mazes exit, participants' must assume Vector's visual perspective to understand the direction it is referring to. As outlined in the conclusion, this experiment intended to investigate the role of visual perspective taking on trust dynamics in collaborative human-robot interaction, with a view to supporting further work on robot theory of mind (ToM).

Abstract
Cite

Conference paper

Yarici M, Von Rosenberg W, Hammour G, Davies H, Amadori P, Ling N, Demiris Y, Mandic DPet al., 2024, Hearables: feasibility of recording cardiac rhythms from single in-ear locations., R Soc Open Sci, Vol: 11, ISSN: 2054-5703

The ear is well positioned to accommodate both brain and vital signs monitoring, via so-called hearable devices. Consequently, ear-based electroencephalography has recently garnered great interest. However, despite the considerable potential of hearable based cardiac monitoring, the biophysics and characteristic cardiac rhythm of ear-based electrocardiography (ECG) are not yet well understood. To this end, we map the cardiac potential on the ear through volume conductor modelling and measurements on multiple subjects. In addition, in order to demonstrate real-world feasibility of in-ear ECG, measurements are conducted throughout a long-time simulated driving task. As a means of evaluation, the correspondence between the cardiac rhythms obtained via the ear-based and standard Lead I measurements, with respect to the shape and timing of the cardiac rhythm, is verified through three measures of similarity: the Pearson correlation, and measures of amplitude and timing deviations. A high correspondence between the cardiac rhythms obtained via the ear-based and Lead I measurements is rigorously confirmed through agreement between simulation and measurement, while the real-world feasibility was conclusively demonstrated through efficacious cardiac rhythm monitoring during prolonged driving. This work opens new avenues for seamless, hearable-based cardiac monitoring that extends beyond heart rate detection to offer cardiac rhythm examination in the community.

Journal article

Abeywickrama DB, Bennaceur A, Chance G, Demiris Y, Kordoni A, Levine M, Moffat L, Moreau L, Mousavi MR, Nuseibeh B, Ramamoorthy S, Ringert JO, Wilson J, Windsor S, Eder Ket al., 2023, On Specifying for Trustworthiness, Communications of the ACM, Vol: 67, Pages: 98-109, ISSN: 0001-0782

As autonomous systems increasingly become part of our lives, it is crucial to foster trust between humans and these systems, to ensure positive outcomes and mitigate harmful ones.

Abstract
Cite

Journal article

Luo H, Demiris Y, 2023, Bi-manual robot shoe lacing, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

Shoe lacing (SL) is a challenging sensorimotor task in daily life and a complex engineering problem in theshoe-making industry. In this paper, we propose a system for autonomous SL. It contains a mathematical definition of the SL task and searches for the best lacing pattern corresponding to the shoe configuration and the user preferences. We propose a set of action primitives and generate plans of action sequences according to the designed pattern. Our system plans the trajectories based on the perceived position of the eyelets and aglets with an active perception strategy, and deploys the trajectories on a bi-manual robot. Experiments demonstrate that the proposed system can successfully lace 3 different shoesin different configurations, with a completion rate of 92.0%,91.6% and 77.5% for 6, 8 and 10-eyelet patterns respectively.To the best of our knowledge, this is the first demonstration of autonomous SL using a bi-manual robot.

Conference paper

Zhang F, Demiris Y, 2023, Visual-tactile learning of garment unfolding for robot-assisted dressing, IEEE Robotics and Automation Letters, Vol: 8, Pages: 5512-5519, ISSN: 2377-3766

Assistive robots have the potential to support disabled and elderly people in daily dressing activities. An intermediate stage of dressing is to manipulate the garment from a crumpled initial state to an unfolded configuration that facilitates robust dressing. Applying quasi-static grasping actions with vision feedback on garment unfolding usually suffers from occluded grasping points. In this work, we propose a dynamic manipulation strategy: tracing the garment edge until the hidden corner is revealed. We introduce a model-based approach, where a deep visual-tactile predictive model iteratively learns to perform servoing from raw sensor data. The predictive model is formalized as Conditional Variational Autoencoder with contrastive optimization, which jointly learns underlying visual-tactile latent representations, a latent garment dynamics model, and future predictions of garment states. Two cost functions are explored: the visual cost, defined by garment corner positions, guarantees the gripper to move towards the corner, while the tactile cost, defined by garment edge poses, prevents the garment from falling from the gripper. The experimental results demonstrate the improvement of our contrastive visual-tactile model predictive control over single sensing modality and baseline model learning techniques. The proposed method enables a robot to unfold back-opening hospital gowns and perform upper-body dressing.

Journal article

Goubard C, Demiris Y, 2023, Cooking up trust: eye gaze and posture for trust-aware action selection in human-robot collaboration, TAS '23: First International Symposium on Trustworthy Autonomous Systems, Publisher: ACM, Pages: 1-5

Conference paper

Kotsovolis S, Demiris Y, 2023, Bi-manual manipulation of multi-component garments towards robot-assisted dressing, 2023 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

In this paper, we propose a strategy for robot-assisted dressing with multi-component garments, such as gloves. Most studies in robot-assisted dressing usually experiment with single-component garments, such as sleeves, while multi-component tasks are often approached as sequential single-component problems. In dressing scenarios with more complex garments, robots should estimate the alignment of the human body to the manipulated garments, and revise their dressing strategy. In this paper, we focus on a glove dressing scenario and propose a decision process for selecting dressing action primitives on the different components of the garment, based on a hierarchical representation of the task and a set of environmental conditions. To complement this process, we propose a set of bi-manual control strategies, based on hybrid position, visual, and force feedback, in order to execute the dressing action primitives with the deformable object. The experimental results validate our method, enabling the Baxter robot to dress a mannequin's hand with a gardening glove.

Conference paper

Chacon Quesada R, Demiris Y, 2023, Design and evaluation of an augmented reality head-mounted display user interface for controlling legged manipulators, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 11950-11956, ISSN: 1050-4729

Designing an intuitive User Interface (UI) for controlling assistive robots remains challenging. Most existing UIs leverage traditional control interfaces such as joysticks, hand-held controllers, and 2D UIs. Thus, users have limited availability to use their hands for other tasks. Furthermore, although there is extensive research regarding legged manipulators, comparatively little is on their UIs. Towards extending the state-of-art in this domain, we provide a user study comparing an Augmented Reality (AR) Head-Mounted Display (HMD) UI we developed for controlling a legged manipulator against off-the-shelf control methods for such robots. We made this comparison baseline across multiple factors relevant to a successful interaction. The results from our user study ( N=17 ) show that although the AR UI increases immersion, off-the-shelf control methods outperformed the AR UI in terms of time performance and cognitive workload. Nonetheless, a follow-up pilot study incorporating the lessons learned shows that AR UIs can outpace hand-held-based control methods and reduce the cognitive requirements when designers include hands-free interactions and cognitive offloading principles into the UI.

Conference paper

Candela E, Doustaly O, Parada L, Feng F, Demiris Y, Angeloudis Pet al., 2023, Risk-aware controller for autonomous vehicles using model-based collision prediction and reinforcement learning, ARTIFICIAL INTELLIGENCE, Vol: 320, ISSN: 0004-3702

Journal article

Zhang X, Demiris Y, 2023, Visible and Infrared Image Fusion using Deep Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence

Cite

Journal article

Ovur SE, Demiris Y, 2023, Naturalistic Robot-to-Human Bimanual Handover in Complex Environments Through Multi-Sensor Fusion, IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, ISSN: 1545-5955

Journal article

Bin Razali MH, 2023, Action-conditioned generation of bimanual object manipulation sequences, The 37th AAAI Conference on Artificial Intelligence (AAAI 2023), Publisher: AAAI, Pages: 2146-2154, ISSN: 2159-5399

The generation of bimanual object manipulation sequencesgiven a semantic action label has broad applications in collaborative robots or augmented reality. This relatively new problem differs from existing works that generate whole-body motions without any object interaction as it now requires the model to additionally learn the spatio-temporal relationship that exists between the human joints and object motion given said label. To tackle this task, we leverage the varying degree each muscle or joint is involved during object manipulation. For instance, the wrists act as the prime movers for the objects while the finger joints are angled to provide a firm grip. The remaining body joints are the least involved in that they are positioned as naturally and comfortably as possible. We thus design an architecture that comprises 3 main components: (i) a graph recurrent network that generates the wrist and object motion, (ii) an attention-based recurrent network that estimates the required finger joint angles given the graph configuration, and (iii) a recurrent network that reconstructs the body pose given the locations of the wrist. We evaluate our approach on the KIT Motion Capture and KIT RGBD Bi-manual Manipulation datasets and show improvements over a simplified approach that treats the entire body as a singleentity, and existing whole-body-only methods.

Conference paper

Zhong Y, Zhang F, Demiris Y, 2023, Contrastive self-supervised learning for automated multi-modal dance performance assessment, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

A fundamental challenge of analyzing human motion is to effectively represent human movements both spatially and temporally. We propose a contrastive self-supervised strategy to tackle this challenge. Particularly, we focus on dancing, which involves a high level of physical and intellectual abilities. Firstly, we deploy Graph and Residual Neural Networks with Siamese architecture to represent the dance motion and music features respectively. Secondly, we apply the InfoNCE loss to contrastively embed the high-dimensional multimedia signals onto the latent space without label supervision. Finally, our proposed framework is evaluated on a multi-modal Dance- Music-Level dataset composed of various dance motions, music, genres and choreographies with dancers of different expertise levels. Experimental results demonstrate the robustness and improvements of our proposed method over 3 baselines and 6 ablation studies across tasks of dance genres, choreographies classification and dancer expertise level assessment.

Conference paper

Zhang X, Angeloudis P, Demiris Y, 2023, Dual-branch Spatio-Temporal Graph Neural Networks for Pedestrian Trajectory Prediction, Pattern Recognition, ISSN: 0031-3203

Cite

Journal article

Ren R, Rajesh MG, Sanchez-Riera J, Lopez-Rodriguez A, Zhang F, Tian Y, Alenya G, Agudo A, Demiris Y, Mikolajczyk K, Moreno-Noguer Fet al., 2023, Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision, Pages: 147-153

Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most of the current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than the larger ones. However, tasks like cloth folding or assisted dressing require recognizing larger segments, such as semantic edges that carry more information than points. We thus first tackle the problem of fine-grained region detection in deformed clothes using only a depth image. We implement an approach for T-shirts, and define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-Net based network to segment and label these parts. Our second contribution is concerned with the level of supervision required to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we propose a multilayered Domain Adaptation strategy that does not use any real annotations. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels, and show that training our network only with synthetic labels and our proposed DA approach yields results competitive with real data supervision.

Abstract
Cite

Conference paper

Amadori PV, Demiris Y, 2023, User-Aware Multi-Level Cognitive Workload Estimation from Multimodal Physiological Signals, IEEE Transactions on Cognitive and Developmental Systems, ISSN: 2379-8920

In this paper, we tackle the problem of human cognitive workload estimation from multimodal physiological signals. Allowing computational assistive systems to infer the cognitive states of users in real-time could provide seamless and more fluid human-machine interaction. We introduce WorkNet, a novel end-to-end sequential deep learning model for user-aware cognitive workload classification in a virtual-reality driving simulator. Our architecture exploits user-awareness to bridge the performance-deployability gap in cognitive workload estimation between personalized and generalized models. WorkNet uses a dual prediction head, where one recognizes whose user the data corresponds to, while the other predicts the current user’s cognitive state. We evaluate WorkNet on a dataset we collected from 20 participants in eight scenarios of different levels of cognitive workload, each induced by either auditory or visual stimuli. WorkNet differentiates with high accuracy among four different cognitive workload levels, independently on the modality of the stimulus. Also, it extracts meaningful features from the input, independently on the modality of the stimulus and on the users. Our analyses show that WorkNet can adapt with limited data, akin to a calibration stage, to perform on a different modality it has been trained on, and on unseen users.

Abstract
Cite

Journal article

Lingg N, Demiris Y, 2023, Beyond Self-Report: A Continuous Trust Measurement Device for HRI, Pages: 2220-2225, ISSN: 1944-9445

Trust is a crucial part of human-robot interactions, and its accurate measurement is a challenging task. We introduce Trusty, a handheld continuous trust level measurement device and investigate its validity by analysing the correlation between its measurements and self-reported trust scores. In a study with 29 participants, we evaluated the effectiveness of the device with an autonomous wheelchair in a mobile navigation task. The participants collaborated with an autonomous wheelchair to deliver packages to predefined checkpoints in an unstructured environment, and the performance of the wheelchair was manipulated to be either under a good-performing condition or a bad-performing condition. Our first finding reveals a notable influence of wheelchair performance on self-reported trust. Participants interacting with a good-performing wheelchair exhibited increased trust levels, as evidenced by higher scores on post-experiment trust questionnaires and verbal self-reported trust measures. Additionally, our study proposes Trusty as a continuous measurement tool for assessing trust during HRI, demonstrating its equivalence to self-report measures and traditional questionnaire scores.

Abstract
Cite

Conference paper

Casado FE, Demiris Y, 2022, Federated learning from demonstration for active assistance to smart wheelchair users, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 9326-9331, ISSN: 2153-0858

Learning from Demonstration (LfD) is a very appealing approach to empower robots with autonomy. Given some demonstrations provided by a human teacher, the robot can learn a policy to solve the task without explicit programming. A promising use case is to endow smart robotic wheelchairs with active assistance to navigation. By using LfD, it is possible to learn to infer short-term destinations anywhere, without the need of building a map of the environment beforehand. Nevertheless, it is difficult to generalize robot behaviors to environments other than those used for training. We believe that one possible solution is learning from crowds, involving a broad number of teachers (the end users themselves) who perform demonstrations in diverse and real environments. To this end, in this work we consider Federated Learning from Demonstration (FLfD), a distributed approach based on a Federated Learning architecture. Our proposal allows the training of a global deep neural network using sensitive local data (images and laser readings) with privacy guarantees. In our experiments we pose a scenario involving different clients working in heterogeneous domains. We show that the federated model is able to generalize and deal with non Independent and Identically Distributed (non-IID) data.

Conference paper

Zolotas M, Demiris Y, 2022, Disentangled sequence clustering for human intention inference, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 9814-9820, ISSN: 2153-0866

Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of “intent” conditioned on the robot’s perceived state. However, these approaches typically assumetask-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latentrepresentations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction datasetcollected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.

Conference paper

Chacon Quesada R, Demiris Y, 2022, Holo-SpoK: Affordance-aware augmented reality control of legged manipulators, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 856-862

Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state-of-art in this domain, in this work, we integrate a Boston Dynamics(BD) Spot® with a light-weight 7 DoF Kinova® robot arm and a Robotiq® 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtainedvia vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.

Conference paper

Amadori PV, Fischer T, Wang R, Demiris Yet al., 2022, Predicting secondary task performance: a directly actionable metric for cognitive overload detection, IEEE Transactions on Cognitive and Developmental Systems, Vol: 14, Pages: 1474-1485, ISSN: 2379-8920

In this paper, we address cognitive overload detection from unobtrusive physiological signals for users in dual-tasking scenarios. Anticipating cognitive overload is a pivotal challenge in interactive cognitive systems and could lead to safer shared-control between users and assistance systems. Our framework builds on the assumption that decision mistakes on the cognitive secondary task of dual-tasking users correspond to cognitive overload events, wherein the cognitive resources required to perform the task exceed the ones available to the users. We propose DecNet, an end-to-end sequence-to-sequence deep learning model that infers in real-time the likelihood of user mistakes on the secondary task, i.e., the practical impact of cognitive overload, from eye-gaze and head-pose data. We train and test DecNet on a dataset collected in a simulated driving setup from a cohort of 20 users on two dual-tasking decision-making scenarios, with either visual or auditory decision stimuli. DecNet anticipates cognitive overload events in both scenarios and can perform in time-constrained scenarios, anticipating cognitive overload events up to 2s before they occur. We show that DecNet’s performance gap between audio and visual scenarios is consistent with user perceived difficulty. This suggests that single modality stimulation induces higher cognitive load on users, hindering their decision-making abilities.

Journal article

Dragostinov Y, Harðardóttir D, McKenna PE, Robb D, Nesset B, Ahmad MI, Romeo M, Lim MY, Yu C, Jang Y, Diab M, Cangelosi A, Demiris Y, Hastie H, Rajendran Get al., 2022, Preliminary psychometric scale development using the mixed methods Delphi technique, Methods in Psychology, Vol: 7

This study implemented a Delphi Method; a systematic technique which relies on a panel of experts to achieve consensus, to evaluate which questionnaire items would be the most relevant for developing a new Propensity to Trust scale. Following an initial research team moderation phase, two surveys were administered to academic lecturers, professors and Ph.D. candidates specialising in the fields of either individual differences, human-robot interaction, or occupational psychology. Results from 28 experts produced 33 final questionnaire items that were deemed relevant for evaluating trust. We discuss the importance of content validity when implementing scales, while emphasising the need for more documented scale development processes in psychology. Furthermore, we propose that the Delphi technique could be utilised as an effective and economical method for achieving content validity, while also providing greater scale creation transparency.

Abstract
Cite
Citations: 6

Journal article

Nunes UM, Demiris Y, 2022, Robust Event-Based Vision Model Estimation by Dispersion Minimisation, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, Vol: 44, Pages: 9561-9573, ISSN: 0162-8828

Author Web Link
Cite
Citations: 5

Journal article

Zhang X, Angeloudis P, Demiris Y, 2022, ST CrossingPose: a spatial-temporal graph convolutional network for skeleton-based pedestrian crossing intention prediction, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 20773-20782, ISSN: 1524-9050

Pedestrian crossing intention prediction is crucial for the safety of pedestrians in the context of both autonomous and conventional vehicles and has attracted widespread interest recently. Various methods have been proposed to perform pedestrian crossing intention prediction, among which the skeleton-based methods have been very popular in recent years. However, most existing studies utilize manually designed features to handle skeleton data, limiting the performance of these methods. To solve this issue, we propose to predict pedestrian crossing intention based on spatial-temporal graph convolutional networks using skeleton data (ST CrossingPose). The proposed method can learn both spatial and temporal patterns from skeleton data, thus having a good feature representation ability. Extensive experiments on a public dataset demonstrate that the proposed method achieves very competitive performance in predicting crossing intention while maintaining a fast inference speed. We also analyze the effect of several factors, e.g., size of pedestrians, time to event, and occlusion, on the proposed method.

Journal article

Zhang X, Feng Y, Angeloudis P, Demiris Yet al., 2022, Monocular visual traffic surveillance: a review, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 14148-14165, ISSN: 1524-9050

To facilitate the monitoring and management of modern transportation systems, monocular visual traffic surveillance systems have been widely adopted for speed measurement, accident detection, and accident prediction. Thanks to the recent innovations in computer vision and deep learning research, the performance of visual traffic surveillance systems has been significantly improved. However, despite this success, there is a lack of survey papers that systematically review these new methods. Therefore, we conduct a systematic review of relevant studies to fill this gap and provide guidance to future studies. This paper is structured along the visual information processing pipeline that includes object detection, object tracking, and camera calibration. Moreover, we also include important applications of visual traffic surveillance systems, such as speed measurement, behavior learning, accident detection and prediction. Finally, future research directions of visual traffic surveillance systems are outlined.

Journal article

Al-Hindawi A, Vizcaychipi M, Demiris Y, 2022, Faster, better blink detection through curriculum learning by augmentation, ETRA '22: 2022 Symposium on Eye Tracking Research and Applications, Publisher: ACM, Pages: 1-7

Blinking is a useful biological signal that can gate gaze regression models to avoid the use of incorrect data in downstream tasks. Existing datasets are imbalanced both in frequency of class but also in intra-class difficulty which we demonstrate is a barrier for curriculum learning. We thus propose a novel curriculum augmentation scheme that aims to address frequency and difficulty imbalances implicitly which are are terming Curriculum Learning by Augmentation (CLbA).Using Curriculum Learning by Augmentation (CLbA), we achieve a state-of-the-art performance of mean Average Precision (mAP) 0.971 using ResNet-18 up from the previous state-of-the-art of mean Average Precision (mAP) of 0.757 using DenseNet-121 whilst outcompeting Curriculum Learning by Bootstrapping (CLbB) by a significant margin with improved calibration. This new training scheme thus allows the use of smaller and more performant Convolutional Neural Network (CNN) backbones fulfilling Nyquist criteria to achieve a sampling frequency of 102.3Hz. This paves the way for inference of blinking in real-time applications.

Conference paper

Girbes-Juan V, Schettino V, Gracia L, Solanes JE, Demiris Y, Tornero Jet al., 2022, Combining haptics and inertial motion capture to enhance remote control of a dual-arm robot, JOURNAL ON MULTIMODAL USER INTERFACES, Vol: 16, Pages: 219-238, ISSN: 1783-7677

Author Web Link
Cite
Citations: 4

Journal article

Bin Razali MH, Demiris Y, 2022, Using a single input to forecast human action keystates in everyday pick and place actions, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 3488-3492

We define action keystates as the start or end of an actionthat contains information such as the human pose and time.Existing methods that forecast the human pose use recurrentnetworks that input and output a sequence of poses. In this pa-per, we present a method tailored for everyday pick and placeactions where the object of interest is known. In contrast toexisting methods, ours uses an input from a single timestep todirectly forecast (i) the key pose the instant the pick or placeaction is performed and (ii) the time it takes to get to the pre-dicted key pose. Experimental results show that our methodoutperforms the state-of-the-art for key pose forecasting andis comparable for time forecasting while running at least anorder of magnitude faster. Further ablative studies reveal thesignificance of the object of interest in enabling the total num-ber of parameters across all existing methods to be reduced byat least 90% without any degradation in performance.

Conference paper

Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2022, What is the patient looking at? Robust gaze-scene intersection under free-viewing conditions, 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 2430-2434, ISSN: 1520-6149

Locating the user’s gaze in the scene, also known as Point of Regard (PoR) estimation, following gaze regression is important for many downstream tasks. Current techniques either require the user to wear and calibrate instruments, require significant pre-processing of the scene information, or place restrictions on user’s head movements.We propose a geometrically inspired algorithm that, despite its simplicity, provides high accuracy and O(J) performance under a variety of challenging situations including sparse depth maps, high noise, and high dynamic parallax between the user and the scene camera. We demonstrate the utility of the proposed algorithm in regressing the PoR from scenes captured in the Intensive Care Unit (ICU) at Chelsea & Westminster Hospital NHS Foundation Trust a .

Conference paper

Zhang F, Demiris Y, 2022, Learning garment manipulation policies toward robot-assisted dressing., Science Robotics, Vol: 7, Pages: eabm6010-eabm6010, ISSN: 2470-9476

Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameters inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.

Journal article

Professor Yiannis Demiris

Contact

Location

Summary