Publications
263 results found
Zhang F, Demiris Y, 2023, Visual-tactile learning of garment unfolding for robot-assisted dressing, IEEE Robotics and Automation Letters, Vol: 8, Pages: 5512-5519, ISSN: 2377-3766
Assistive robots have the potential to support disabled and elderly people in daily dressing activities. An intermediate stage of dressing is to manipulate the garment from a crumpled initial state to an unfolded configuration that facilitates robust dressing. Applying quasi-static grasping actions with vision feedback on garment unfolding usually suffers from occluded grasping points. In this work, we propose a dynamic manipulation strategy: tracing the garment edge until the hidden corner is revealed. We introduce a model-based approach, where a deep visual-tactile predictive model iteratively learns to perform servoing from raw sensor data. The predictive model is formalized as Conditional Variational Autoencoder with contrastive optimization, which jointly learns underlying visual-tactile latent representations, a latent garment dynamics model, and future predictions of garment states. Two cost functions are explored: the visual cost, defined by garment corner positions, guarantees the gripper to move towards the corner, while the tactile cost, defined by garment edge poses, prevents the garment from falling from the gripper. The experimental results demonstrate the improvement of our contrastive visual-tactile model predictive control over single sensing modality and baseline model learning techniques. The proposed method enables a robot to unfold back-opening hospital gowns and perform upper-body dressing.
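The two-cost control idea summarised above lends itself to a sampling-based model-predictive loop. The sketch below is illustrative only: the latent layout, the toy dynamics function, the cost weights and all names are placeholders, not the learned visual-tactile CVAE described in the paper.

```python
# Illustrative sketch of combining a visual and a tactile cost in a
# sampling-based MPC loop (placeholders throughout, not the paper's model).
import torch

def plan(dynamics, z0, corner_goal, edge_goal, horizon=5, samples=128,
         w_visual=1.0, w_tactile=0.5):
    actions = torch.randn(samples, horizon, 2) * 0.05        # candidate gripper motions
    z = z0.expand(samples, -1)
    cost = torch.zeros(samples)
    for t in range(horizon):
        z = dynamics(z, actions[:, t])
        corner_pred, edge_pred = z[:, :2], z[:, 2:4]          # assumed latent layout
        cost += w_visual * (corner_pred - corner_goal).norm(dim=-1)   # move toward corner
        cost += w_tactile * (edge_pred - edge_goal).norm(dim=-1)      # keep edge in grasp
    return actions[cost.argmin(), 0]                          # execute first action, then replan

# Toy dynamics standing in for a learned latent model.
toy_dynamics = lambda z, a: z + torch.cat([a, torch.zeros_like(a)], dim=-1)
best_action = plan(toy_dynamics, torch.zeros(1, 4), torch.tensor([0.3, 0.1]),
                   torch.tensor([0.0, 0.0]))
print(best_action)
```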
Goubard C, Demiris Y, 2023, Cooking up trust: eye gaze and posture for trust-aware action selection in human-robot collaboration, TAS '23: First International Symposium on Trustworthy Autonomous Systems, Publisher: ACM, Pages: 1-5
Lingg N, Demiris Y, 2023, Building Trust in Assistive Robotics: Insights from a Real-World Mobile Navigation Experiment
Assistive robotics can improve the lives of people with mobility impairments, but widespread acceptance depends on understanding how trust forms during human-robot interaction (HRI). This study reports on a real-world mobile navigation experiment involving 27 participants who used an autonomous wheelchair to deliver packages to predetermined locations. The performance of the wheelchair was manipulated to create good and bad performance conditions. The participants' trust in the robot was measured using established trust questionnaires before and after the interaction. The results indicate that exposure to the robot and its performance significantly impact participants' self-reported levels of trust in the robot. Specifically, scores on trust factors relating to the robot taken after the experiment were significantly higher than those taken before, suggesting that exposure to the robot is an essential element of the trust-building process. The study also found that poor robot performance during the interaction negatively affected participants' perception of the robot's behaviour and their attitudes towards it. The study concludes with implications for the design and development of autonomous assistive devices, highlighting the importance of potential users' exposure to assistive robots for building trust in the system and the negative impact of bad robot performance on people's attitudes towards the technology.
McKenna PE, Romeo M, Pimentel J, et al., 2023, Theory of Mind and Trust in Human-Robot Navigation
In human-robot interaction, trust is affected by human, robot, and environmental factors. In the proposed research, we consider each of these factors by focusing on the contribution of robot theory of mind (ToM), human visual perspective taking (a concept related to ToM), and environmental complexity to the development and maintenance of human-robot trust. To do so, our experiment combines a psychological assessment of visual perspective taking (the Director Task) with a trust-based robot navigation task. Using the AREA model of Responsible Research and Innovation (RRI), we also highlight the implications of our experiment in the context of trustworthy robotics and human-robot collaboration. We round off the article with theoretical and research development synopses related to robot ToM and trust, and with future work to be conducted in this area.
Kotsovolis S, Demiris Y, 2023, Bi-manual manipulation of multi-component garments towards robot-assisted dressing, 2023 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE
In this paper, we propose a strategy for robot-assisted dressing with multi-component garments, such as gloves. Most studies in robot-assisted dressing experiment with single-component garments, such as sleeves, while multi-component tasks are often approached as sequential single-component problems. In dressing scenarios with more complex garments, robots should estimate the alignment of the human body to the manipulated garments and revise their dressing strategy. Here, we focus on a glove dressing scenario and propose a decision process for selecting dressing action primitives on the different components of the garment, based on a hierarchical representation of the task and a set of environmental conditions. To complement this process, we propose a set of bi-manual control strategies, based on hybrid position, visual, and force feedback, in order to execute the dressing action primitives with the deformable object. The experimental results validate our method, enabling the Baxter robot to dress a mannequin's hand with a gardening glove.
Candela E, Doustaly O, Parada L, et al., 2023, Risk-aware controller for autonomous vehicles using model-based collision prediction and reinforcement learning, Artificial Intelligence, Vol: 320, ISSN: 0004-3702
Autonomous Vehicles (AVs) have the potential to save millions of lives and increase the efficiency of transportation services. However, the successful deployment of AVs requires tackling multiple challenges related to modeling and certifying safety. State-of-the-art decision-making methods usually rely on end-to-end learning or imitation learning approaches, which still pose significant safety risks. Hence the necessity of risk-aware AVs that can better predict and handle dangerous situations. Furthermore, current approaches tend to lack explainability due to their reliance on end-to-end Deep Learning, where significant causal relationships are not guaranteed to be learned from data. This paper introduces a novel risk-aware framework for training AV agents using a bespoke collision prediction model and Reinforcement Learning (RL). The collision prediction model is based on Gaussian Processes and vehicle dynamics, and is used to generate the RL state vector. Using an explicit risk model increases the post-hoc explainability of the AV agent, which is vital for reaching and certifying the high safety levels required for AVs and other safety-sensitive applications. Experimental results obtained with a simulator and state-of-the-art RL algorithms show that the risk-aware RL framework decreases average collision rates by 15%, makes AVs more robust to sudden harsh braking situations, and achieves better performance in both safety and speed when compared to a standard rule-based method (the Intelligent Driver Model). Moreover, the proposed collision prediction model outperforms other models in the literature.
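For illustration, a Gaussian Process collision predictor can expose its collision probability as an extra entry in an RL state vector. Everything below (the feature set, the crude time-gap label, and the kernel) is an invented minimal example, not the collision prediction model used in the paper.

```python
# Minimal sketch: a GP classifier trained on relative-dynamics features whose
# predicted collision probability is appended to the RL observation vector.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical training data: distance to lead vehicle, closing speed, ego
# acceleration, plus a crude "time gap under 2 s" collision label.
rng = np.random.default_rng(0)
X = rng.uniform([2.0, -5.0, -3.0], [60.0, 5.0, 3.0], size=(200, 3))
y = (X[:, 0] / np.maximum(X[:, 1], 0.1) < 2.0).astype(int)

gp = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=[10.0, 2.0, 1.0]))
gp.fit(X, y)

def rl_state(raw_obs: np.ndarray, dynamics_features: np.ndarray) -> np.ndarray:
    """Append the GP's collision probability to the raw observation."""
    p_collision = gp.predict_proba(dynamics_features.reshape(1, -1))[0, 1]
    return np.concatenate([raw_obs, [p_collision]])

print(rl_state(np.zeros(4), np.array([15.0, 3.0, 0.0])))
```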
Zhang X, Demiris Y, 2023, Visible and Infrared Image Fusion using Deep Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence
Bin Razali MH, 2023, Action-conditioned generation of bimanual object manipulation sequences, The 37th AAAI Conference on Artificial Intelligence (AAAI 2023), Publisher: AAAI, Pages: 2146-2154, ISSN: 2159-5399
The generation of bimanual object manipulation sequences given a semantic action label has broad applications in collaborative robots or augmented reality. This relatively new problem differs from existing works that generate whole-body motions without any object interaction, as it now requires the model to additionally learn the spatio-temporal relationship that exists between the human joints and object motion given said label. To tackle this task, we leverage the varying degree to which each muscle or joint is involved during object manipulation. For instance, the wrists act as the prime movers for the objects while the finger joints are angled to provide a firm grip. The remaining body joints are the least involved in that they are positioned as naturally and comfortably as possible. We thus design an architecture that comprises 3 main components: (i) a graph recurrent network that generates the wrist and object motion, (ii) an attention-based recurrent network that estimates the required finger joint angles given the graph configuration, and (iii) a recurrent network that reconstructs the body pose given the locations of the wrist. We evaluate our approach on the KIT Motion Capture and KIT RGBD Bi-manual Manipulation datasets and show improvements over a simplified approach that treats the entire body as a single entity, and existing whole-body-only methods.
Zhang X, Angeloudis P, Demiris Y, 2023, Dual-branch Spatio-Temporal Graph Neural Networks for Pedestrian Trajectory Prediction, Pattern Recognition, ISSN: 0031-3203
Ren R, Rajesh MG, Sanchez-Riera J, et al., 2023, Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision, Pages: 147-153
Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most of the current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than the larger ones. However, tasks like cloth folding or assisted dressing require recognizing larger segments, such as semantic edges that carry more information than points. We thus first tackle the problem of fine-grained region detection in deformed clothes using only a depth image. We implement an approach for T-shirts, and define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-Net based network to segment and label these parts. Our second contribution is concerned with the level of supervision required to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we propose a multilayered Domain Adaptation strategy that does not use any real annotations. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels, and show that training our network only with synthetic labels and our proposed DA approach yields results competitive with real data supervision.
Ovur SE, Demiris Y, 2023, Naturalistic Robot-to-Human Bimanual Handover in Complex Environments Through Multi-Sensor Fusion, IEEE Transactions on Automation Science and Engineering, ISSN: 1545-5955
Robot-human object handover has been extensively studied in recent years for a wide range of applications. However, it is still far from being as natural as human-human handovers, largely due to the robots’ limited sensing capabilities. Previous approaches in the literature typically simplify the handover scenarios, including one or more of (a) conducting handovers at fixed locations, (b) not adapting to human preferences, or (c) only focusing on single-arm handover with small objects due to the sensor occlusions caused by large objects. To advance the state of the art toward a human-human level of handover fluency, this paper investigates a bimanual handover scenario in a naturalistic, complex setup. Specifically, we target robot-to-human box transfer while the human partner is on a ladder, and ensure that the object is adaptively delivered based on human preferences. To address the occlusion problem that arises in a complex environment, we develop an onboard multi-sensor perception system for the bimanual robot, introduce a measurement confidence estimation technique, and propose an occlusion-resilient multi-sensor fusion technique by positioning visual perception sensors in distinct locations on the robot with different fields of view. In addition, we establish a Cartesian space controller with a quaternion approach and a leader-follower control structure for compliant motion. Four distinct experiments are conducted, covering different human preferences (such as the box delivered above or below the hands) and significant handover location changes once the process has begun. For validation, the proposed multi-sensor fusion technique was compared to a single-sensor approach for both top and bottom sensors separately, and to simple averaging of both sensors. 30 repetitions were performed for each experiment (four experiments, four methods), the equivalent of 480 handover repetitions in total. Multi-sensor fusion approach achieved a handover success rate a
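As a generic illustration of confidence-weighted fusion of two visual estimates (the confidence model and weighting scheme used in the paper are not reproduced here), two sensor readings can be blended so that an occluded sensor contributes little:

```python
# Sketch of confidence-weighted fusion of two sensor estimates (assumed
# weighting, not the paper's multi-sensor fusion technique).
import numpy as np

def fuse_estimates(p_top, conf_top, p_bottom, conf_bottom, eps=1e-6):
    """Blend 3D estimates from the top and bottom sensors by confidence, so an
    occluded sensor (confidence near zero) contributes almost nothing."""
    w = np.array([conf_top, conf_bottom], dtype=float) + eps
    w /= w.sum()
    return w[0] * np.asarray(p_top) + w[1] * np.asarray(p_bottom)

# Toy usage: the bottom camera is occluded by the box, so its confidence is low.
print(fuse_estimates([0.40, 0.02, 1.10], 0.9, [0.55, 0.10, 1.30], 0.1))
```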
Quesada RC, Demiris Y, 2023, Design and Evaluation of an Augmented Reality Head-Mounted Display User Interface for Controlling Legged Manipulators, Pages: 11950-11956, ISSN: 1050-4729
Designing an intuitive User Interface (UI) for controlling assistive robots remains challenging. Most existing UIs leverage traditional control interfaces such as joysticks, hand-held controllers, and 2D UIs. Thus, users have limited ability to use their hands for other tasks. Furthermore, although there is extensive research regarding legged manipulators, comparatively little focuses on their UIs. Towards extending the state-of-art in this domain, we provide a user study comparing an Augmented Reality (AR) Head-Mounted Display (HMD) UI we developed for controlling a legged manipulator against off-the-shelf control methods for such robots. We made this comparison across multiple factors relevant to a successful interaction. The results from our user study (N=17) show that although the AR UI increases immersion, off-the-shelf control methods outperformed the AR UI in terms of time performance and cognitive workload. Nonetheless, a follow-up pilot study incorporating the lessons learned shows that AR UIs can outpace hand-held-based control methods and reduce the cognitive requirements when designers include hands-free interactions and cognitive offloading principles into the UI.
Zolotas M, Demiris Y, 2022, Disentangled sequence clustering for human intention inference, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 9814-9820, ISSN: 2153-0866
Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of “intent” conditioned on the robot’s perceived state. However, these approaches typically assume task-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latent representations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction dataset collected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.
Chacon Quesada R, Demiris Y, 2022, Holo-SpoK: Affordance-aware augmented reality control of legged manipulators, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 856-862
Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state-of-art in this domain, in this work, we integrate a Boston Dynamics (BD) Spot® with a lightweight 7 DoF Kinova® robot arm and a Robotiq® 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtained via vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.
Amadori PV, Fischer T, Wang R, et al., 2022, Predicting secondary task performance: a directly actionable metric for cognitive overload detection, IEEE Transactions on Cognitive and Developmental Systems, Vol: 14, Pages: 1474-1485, ISSN: 2379-8920
In this paper, we address cognitive overload detection from unobtrusive physiological signals for users in dual-tasking scenarios. Anticipating cognitive overload is a pivotal challenge in interactive cognitive systems and could lead to safer shared-control between users and assistance systems. Our framework builds on the assumption that decision mistakes on the cognitive secondary task of dual-tasking users correspond to cognitive overload events, wherein the cognitive resources required to perform the task exceed the ones available to the users. We propose DecNet, an end-to-end sequence-to-sequence deep learning model that infers in real-time the likelihood of user mistakes on the secondary task, i.e., the practical impact of cognitive overload, from eye-gaze and head-pose data. We train and test DecNet on a dataset collected in a simulated driving setup from a cohort of 20 users on two dual-tasking decision-making scenarios, with either visual or auditory decision stimuli. DecNet anticipates cognitive overload events in both scenarios and can perform in time-constrained scenarios, anticipating cognitive overload events up to 2s before they occur. We show that DecNet’s performance gap between audio and visual scenarios is consistent with user perceived difficulty. This suggests that single modality stimulation induces higher cognitive load on users, hindering their decision-making abilities.
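A sequence model over gaze and head-pose features can be sketched as follows; the feature dimensions, recurrent architecture, and per-frame output below are assumptions chosen for illustration rather than DecNet itself.

```python
# Rough sketch of a recurrent model inferring secondary-task mistake likelihood
# from gaze and head-pose sequences (dimensions and layers are assumptions).
import torch
import torch.nn as nn

class MistakePredictor(nn.Module):
    def __init__(self, feat_dim=8, hidden=64):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, frames, feat_dim)
        out, _ = self.rnn(x)
        return torch.sigmoid(self.head(out)).squeeze(-1)   # per-frame mistake probability

model = MistakePredictor()
gaze_head_seq = torch.randn(2, 120, 8)         # e.g. 2 s of gaze angles + head pose at 60 Hz
print(model(gaze_head_seq).shape)              # torch.Size([2, 120])
```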
Dragostinov Y, Harðardóttir D, McKenna PE, et al., 2022, Preliminary psychometric scale development using the mixed methods Delphi technique, Methods in Psychology, Vol: 7
This study implemented a Delphi Method, a systematic technique that relies on a panel of experts to achieve consensus, to evaluate which questionnaire items would be most relevant for developing a new Propensity to Trust scale. Following an initial research team moderation phase, two surveys were administered to academic lecturers, professors and Ph.D. candidates specialising in the fields of either individual differences, human-robot interaction, or occupational psychology. Results from 28 experts produced 33 final questionnaire items that were deemed relevant for evaluating trust. We discuss the importance of content validity when implementing scales, while emphasising the need for more documented scale development processes in psychology. Furthermore, we propose that the Delphi technique could be utilised as an effective and economical method for achieving content validity, while also providing greater scale creation transparency.
Nunes UM, Demiris Y, 2022, Robust Event-Based Vision Model Estimation by Dispersion Minimisation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 44, Pages: 9561-9573, ISSN: 0162-8828
Zhang X, Angeloudis P, Demiris Y, 2022, ST CrossingPose: a spatial-temporal graph convolutional network for skeleton-based pedestrian crossing intention prediction, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 20773-20782, ISSN: 1524-9050
Pedestrian crossing intention prediction is crucial for the safety of pedestrians in the context of both autonomous and conventional vehicles and has attracted widespread interest recently. Various methods have been proposed to perform pedestrian crossing intention prediction, among which the skeleton-based methods have been very popular in recent years. However, most existing studies utilize manually designed features to handle skeleton data, limiting the performance of these methods. To solve this issue, we propose to predict pedestrian crossing intention based on spatial-temporal graph convolutional networks using skeleton data (ST CrossingPose). The proposed method can learn both spatial and temporal patterns from skeleton data, thus having a good feature representation ability. Extensive experiments on a public dataset demonstrate that the proposed method achieves very competitive performance in predicting crossing intention while maintaining a fast inference speed. We also analyze the effect of several factors, e.g., size of pedestrians, time to event, and occlusion, on the proposed method.
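A minimal spatial-temporal graph convolution over skeleton sequences might look like the sketch below; the joint graph, layer sizes, and classifier head are placeholders rather than the ST CrossingPose architecture.

```python
# Minimal sketch of a spatial-temporal graph convolution block for skeleton
# sequences (illustrative only; the adjacency and sizes are assumptions).
import torch
import torch.nn as nn

class STGraphConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, adjacency: torch.Tensor, t_kernel=9):
        super().__init__()
        # Symmetrically normalised adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adjacency + torch.eye(adjacency.size(0))
        d = a.sum(dim=1).pow(-0.5)
        self.register_buffer("A", d.unsqueeze(1) * a * d.unsqueeze(0))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)          # per-joint feature mixing
        self.temporal = nn.Conv2d(out_ch, out_ch, (t_kernel, 1),
                                  padding=(t_kernel // 2, 0))            # conv over the time axis
        self.relu = nn.ReLU()

    def forward(self, x):            # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        x = torch.einsum("bctv,vw->bctw", x, self.A)   # propagate along skeleton edges
        return self.relu(self.temporal(x))

# Toy usage: 17-joint skeleton, 30 frames, binary crossing-intention logits.
num_joints, frames = 17, 30
A = torch.zeros(num_joints, num_joints)
A[0, 1] = A[1, 0] = 1.0                                # placeholder edges only
block = STGraphConvBlock(2, 64, A)
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 2))
logits = head(block(torch.randn(8, 2, frames, num_joints)))
print(logits.shape)  # torch.Size([8, 2])
```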
Zhang X, Feng Y, Angeloudis P, et al., 2022, Monocular visual traffic surveillance: a review, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 14148-14165, ISSN: 1524-9050
To facilitate the monitoring and management of modern transportation systems, monocular visual traffic surveillance systems have been widely adopted for speed measurement, accident detection, and accident prediction. Thanks to the recent innovations in computer vision and deep learning research, the performance of visual traffic surveillance systems has been significantly improved. However, despite this success, there is a lack of survey papers that systematically review these new methods. Therefore, we conduct a systematic review of relevant studies to fill this gap and provide guidance to future studies. This paper is structured along the visual information processing pipeline that includes object detection, object tracking, and camera calibration. Moreover, we also include important applications of visual traffic surveillance systems, such as speed measurement, behavior learning, accident detection and prediction. Finally, future research directions of visual traffic surveillance systems are outlined.
Al-Hindawi A, Vizcaychipi M, Demiris Y, 2022, Faster, better blink detection through curriculum learning by augmentation, ETRA '22: 2022 Symposium on Eye Tracking Research and Applications, Publisher: ACM, Pages: 1-7
Blinking is a useful biological signal that can gate gaze regression models to avoid the use of incorrect data in downstream tasks. Existing datasets are imbalanced both in class frequency and in intra-class difficulty, which we demonstrate is a barrier for curriculum learning. We thus propose a novel curriculum augmentation scheme that aims to address frequency and difficulty imbalances implicitly, which we term Curriculum Learning by Augmentation (CLbA). Using CLbA, we achieve a state-of-the-art performance of mean Average Precision (mAP) 0.971 using ResNet-18, up from the previous state-of-the-art mAP of 0.757 using DenseNet-121, whilst outcompeting Curriculum Learning by Bootstrapping (CLbB) by a significant margin with improved calibration. This new training scheme thus allows the use of smaller and more performant Convolutional Neural Network (CNN) backbones, fulfilling Nyquist criteria to achieve a sampling frequency of 102.3 Hz. This paves the way for inference of blinking in real-time applications.
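One way to realise a curriculum over augmentation strength is to ramp transform intensity with training progress; the schedule and transforms below are assumptions chosen for illustration, not the CLbA recipe.

```python
# Hedged sketch of a curriculum-by-augmentation schedule: augmentation strength
# grows with training progress so early epochs see easier, less-distorted samples.
import torch
import torchvision.transforms as T

def curriculum_transforms(epoch: int, max_epochs: int) -> T.Compose:
    s = min(1.0, epoch / max_epochs)                     # 0 = easiest, 1 = hardest
    return T.Compose([
        T.RandomRotation(degrees=20 * s),
        T.ColorJitter(brightness=0.4 * s, contrast=0.4 * s),
        T.RandomErasing(p=0.5 * s),
    ])

eye_patch = torch.rand(3, 36, 60)                         # toy eye-region crop in [0, 1]
hard_aug = curriculum_transforms(epoch=90, max_epochs=100)
print(hard_aug(eye_patch).shape)                          # torch.Size([3, 36, 60])
```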
Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2022, What is the patient looking at? Robust gaze-scene intersection under free-viewing conditions, 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 2430-2434, ISSN: 1520-6149
Locating the user’s gaze in the scene, also known as Point of Regard (PoR) estimation, following gaze regression is important for many downstream tasks. Current techniques either require the user to wear and calibrate instruments, require significant pre-processing of the scene information, or place restrictions on the user’s head movements. We propose a geometrically inspired algorithm that, despite its simplicity, provides high accuracy and O(J) performance under a variety of challenging situations including sparse depth maps, high noise, and high dynamic parallax between the user and the scene camera. We demonstrate the utility of the proposed algorithm in regressing the PoR from scenes captured in the Intensive Care Unit (ICU) at Chelsea & Westminster Hospital NHS Foundation Trust.
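A generic geometric treatment of this problem marches along the gaze ray and tests it against a depth map; the sketch below illustrates that idea only and is not the paper's algorithm (step size, intrinsics, and validity checks are assumed).

```python
# Illustrative gaze-ray / depth-map intersection: march along the 3D gaze ray in
# the scene-camera frame and stop where the ray passes behind the depth surface.
import numpy as np

def intersect_gaze_with_depth(origin, direction, depth, fx, fy, cx, cy,
                              step=0.01, max_range=5.0):
    """origin/direction: gaze ray in the scene-camera frame (metres).
    depth: HxW depth image in metres (0 where invalid)."""
    direction = direction / np.linalg.norm(direction)
    h, w = depth.shape
    for t in np.arange(step, max_range, step):
        p = origin + t * direction
        if p[2] <= 0:                           # behind the camera, keep marching
            continue
        u = int(round(fx * p[0] / p[2] + cx))   # project onto the image plane
        v = int(round(fy * p[1] / p[2] + cy))
        if not (0 <= u < w and 0 <= v < h):
            continue
        d = depth[v, u]
        if d > 0 and p[2] >= d:                 # ray has reached the scene surface
            return np.array([u, v]), p          # Point of Regard (pixels) and 3D point
    return None, None

# Toy usage: a flat wall 2 m away, gaze ray slightly off the optical axis.
depth = np.full((480, 640), 2.0)
por, point = intersect_gaze_with_depth(np.zeros(3), np.array([0.1, 0.0, 1.0]),
                                       depth, fx=525, fy=525, cx=320, cy=240)
print(por, point)
```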
Bin Razali MH, Demiris Y, 2022, Using a single input to forecast human action keystates in everyday pick and place actions, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 3488-3492
We define action keystates as the start or end of an action that contains information such as the human pose and time. Existing methods that forecast the human pose use recurrent networks that input and output a sequence of poses. In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. In contrast to existing methods, ours uses an input from a single timestep to directly forecast (i) the key pose the instant the pick or place action is performed and (ii) the time it takes to get to the predicted key pose. Experimental results show that our method outperforms the state-of-the-art for key pose forecasting and is comparable for time forecasting while running at least an order of magnitude faster. Further ablative studies reveal the significance of the object of interest in enabling the total number of parameters across all existing methods to be reduced by at least 90% without any degradation in performance.
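The single-input idea can be illustrated with a small feed-forward network that maps the current pose and the known object position to a key pose and a time-to-keystate; the joint count, layer sizes, and output heads below are assumptions, not the paper's network.

```python
# Hedged sketch of single-timestep keystate forecasting: one forward pass maps
# the current pose plus the object position to a key pose and a time estimate.
import torch
import torch.nn as nn

NUM_JOINTS = 21          # assumed skeleton size
POSE_DIM = NUM_JOINTS * 3

class KeystateForecaster(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(POSE_DIM + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.key_pose = nn.Linear(hidden, POSE_DIM)   # pose at the pick/place instant
        self.key_time = nn.Linear(hidden, 1)          # time until that instant

    def forward(self, pose_t, object_xyz):
        h = self.backbone(torch.cat([pose_t, object_xyz], dim=-1))
        return self.key_pose(h), self.key_time(h).squeeze(-1)

model = KeystateForecaster()
pose_now = torch.randn(4, POSE_DIM)        # a single timestep per sample
obj_xyz = torch.randn(4, 3)                # known object of interest
pred_pose, pred_time = model(pose_now, obj_xyz)
print(pred_pose.shape, pred_time.shape)    # torch.Size([4, 63]) torch.Size([4])
```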
Zhang F, Demiris Y, 2022, Learning garment manipulation policies toward robot-assisted dressing., Science Robotics, Vol: 7, Pages: eabm6010-eabm6010, ISSN: 2470-9476
Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to the real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameter inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.
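The sim-to-real similarity component can be illustrated with a shared encoder trained under a contrastive loss; the encoder, loss form, and input format below are assumptions for the example, not the model reported in the paper.

```python
# Minimal sketch: a shared encoder embeds real and simulated garment
# observations, and a contrastive loss pulls physically similar pairs together
# so embedding distance can score how well simulator parameters match reality.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(                      # shared weights for both domains
    nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(), nn.LazyLinear(64))

def contrastive_loss(z_real, z_sim, similar, margin=1.0):
    """similar = 1 if the pair shows matching garment behaviour, else 0."""
    d = F.pairwise_distance(z_real, z_sim)
    return (similar * d.pow(2) +
            (1 - similar) * F.relu(margin - d).pow(2)).mean()

real_obs = torch.randn(8, 1, 64, 64)          # e.g. real depth crops of the garment
sim_obs = torch.randn(8, 1, 64, 64)           # renders under candidate sim parameters
labels = torch.randint(0, 2, (8,)).float()
loss = contrastive_loss(encoder(real_obs), encoder(sim_obs), labels)
loss.backward()
print(float(loss))
```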
Taniguchi T, Nagai T, Shimoda S, et al., 2022, Special issue on symbol emergence in robotics and cognitive systems (II), Advanced Robotics, Vol: 36, Pages: 217-218, ISSN: 0169-1864
Kaptein F, Kiefer B, Cully A, et al., 2022, A cloud-based robot system for long-term interaction: principles, implementation, lessons learned, ACM Transactions on Human-Robot Interaction, Vol: 11, ISSN: 2573-9522
Making the transition to long-term interaction with social-robot systems has been identified as one of the main challenges in human-robot interaction. This article identifies four design principles to address this challenge and applies them in a real-world implementation: cloud-based robot control, a modular design, one common knowledge base for all applications, and hybrid artificial intelligence for decision making and reasoning. The control architecture for this robot includes a common Knowledge-base (ontologies), Data-base, “Hybrid Artificial Brain” (dialogue manager, action selection and explainable AI), Activities Centre (Timeline, Quiz, Break and Sort, Memory, Tip of the Day), Embodied Conversational Agent (ECA, i.e., robot and avatar), and Dashboards (for authoring and monitoring the interaction). Further, the ECA is integrated with an expandable set of (mobile) health applications. The resulting system is a Personal Assistant for a healthy Lifestyle (PAL), which supports diabetic children with self-management and educates them on health-related issues (48 children, aged 6–14, recruited via hospitals in the Netherlands and in Italy). It is capable of autonomous interaction “in the wild” for prolonged periods of time without the need for a “Wizard-of-Oz” (up to 6 months online). PAL is an exemplary system that provides personalised, stable and diverse, long-term human-robot interaction.
Jang Y, Demiris Y, 2022, Message passing framework for vision prediction stability in human robot interaction, IEEE International Conference on Robotics and Automation 2022, Publisher: IEEE, ISSN: 2152-4092
In Human Robot Interaction (HRI) scenarios, robot systems would benefit from an understanding of the user's state, actions and their effects on the environment to enable better interactions. While there are specialised vision algorithms for different perceptual channels, such as objects, scenes, human pose, and human actions, it is worth considering how their interaction can help improve each other's output. In computer vision, individual prediction modules for these perceptual channels frequently produce noisy or unstable outputs due to the limited datasets used for training and the compartmentalisation of the perceptual channels. To stabilise vision prediction results in HRI, this paper presents a novel message passing framework that uses the memory of individual modules to correct each other's outputs. The proposed framework is designed utilising common-sense rules of physics (such as the law of gravity) to reduce noise, while introducing a pipeline that helps the modules effectively improve each other's outputs. The proposed framework aims to analyse primitive human activities such as grasping an object in a video captured from the perspective of a robot. Experimental results show that the proposed framework significantly reduces the output noise of individual modules compared to the case of running independently. This pipeline can be used to measure human reactions when interacting with a robot in various HRI scenarios.
Bin Razali MH, Demiris Y, 2022, Using eye-gaze to forecast human pose in everyday pick and place actions, IEEE International Conference on Robotics and Automation
Collaborative robots that operate alongside humans require the ability to understand their intent and forecast their pose. Among the various indicators of intent, the eye gaze is particularly important as it signals action towards the gazed object. By observing a person’s gaze, one can effectively predict the object of interest and subsequently forecast the person’s pose. We leverage this and present a method that forecasts the human pose using gaze information for everyday pick and place actions in a home environment. Our method first attends to fixations to locate the coordinates of the object of interest before inputting said coordinates to a pose forecasting network. Experiments on the MoGaze dataset show that our gaze network lowers the errors of existing pose forecasting methods and that incorporating a prior in the form of textual instructions further lowers the errors by a significant amount. Furthermore, the use of eye gaze now allows a simple multilayer perceptron network to directly forecast the keypose.
Quesada RC, Demiris Y, 2022, Proactive robot assistance: affordance-aware augmented reality user interfaces, IEEE Robotics and Automation magazine, Vol: 29, ISSN: 1070-9932
Assistive robots have the potential to increase the autonomy and quality of life of people with disabilities [1]. Their applications include rehabilitation robots, smart wheelchairs, companion robots, mobile manipulators, and educational robots [2]. However, designing an intuitive user interface (UI) for the control of assistive robots remains a challenge, as most UIs leverage traditional control interfaces, such as joysticks and keyboards, which might be challenging and even impossible for some users. Augmented reality (AR) UIs introduce more natural interactions between people and assistive robots, potentially reaching a more diverse user base.
Girbes-Juan V, Schettino V, Gracia L, et al., 2022, Combining haptics and inertial motion capture to enhance remote control of a dual-arm robot, Journal on Multimodal User Interfaces, Vol: 16, Pages: 219-238, ISSN: 1783-7677
Taniguchi T, Nagai T, Shimoda S, et al., 2022, Special issue on symbol emergence in robotics and cognitive systems (I), Advanced Robotics, Vol: 36, Pages: 1-2, ISSN: 0169-1864