Imperial College London

Professor Yiannis Demiris

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Human-Centred Robotics, Head of ISN
 
 
 

Contact

 

+44 (0)20 7594 6300, y.demiris

 
 

Location

 

1011, Electrical Engineering, South Kensington Campus



 

Publications


270 results found

Taniguchi T, Nagai T, Shimoda S, Cangelosi A, Demiris Y, Matsuo Y, Doya K, Ogata T, Jamone L, Nagai Y, Ugur E, Mochihashi D, Unno Y, Okanoya K, Hashimoto T et al., 2022, Special issue on symbol emergence in robotics and cognitive systems (II), Advanced Robotics, Vol: 36, Pages: 217-218, ISSN: 0169-1864

Journal article

Kaptein F, Kiefer B, Cully A, Celiktutan O, Bierman B, Rijgersberg-peters R, Broekens J, Van Vught W, Van Bekkum M, Demiris Y, Neerincx MA et al., 2022, A cloud-based robot system for long-term interaction: principles, implementation, lessons learned, ACM Transactions on Human-Robot Interaction, Vol: 11, ISSN: 2573-9522

Making the transition to long-term interaction with social-robot systems has been identified as one of the main challenges in human-robot interaction. This article identifies four design principles to address this challenge and applies them in a real-world implementation: cloud-based robot control, a modular design, one common knowledge base for all applications, and hybrid artificial intelligence for decision making and reasoning. The control architecture for this robot includes a common Knowledge-base (ontologies), Data-base, “Hybrid Artificial Brain” (dialogue manager, action selection and explainable AI), Activities Centre (Timeline, Quiz, Break and Sort, Memory, Tip of the Day), Embodied Conversational Agent (ECA, i.e., robot and avatar), and Dashboards (for authoring and monitoring the interaction). Further, the ECA is integrated with an expandable set of (mobile) health applications. The resulting system is a Personal Assistant for a healthy Lifestyle (PAL), which supports diabetic children with self-management and educates them on health-related issues (48 children, aged 6–14, recruited via hospitals in the Netherlands and in Italy). It is capable of autonomous interaction “in the wild” for prolonged periods of time without the need for a “Wizard-of-Oz” (up to 6 months online). PAL is an exemplary system that provides personalised, stable and diverse, long-term human-robot interaction.

Journal article

Bin Razali MH, Demiris Y, 2022, Using eye-gaze to forecast human pose in everyday pick and place actions, IEEE International Conference on Robotics and Automation

Collaborative robots that operate alongside humans require the ability to understand their intent and forecast their pose. Among the various indicators of intent, the eye gaze is particularly important as it signals action towards the gazed object. By observing a person’s gaze, one can effectively predict the object of interest and subsequently, forecast the person’s pose. We leverage this and present a method that forecasts the human pose using gaze information for everyday pick and place actions in a home environment. Our method first attends to fixations to locate the coordinates of the object of interest before inputting said coordinates to a pose forecasting network. Experiments on the MoGaze dataset show that our gaze network lowers the errors of existing pose forecasting methods and that incorporating prior in the form of textual instructions further lowers the errors by a significant amount. Furthermore, the use of eye gaze now allows a simple multilayer perceptron network to directly forecast the keypose.
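
As a rough illustration of the pipeline described in this abstract (locate the gazed object, then condition a pose network on its coordinates), the sketch below shows a hypothetical gaze-conditioned keypose MLP in PyTorch. It is not the authors' implementation; the joint count, layer sizes and input shapes are assumptions.

```python
# Minimal sketch, assuming a fixed skeleton of n_joints 3D keypoints and a known
# 3D location of the gazed object. Not the authors' code; shapes are illustrative.
import torch
import torch.nn as nn

class GazeConditionedKeyposeMLP(nn.Module):
    def __init__(self, n_joints=21, hidden=256):
        super().__init__()
        # Input: current pose (n_joints * 3) concatenated with the gazed object's
        # 3D coordinates; output: a single forecast keypose.
        self.net = nn.Sequential(
            nn.Linear(n_joints * 3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joints * 3),
        )

    def forward(self, current_pose, gazed_object_xyz):
        # current_pose: (B, n_joints, 3), gazed_object_xyz: (B, 3)
        x = torch.cat([current_pose.flatten(1), gazed_object_xyz], dim=-1)
        return self.net(x).view(-1, current_pose.shape[1], 3)
```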

Conference paper

Jang Y, Demiris Y, 2022, Message passing framework for vision prediction stability in human robot interaction, IEEE International Conference on Robotics and Automation 2022, Publisher: IEEE, ISSN: 2152-4092

In Human-Robot Interaction (HRI) scenarios, robot systems would benefit from an understanding of the user's state, actions and their effects on the environment to enable better interactions. While there are specialised vision algorithms for different perceptual channels, such as objects, scenes, human pose, and human actions, it is worth considering how their interaction can help improve each other's output. Individual prediction modules for these perceptual channels frequently produce noisy or unstable outputs, due to the limited datasets used for training and the compartmentalisation of the perceptual channels. To stabilise vision predictions in HRI, this paper presents a novel message passing framework in which the memory of individual modules is used to correct the outputs of the others. The framework applies common-sense rules of physics (such as the law of gravity) to reduce noise, within a pipeline that lets the modules improve each other's outputs. It is aimed at analysing primitive human activities, such as grasping an object, in video captured from the perspective of a robot. Experimental results show that the proposed framework significantly reduces the output noise of individual modules compared to running them independently. The pipeline can be used to measure human reactions when interacting with a robot in various HRI scenarios.
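
To make the role of module memory and common-sense physics rules concrete, the toy sketch below shows one way a single module's output could be vetoed and replaced from memory; the rule, threshold and interface are assumptions for illustration, not the paper's pipeline.

```python
# Minimal sketch, assuming object positions in metres with z pointing up.
# A raised object that is neither grasped nor supported violates the gravity
# rule, so the last consistent estimate from memory is returned instead.
from collections import deque

class StabilisedObjectTrack:
    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, position, hand_is_grasping, support_below):
        if position[2] > 0.05 and not hand_is_grasping and not support_below:
            return self.history[-1] if self.history else position
        self.history.append(position)
        return position
```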

Conference paper

Quesada RC, Demiris Y, 2022, Proactive robot assistance: affordance-aware augmented reality user interfaces, IEEE Robotics and Automation Magazine, Vol: 29, ISSN: 1070-9932

Assistive robots have the potential to increase the autonomy and quality of life of people with disabilities [1]. Their applications include rehabilitation robots, smart wheelchairs, companion robots, mobile manipulators, and educational robots [2]. However, designing an intuitive user interface (UI) for the control of assistive robots remains a challenge, as most UIs leverage traditional control interfaces, such as joysticks and keyboards, which might be challenging and even impossible for some users. Augmented reality (AR) UIs introduce more natural interactions between people and assistive robots, potentially reaching a more diverse user base.

Journal article

Taniguchi T, Nagai T, Shimoda S, Cangelosi A, Demiris Y, Matsuo Y, Doya K, Ogata T, Jamone L, Nagai Y, Ugur E, Mochihashi D, Unno Y, Okanoya K, Hashimoto T et al., 2022, Special issue on Symbol Emergence in Robotics and Cognitive Systems (I): Preface, Advanced Robotics, Vol: 36, Pages: 1-2, ISSN: 0169-1864

Journal article

Nunes UM, Demiris Y, 2022, Kinematic Structure Estimation of Arbitrary Articulated Rigid Objects for Event Cameras, Pages: 508-514, ISSN: 1050-4729

We propose a novel method that estimates the Kinematic Structure (KS) of arbitrary articulated rigid objects from event-based data. Event cameras are emerging sensors that asynchronously report brightness changes with a time resolution of microseconds, making them suitable candidates for motion-related perception. By assuming that an articulated rigid object is composed of body parts whose shape can be approximately described by a Gaussian distribution, we jointly segment the different parts by combining an adapted Bayesian inference approach and incremental event-based motion estimation. The respective KS is then generated based on the segmented parts and their respective biharmonic distance, which is estimated by building an affinity matrix of points sampled from the estimated Gaussian distributions. The method outperforms frame-based methods in sequences obtained by simulating events from video sequences and achieves a solid performance on new high-speed motion sequences, which frame-based KS estimation methods cannot handle.
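
The final step (an affinity matrix built from points sampled from the per-part Gaussians) can be sketched as below. Euclidean distance is used here as a stand-in for the biharmonic distance in the paper, and all names and parameters are illustrative assumptions.

```python
# Minimal sketch: sample each body part's Gaussian and score part pairs by a
# kernel of their mean pairwise point distance (a crude proxy, not biharmonic).
import numpy as np

def part_affinity(means, covs, samples_per_part=50, sigma=0.1):
    parts = [np.random.multivariate_normal(m, c, samples_per_part)
             for m, c in zip(means, covs)]
    n = len(parts)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = np.linalg.norm(parts[i][:, None, :] - parts[j][None, :, :], axis=-1)
            A[i, j] = np.exp(-d.mean() / sigma)
    return A  # kinematic-structure edges can then be read from the strongest affinities
```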

Conference paper

Candela E, Parada L, Marques L, Georgescu T-A, Demiris Y, Angeloudis P et al., 2022, Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 8814-8820, ISSN: 2153-0858

Conference paper

McKenna PE, Romeo M, Pimentel J, Diab M, Moujahid M, Hastie H, Demiris Y et al., 2022, Theory of Mind and Trust in Human-Robot Navigation, 1st International Symposium on Trustworthy Autonomous Systems (TAS), Publisher: Association for Computing Machinery

Conference paper

Nunes UM, Demiris Y, 2022, Kinematic Structure Estimation of Arbitrary Articulated Rigid Objects for Event Cameras, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE

Conference paper

Lingg N, Demiris Y, 2022, Building Trust in Assistive Robotics: Insights From a Real-World Mobile Navigation Experiment, 1st International Symposium on Trustworthy Autonomous Systems (TAS), Publisher: Association for Computing Machinery

Conference paper

Shipman A, Mead D, Feng Y, Escribano J, Angeloudis P, Demiris Y et al., 2022, Novel trajectory prediction algorithm using a full dataset: comparison and ablation studies, IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Publisher: IEEE, Pages: 2401-2406, ISSN: 2153-0009

Conference paper

Razali H, Demiris Y, 2021, Multitask variational autoencoding of human-to-human object handover, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 7315-7320, ISSN: 2153-0858

Assistive robots that operate alongside humans require the ability to understand and replicate human behaviours during a handover. A handover is defined as a joint action between two participants in which a giver hands an object over to the receiver. In this paper, we present a method for learning human-to-human handovers observed from motion capture data. Given the giver and receiver pose from a single timestep, and the object label in the form of a word embedding, our Multitask Variational Autoencoder jointly forecasts their pose as well as the orientation of the object held by the giver at handover. Our method stands in contrast to existing works for human pose forecasting that employ deep autoregressive models requiring a sequence of inputs. Furthermore, our method is novel in that it learns both the human pose and object orientation in a joint manner. Experimental results on the publicly available Handover Orientation and Motion Capture Dataset show that our proposed method outperforms the autoregressive baselines for handover pose forecasting by approximately 20% while being on par for object orientation prediction with a runtime that is 5x faster.
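
The joint forecasting of handover pose and object orientation from a single timestep can be pictured as a VAE with two decoder heads. The sketch below is a hypothetical PyTorch layout (dimensions, a quaternion orientation output and the single shared latent are assumptions), not the published model.

```python
# Minimal sketch of a multitask VAE: one encoder over pose + object word
# embedding, a shared latent, and separate heads for pose and orientation.
import torch
import torch.nn as nn

class MultitaskHandoverVAE(nn.Module):
    def __init__(self, pose_dim, embed_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim + embed_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.pose_head = nn.Linear(latent_dim, pose_dim)   # handover pose forecast
        self.orient_head = nn.Linear(latent_dim, 4)        # object orientation (quaternion)

    def forward(self, pose, object_embedding):
        h = self.encoder(torch.cat([pose, object_embedding], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        return self.pose_head(z), self.orient_head(z), mu, logvar
```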

Conference paper

Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2021, Continuous non-invasive eye tracking in intensive care, 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (IEEE EMBC), Publisher: IEEE, Pages: 1869-1873, ISSN: 1557-170X

Delirium, an acute confusional state, is a common occurrence in Intensive Care Units (ICUs). Patients who develop delirium have globally worse outcomes than those who do not, and thus the diagnosis of delirium is of importance. Current diagnostic methods have several limitations leading to the suggestion of eye-tracking for its diagnosis through in-attention. To ascertain the requirements for an eye-tracking system in an adult ICU, measurements were carried out at Chelsea & Westminster Hospital NHS Foundation Trust. Clinical criteria guided empirical requirements of invasiveness and calibration methods while accuracy and precision were measured. A non-invasive system was then developed utilising a patient-facing RGB camera and a scene-facing RGBD camera. The system’s performance was measured in a replicated laboratory environment with healthy volunteers, revealing an accuracy and precision that outperforms what is required while simultaneously being non-invasive and calibration-free. The system was then deployed as part of CONfuSED, a clinical feasibility study where we report aggregated data from 5 patients as well as the acceptability of the system to bedside nursing staff. To the best of our knowledge, this is the first eye-tracking system to be deployed in an ICU for delirium monitoring.

Conference paper

Nunes UM, Demiris Y, 2021, Live demonstration: incremental motion estimation for event-based cameras by dispersion minimisation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE Computer Society, Pages: 1322-1323, ISSN: 2160-7508

Live demonstration setup. (Left) The setup consists of a DAVIS346B event camera connected to a standard consumer laptop and undergoes some motion. (Right) The motion estimates are plotted in red and, for rotation-like motions, the angular velocities provided by the camera IMU are also plotted in blue. This plot exemplifies an event camera undergoing large rotational motions (up to ~ 1000 deg/s) around the (a) x-axis, (b) y-axis and (c) z-axis. Overall, the incremental motion estimation method follows the IMU measurements. Optionally, the resultant global optical flow can also be shown, as well as the corresponding generated events by accumulating them onto the image plane (bottom left corner).

Conference paper

Chacon-Quesada R, Demiris Y, 2021, Augmented reality user interfaces for heterogeneous multirobot control, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 11439-11444, ISSN: 2153-0858

Recent advances in the design of head-mounted augmented reality (AR) interfaces for assistive human-robot interaction (HRI) have allowed untrained users to rapidly and fluently control single-robot platforms. In this paper, we investigate how such interfaces transfer onto multirobot architectures, as several assistive robotics applications need to be distributed among robots that are different both physically and in terms of software. As part of this investigation, we introduce a novel head-mounted AR interface for heterogeneous multirobot control. This interface generates and displays dynamic joint-affordance signifiers, i.e. signifiers that combine and show multiple actions from different robots that can be applied simultaneously to an object. We present a user study with 15 participants analysing the effects of our approach on their perceived fluency. Participants were given the task of filling a cup with water using a multirobot platform. Our results show a clear improvement in standard HRI fluency metrics when users applied dynamic joint-affordance signifiers, as opposed to a sequence of independent actions.

Conference paper

Amadori P, Fischer T, Demiris Y, 2021, HammerDrive: A task-aware driving visual attention model, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 5573-5585, ISSN: 1524-9050

We introduce HammerDrive, a novel architecture for task-aware visual attention prediction in driving. The proposed architecture is learnable from data and can reliably infer the current focus of attention of the driver in real-time, while only requiring limited and easy-to-access telemetry data from the vehicle. We build the proposed architecture on two core concepts: 1) driving can be modeled as a collection of sub-tasks (maneuvers), and 2) each sub-task affects the way a driver allocates visual attention resources, i.e., their eye gaze fixation. HammerDrive comprises two networks: a hierarchical monitoring network of forward-inverse model pairs for sub-task recognition and an ensemble network of task-dependent convolutional neural network modules for visual attention modeling. We assess the ability of HammerDrive to infer driver visual attention on data we collected from 20 experienced drivers in a virtual reality-based driving simulator experiment. We evaluate the accuracy of our monitoring network for sub-task recognition and show that it is an effective and light-weight network for reliable real-time tracking of driving maneuvers with above 90% accuracy. Our results show that HammerDrive outperforms a comparable state-of-the-art deep learning model for visual attention prediction on numerous metrics with ~13% improvement for both Kullback-Leibler divergence and similarity, and demonstrate that task-awareness is beneficial for driver visual attention prediction.

Journal article

Tian Y, Balntas V, Ng T, Barroso-Laguna A, Demiris Y, Mikolajczyk K et al., 2021, D2D: Keypoint Extraction with Describe to Detect Approach, Pages: 223-240, ISSN: 0302-9743

In this paper, we present a novel approach that exploits the information within the descriptor space to propose keypoint locations. Detect then describe, or jointly detect and describe are two typical strategies for extracting local features. In contrast, we propose an approach that inverts this process by first describing and then detecting the keypoint locations. Describe-to-Detect (D2D) leverages successful descriptor models without the need for any additional training. Our method selects keypoints as salient locations with high information content which are defined by the descriptors rather than some independent operators. We perform experiments on multiple benchmarks including image matching, camera localisation, and 3D reconstruction. The results indicate that our method improves the matching performance of various descriptors and that it generalises across methods and tasks.
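
The describe-then-detect idea can be illustrated with a short sketch: score each location of a dense descriptor map by how much its descriptor varies locally, and keep the top-scoring pixels as keypoints. The scoring function below is an assumption for illustration, not the paper's exact saliency measure.

```python
# Minimal sketch: dense descriptors in, keypoint coordinates out. The local
# descriptor variation is used as a stand-in information-content score.
import torch
import torch.nn.functional as F

def describe_to_detect(desc_map, num_keypoints=500):
    # desc_map: (C, H, W) dense descriptors from any pretrained descriptor network
    local_mean = F.avg_pool2d(desc_map.unsqueeze(0), 3, stride=1, padding=1)
    saliency = (desc_map.unsqueeze(0) - local_mean).pow(2).sum(dim=1).squeeze(0)  # (H, W)
    idx = saliency.flatten().topk(num_keypoints).indices
    ys, xs = idx // saliency.shape[1], idx % saliency.shape[1]
    return torch.stack([xs, ys], dim=1)  # keypoint (x, y) coordinates
```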

Conference paper

Girbes-Juan V, Schettino V, Demiris Y, Tornero J et al., 2021, Haptic and Visual Feedback Assistance for Dual-Arm Robot Teleoperation in Surface Conditioning Tasks, IEEE Transactions on Haptics, Vol: 14, Pages: 44-56, ISSN: 1939-1412

Journal article

Buizza C, Demiris Y, 2021, Rotational Adjoint Methods for Learning-Free 3D Human Pose Estimation from IMU Data, 25th International Conference on Pattern Recognition (ICPR), Publisher: IEEE Computer Society, Pages: 7868-7875, ISSN: 1051-4651

Conference paper

Behrens JK, Nazarczuk M, Stepanova K, Hoffmann M, Demiris Y, Mikolajczyk K et al., 2021, Embodied Reasoning for Discovering Object Properties via Manipulation, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 10139-10145, ISSN: 1050-4729

Conference paper

Candela E, Feng Y, Mead D, Demiris Y, Angeloudis P et al., 2021, Fast Collision Prediction for Autonomous Vehicles using a Stochastic Dynamics Model, IEEE Intelligent Transportation Systems Conference (ITSC), Publisher: IEEE, Pages: 211-216, ISSN: 2153-0009

Conference paper

Amadori PV, Fischer T, Wang R, Demiris Y et al., 2020, Decision anticipation for driving assistance systems, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Publisher: IEEE, Pages: 1-7

Anticipating the correctness of imminent driver decisions is a crucial challenge in advanced driving assistance systems and has the potential to lead to more reliable and safer human-robot interactions. In this paper, we address the task of decision correctness prediction in a driver-in-the-loop simulated environment using unobtrusive physiological signals, namely, eye gaze and head pose. We introduce a sequence-to-sequence based deep learning model to infer the driver's likelihood of making correct/wrong decisions based on the corresponding cognitive state. We provide extensive experimental studies over multiple baseline classification models on an eye gaze pattern and head pose dataset collected from simulated driving. Our results show strong correlations between the physiological data and decision correctness, and that the proposed sequential model reliably predicts decision correctness from the driver with 80% precision and 72% recall. We also demonstrate that our sequential model performs well in scenarios where early anticipation of correctness is critical, with accurate predictions up to two seconds before a decision is performed.

Conference paper

Fischer T, Demiris Y, 2020, Computational modelling of embodied visual perspective-taking, IEEE Transactions on Cognitive and Developmental Systems, Vol: 12, Pages: 723-732, ISSN: 2379-8920

Humans are inherently social beings that benefit from their perceptional capability to embody another point of view, typically referred to as perspective-taking. Perspective-taking is an essential feature in our daily interactions and is pivotal for human development. However, much remains unknown about the precise mechanisms that underlie perspective-taking. Here we show that formalizing perspective-taking in a computational model can detail the embodied mechanisms employed by humans in perspective-taking. The model's main building block is a set of action primitives that are passed through a forward model. The model employs a process that selects a subset of action primitives to be passed through the forward model to reduce the response time. The model demonstrates results that mimic those captured by human data, including (i) response times differences caused by the angular disparity between the perspective-taker and the other agent, (ii) the impact of task-irrelevant body posture variations in perspective-taking, and (iii) differences in the perspective-taking strategy between individuals. Our results provide support for the hypothesis that perspective-taking is a mental simulation of the physical movements that are required to match another person's visual viewpoint. Furthermore, the model provides several testable predictions, including the prediction that forced early responses lead to an egocentric bias and that a selection process introduces dependencies between two consecutive trials. Our results indicate potential links between perspective-taking and other essential perceptional and cognitive mechanisms, such as active vision and autobiographical memories.

Journal article

Goncalves Nunes UM, Demiris Y, 2020, Entropy minimisation framework for event-based vision model estimation, 16th European Conference on Computer Vision 2020, Publisher: Springer, Pages: 161-176

We propose a novel Entropy Minimisation (EMin) framework for event-based vision model estimation. The framework extends previous event-based motion compensation algorithms to handle models whose outputs have arbitrary dimensions. The main motivation comes from estimating motion from events directly in 3D space (e.g. events augmented with depth), without projecting them onto an image plane. This is achieved by modelling the event alignment according to candidate parameters and minimising the resultant dispersion. We provide a family of suitable entropy loss functions and an efficient approximation whose complexity is only linear with the number of events (e.g. the complexity does not depend on the number of image pixels). The framework is evaluated on several motion estimation problems, including optical flow and rotational motion. As proof of concept, we also test our framework on 6-DOF estimation by performing the optimisation directly in 3D space.
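
The dispersion idea can be illustrated with a toy example for 2D flow: warp events back to a reference time with candidate parameters and score the alignment with a kernel-based entropy surrogate. The loss below is an O(N²) stand-in for intuition only; the paper's loss family and linear-time approximation are not reproduced here.

```python
# Minimal sketch, assuming events (x, y, t) and a candidate constant 2D flow omega.
import numpy as np

def warped_event_dispersion(xy, t, omega):
    # xy: (N, 2) pixel coordinates, t: (N,) timestamps, omega: (2,) flow estimate
    warped = xy - t[:, None] * omega                 # motion-compensate events to t = 0
    diff = warped[:, None, :] - warped[None, :, :]
    sq = (diff ** 2).sum(-1)
    # Lower value <-> sharper alignment of the compensated events.
    return -np.log(np.exp(-sq / 2.0).mean())

# Model estimation then amounts to minimising this dispersion over omega.
```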

Conference paper

Wang R, Demiris Y, Ciliberto C, 2020, Structured prediction for conditional meta-learning, Publisher: arXiv

The goal of optimization-based meta-learning is to find a single initialization shared across a distribution of tasks to speed up the process of learning new tasks. Conditional meta-learning seeks task-specific initialization to better capture complex task distributions and improve performance. However, many existing conditional methods are difficult to generalize and lack theoretical guarantees. In this work, we propose a new perspective on conditional meta-learning via structured prediction. We derive task-adaptive structured meta-learning (TASML), a principled framework that yields task-specific objective functions by weighing meta-training data on target tasks. Our non-parametric approach is model-agnostic and can be combined with existing meta-learning methods to achieve conditioning. Empirically, we show that TASML improves the performance of existing meta-learning models, and outperforms the state-of-the-art on benchmark datasets.
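
The weighting step (scoring meta-training tasks by their relevance to the target task) can be pictured with a short sketch; the Gaussian kernel and task-embedding inputs below are assumptions, not TASML's actual structured-prediction estimator.

```python
# Minimal sketch: weight meta-training tasks by similarity to the target task,
# then use the weights in a task-specific meta-objective.
import numpy as np

def task_weights(target_feat, train_feats, bandwidth=1.0):
    # target_feat: (d,) target-task features; train_feats: (n_tasks, d)
    d2 = ((train_feats - target_feat) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

# Task-specific objective (schematically): sum_i w[i] * loss(task_i, init_params)
```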

Working paper

Zhang F, Demiris Y, 2020, Learning grasping points for garment manipulation in robot-assisted dressing, 2020 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 9114-9120

Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a predefined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method, with a grasping and manipulation strategy which takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper-body.
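
The grasping-point prediction step can be pictured as a small network regressing a single image-plane point from a depth image; the architecture and sizes below are illustrative assumptions, not the network described in the paper.

```python
# Minimal sketch: depth image in, (u, v) grasping point out.
import torch
import torch.nn as nn

class GraspPointCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(32 * 4 * 4, 2)  # pixel coordinates of the grasping point

    def forward(self, depth):  # depth: (B, 1, H, W)
        return self.head(self.features(depth).flatten(1))
```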

Conference paper

Gao Y, Chang HJ, Demiris Y, 2020, User modelling using multimodal information for personalised dressing assistance, IEEE Access, Vol: 8, Pages: 45700-45714, ISSN: 2169-3536

Journal article

Nunes UM, Demiris Y, 2020, Online unsupervised learning of the 3D kinematic structure of arbitrary rigid bodies, IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE Computer Society, Pages: 3808-3816, ISSN: 1550-5499

This work addresses the problem of 3D kinematic structure learning of arbitrary articulated rigid bodies from RGB-D data sequences. Typically, this problem is addressed by offline methods that process a batch of frames, assuming that complete point trajectories are available. However, this approach is not feasible when considering scenarios that require continuity and fluidity, for instance, human-robot interaction. In contrast, we propose to tackle this problem in an online unsupervised fashion, by recursively maintaining the metric distance of the scene's 3D structure, while achieving real-time performance. The influence of noise is mitigated by building a similarity measure based on a linear embedding representation and incorporating this representation into the original metric distance. The kinematic structure is then estimated based on a combination of implicit motion and spatial properties. The proposed approach achieves competitive performance both quantitatively and qualitatively in terms of estimation accuracy, even compared to offline methods.

Conference paper

Chacon-Quesada R, Demiris Y, 2020, Augmented reality controlled smart wheelchair using dynamic signifiers for affordance representation, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE

The design of augmented reality interfaces for people with mobility impairments is a novel area with great potential, as well as multiple outstanding research challenges. In this paper we present an augmented reality user interface for controlling a smart wheelchair with a head-mounted display to provide assistance for mobility-restricted people. Our motivation is to reduce the cognitive requirements needed to control a smart wheelchair. A key element of our platform is the ability to control the smart wheelchair using the concepts of affordances and signifiers. In addition to the technical details of our platform, we present a baseline study by evaluating our platform through user trials of able-bodied individuals and two different affordances: 1) Door Go Through and 2) People Approach. To present these affordances to the user, we evaluated fixed symbol-based signifiers versus our novel dynamic signifiers in terms of ease of understanding the suggested actions and their relation to the objects. Our results show a clear preference for dynamic signifiers. In addition, we show that the task load reported by participants is lower when controlling the smart wheelchair with our augmented reality user interface compared to using the joystick, which is consistent with their qualitative answers.

Conference paper

