Imperial College London

Professor Yiannis Demiris

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Human-Centred Robotics, Head of ISN



+44 (0)20 7594 6300
y.demiris




1011, Electrical Engineering, South Kensington Campus






214 results found

Amadori P, Fischer T, Demiris Y, 2021, HammerDrive: a task-aware driving visual attention model, IEEE Transactions on Intelligent Transportation Systems, Pages: 1-13, ISSN: 1524-9050

We introduce HammerDrive, a novel architecture for task-aware visual attention prediction in driving. The proposed architecture is learnable from data and can reliably infer the driver's current focus of attention in real time, while requiring only limited, easy-to-access telemetry data from the vehicle. We build the proposed architecture on two core concepts: 1) driving can be modeled as a collection of sub-tasks (maneuvers), and 2) each sub-task affects the way a driver allocates visual attention resources, i.e., their eye gaze fixation. HammerDrive comprises two networks: a hierarchical monitoring network of forward-inverse model pairs for sub-task recognition and an ensemble network of task-dependent convolutional neural network modules for visual attention modeling. We assess the ability of HammerDrive to infer driver visual attention on data we collected from 20 experienced drivers in a virtual reality-based driving simulator experiment. We evaluate the accuracy of our monitoring network for sub-task recognition and show that it is an effective and lightweight network for reliable real-time tracking of driving maneuvers, with above 90% accuracy. Our results show that HammerDrive outperforms a comparable state-of-the-art deep learning model for visual attention prediction on numerous metrics, with an improvement of approximately 13% in both Kullback-Leibler divergence and similarity, and demonstrate that task-awareness is beneficial for driver visual attention prediction.

Journal article
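The HammerDrive abstract reports its gains in Kullback-Leibler (KL) divergence and similarity (SIM). As a hedged illustration only, the sketch below computes both metrics using the definitions common in saliency evaluation. Each attention map is normalised to sum to one, KL is measured from the ground-truth fixation map to the prediction, and SIM is the sum of element-wise minima. These definitions, the `EPS` constant and the function names are assumptions. They are not necessarily the evaluation code used in the paper.

```python
import numpy as np

EPS = 1e-12  # assumed small constant to avoid log(0) and division by zero


def _normalise(m: np.ndarray) -> np.ndarray:
    """Scale a non-negative attention map so that it sums to one."""
    m = np.asarray(m, dtype=float)
    return m / (m.sum() + EPS)


def kl_divergence(pred: np.ndarray, gt: np.ndarray) -> float:
    """KL(gt || pred): lower means the prediction better matches the fixations."""
    p, q = _normalise(pred), _normalise(gt)
    return float(np.sum(q * np.log(EPS + q / (p + EPS))))


def similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """SIM: histogram intersection of the two normalised maps (1 means identical)."""
    p, q = _normalise(pred), _normalise(gt)
    return float(np.sum(np.minimum(p, q)))
```

With these definitions, a better model lowers KL and raises SIM. An identical prediction gives a KL close to 0 and a SIM close to 1.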

Zolotas M, Demiris Y, 2021, Disentangled sequence clustering for human intention inference, Publisher: arXiv

Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective employ probabilistic reasoning to recover a distribution of "intent" conditioned on the robot's perceived sensory state. However, these approaches typically assume task-specific notions of human intent (e.g. labelled goals) are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework that can be used to learn such a distribution of intent in an unsupervised manner. The DiSCVAE leverages recent advances in unsupervised learning to derive a disentangled latent representation of sequential data, separating time-varying local features from time-invariant global aspects. Unlike previous frameworks for disentanglement, however, the proposed variant also infers a discrete variable to form a latent mixture model and enable clustering of global sequence concepts, e.g. intentions from observed human behaviour. To evaluate the DiSCVAE, we first validate its capacity to discover classes from unlabelled sequences using video datasets of bouncing digits and 2D animations. We then report results from a real-world human-robot interaction experiment conducted on a robotic wheelchair. Our findings offer insights into how the inferred discrete variable coincides with human intent and thus serves to improve assistance in collaborative settings, such as shared control.

Working paper

Girbes-Juan V, Schettino V, Demiris Y, Tornero J, et al., 2021, Haptic and Visual Feedback Assistance for Dual-Arm Robot Teleoperation in Surface Conditioning Tasks, IEEE Transactions on Haptics, Vol: 14, Pages: 44-56, ISSN: 1939-1412

Journal article

Amadori PV, Fischer T, Wang R, Demiris Y, et al., 2020, Decision anticipation for driving assistance systems, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Publisher: IEEE, Pages: 1-7

Anticipating the correctness of imminent driver decisions is a crucial challenge in advanced driving assistance systems and has the potential to lead to more reliable and safer human-robot interactions. In this paper, we address the task of decision correctness prediction in a driver-in-the-loop simulated environment using unobtrusive physiological signals, namely, eye gaze and head pose. We introduce a sequence-to-sequence based deep learning model to infer the driver's likelihood of making correct or wrong decisions based on the corresponding cognitive state. We provide extensive experimental studies over multiple baseline classification models on an eye gaze pattern and head pose dataset collected from simulated driving. Our results show strong correlations between the physiological data and decision correctness, and that the proposed sequential model reliably predicts decision correctness from the driver with 80% precision and 72% recall. We also demonstrate that our sequential model performs well in scenarios where early anticipation of correctness is critical, with accurate predictions up to two seconds before a decision is performed.

Conference paper

Fischer T, Demiris Y, 2020, Computational modelling of embodied visual perspective-taking, IEEE Transactions on Cognitive and Developmental Systems, Vol: 12, Pages: 723-732, ISSN: 2379-8920

Humans are inherently social beings that benefit from their perceptual capability to embody another point of view, typically referred to as perspective-taking. Perspective-taking is an essential feature in our daily interactions and is pivotal for human development. However, much remains unknown about the precise mechanisms that underlie perspective-taking. Here we show that formalizing perspective-taking in a computational model can detail the embodied mechanisms employed by humans in perspective-taking. The model's main building block is a set of action primitives that are passed through a forward model. The model employs a process that selects a subset of action primitives to be passed through the forward model to reduce the response time. The model demonstrates results that mimic those captured by human data, including (i) response-time differences caused by the angular disparity between the perspective-taker and the other agent, (ii) the impact of task-irrelevant body posture variations in perspective-taking, and (iii) differences in the perspective-taking strategy between individuals. Our results provide support for the hypothesis that perspective-taking is a mental simulation of the physical movements that are required to match another person's visual viewpoint. Furthermore, the model provides several testable predictions, including the prediction that forced early responses lead to an egocentric bias and that a selection process introduces dependencies between two consecutive trials. Our results indicate potential links between perspective-taking and other essential perceptual and cognitive mechanisms, such as active vision and autobiographical memories.

Journal article

Wang R, Demiris Y, Ciliberto C, 2020, Structured prediction for conditional meta-learning, Publisher: arXiv

The goal of optimization-based meta-learning is to find a single initialization shared across a distribution of tasks to speed up the process of learning new tasks. Conditional meta-learning seeks task-specific initialization to better capture complex task distributions and improve performance. However, many existing conditional methods are difficult to generalize and lack theoretical guarantees. In this work, we propose a new perspective on conditional meta-learning via structured prediction. We derive task-adaptive structured meta-learning (TASML), a principled framework that yields task-specific objective functions by weighting meta-training data on target tasks. Our non-parametric approach is model-agnostic and can be combined with existing meta-learning methods to achieve conditioning. Empirically, we show that TASML improves the performance of existing meta-learning models, and outperforms the state-of-the-art on benchmark datasets.

Working paper
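The TASML abstract describes task-specific objectives built by weighting meta-training tasks according to the target task. The sketch below uses the classic kernel-based structured-prediction weights, alpha = (K + n·λ·I)^{-1} k(target), and sums the per-task losses with those weights. It is a minimal illustration that rests on several assumptions. Each task is summarised as a fixed feature vector, and the kernel is Gaussian with hand-picked `gamma` and `lam`. The per-task losses are assumed to be already computed. All function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Gaussian kernel between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def task_weights(train_tasks: np.ndarray, target_task: np.ndarray,
                 lam: float = 1e-3, gamma: float = 1.0) -> np.ndarray:
    """alpha = (K + n*lam*I)^{-1} k(target): one weight per meta-training task."""
    n = train_tasks.shape[0]
    K = rbf_kernel(train_tasks, train_tasks, gamma)
    k = rbf_kernel(train_tasks, target_task[None, :], gamma)[:, 0]
    return np.linalg.solve(K + n * lam * np.eye(n), k)


def weighted_meta_objective(alpha: np.ndarray, per_task_losses: np.ndarray) -> float:
    """Task-specific objective: sum_i alpha_i * L_i, with each L_i evaluated at one shared initialisation."""
    return float(np.dot(alpha, per_task_losses))
```

Meta-training tasks that resemble the target receive larger weights, so the objective is dominated by their losses.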

Goncalves Nunes UM, Demiris Y, 2020, Entropy minimisation framework for event-based vision model estimation, 16th European Conference on Computer Vision 2020, Publisher: Springer

We propose a novel Entropy Minimisation (EMin) framework for event-based vision model estimation. The framework extends previous event-based motion compensation algorithms to handle models whose outputs have arbitrary dimensions. The main motivation comes from estimating motion from events directly in 3D space (e.g. events augmented with depth), without projecting them onto an image plane. This is achieved by modelling the event alignment according to candidate parameters and minimising the resultant dispersion. We provide a family of suitable entropy loss functions and an efficient approximation whose complexity is only linear in the number of events (i.e. the complexity does not depend on the number of image pixels). The framework is evaluated on several motion estimation problems, including optical flow and rotational motion. As proof of concept, we also test our framework on 6-DOF estimation by performing the optimisation directly in 3D space.

Conference paper
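The EMin abstract describes warping events according to candidate model parameters and choosing the parameters that minimise the dispersion of the aligned events. The sketch below is a deliberately simpler stand-in for 2D optical flow. It computes the Shannon entropy of a 2D histogram of motion-compensated events and picks a velocity by grid search. The histogram entropy, the bin layout and the grid search are assumptions chosen for brevity. They are not the entropy losses or the linear-time approximation proposed in the paper.

```python
import numpy as np


def warp(events: np.ndarray, v, t_ref: float = 0.0) -> np.ndarray:
    """Move each event (x, y, t) back to t_ref along a constant 2D velocity v."""
    return events[:, :2] - (events[:, 2:3] - t_ref) * np.asarray(v, dtype=float)[None, :]


def histogram_entropy(xy: np.ndarray, bins: int = 32,
                      extent=((0.0, 32.0), (0.0, 32.0))) -> float:
    """Shannon entropy of the 2D histogram of warped event positions (lower = sharper)."""
    h, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=bins, range=extent)
    p = h.ravel() / max(h.sum(), 1.0)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def estimate_flow(events: np.ndarray, candidates):
    """Grid search: return the candidate velocity whose warped events have minimum entropy."""
    scores = [histogram_entropy(warp(events, v)) for v in candidates]
    return candidates[int(np.argmin(scores))]
```

For events generated by an edge moving at a constant velocity, warping with the true velocity collapses the edge onto one column of bins. That gives the lowest entropy among the candidates.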

Zhang F, Demiris Y, 2020, Learning grasping points for garment manipulation in robot-assisted dressing, 2020 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 9114-9120

Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a predefined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method with a grasping and manipulation strategy which takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper body.

Conference paper

Gao Y, Chang HJ, Demiris Y, 2020, User modelling using multimodal information for personalised dressing assistance, IEEE Access, Vol: 8, Pages: 45700-45714, ISSN: 2169-3536

Journal article

Nunes UM, Demiris Y, 2020, Online unsupervised learning of the 3D kinematic structure of arbitrary rigid bodies, IEEE/CVF International Conference on Computer Vision (ICCV), Publisher: IEEE Computer Soc, Pages: 3808-3816, ISSN: 1550-5499

This work addresses the problem of 3D kinematic structure learning of arbitrary articulated rigid bodies from RGB-D data sequences. Typically, this problem is addressed by offline methods that process a batch of frames, assuming that complete point trajectories are available. However, this approach is not feasible when considering scenarios that require continuity and fluidity, for instance, human-robot interaction. In contrast, we propose to tackle this problem in an online unsupervised fashion, by recursively maintaining the metric distance of the scene's 3D structure, while achieving real-time performance. The influence of noise is mitigated by building a similarity measure based on a linear embedding representation and incorporating this representation into the original metric distance. The kinematic structure is then estimated based on a combination of implicit motion and spatial properties. The proposed approach achieves competitive performance both quantitatively and qualitatively in terms of estimation accuracy, even compared to offline methods.

Conference paper

Chacon-Quesada R, Demiris Y, 2020, Augmented reality controlled smart wheelchair using dynamic signifiers for affordance representation, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE

The design of augmented reality interfaces for people with mobility impairments is a novel area with great potential, as well as multiple outstanding research challenges. In this paper we present an augmented reality user interface for controlling a smart wheelchair with a head-mounted display to provide assistance for mobility-restricted people. Our motivation is to reduce the cognitive requirements needed to control a smart wheelchair. A key element of our platform is the ability to control the smart wheelchair using the concepts of affordances and signifiers. In addition to the technical details of our platform, we present a baseline study by evaluating our platform through user trials of able-bodied individuals and two different affordances: 1) Door Go Through and 2) People Approach. To present these affordances to the user, we compared fixed symbol-based signifiers with our novel dynamic signifiers in terms of how easily users understood the suggested actions and their relation to the objects. Our results show a clear preference for dynamic signifiers. In addition, we show that the task load reported by participants is lower when controlling the smart wheelchair with our augmented reality user interface compared to using the joystick, which is consistent with their qualitative answers.

Conference paper

Zolotas M, Demiris Y, 2020, Towards explainable shared control using augmented reality, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), Publisher: IEEE, Pages: 3020-3026

Shared control plays a pivotal role in establishing effective human-robot interactions. Traditional control-sharing methods strive to complement a human’s capabilities at safely completing a task, and thereby rely on users forming a mental model of the expected robot behaviour. However, these methods can often bewilder or frustrate users whenever their actions do not elicit the intended system response, forming a misalignment between the respective internal models of the robot and human. To resolve this model misalignment, we introduce Explainable Shared Control as a paradigm in which assistance and information feedback are jointly considered. Augmented reality is presented as an integral component of this paradigm, by visually unveiling the robot’s inner workings to human operators. Explainable Shared Control is instantiated and tested for assistive navigation in a setup involving a robotic wheelchair and a Microsoft HoloLens with add-on eye tracking. Experimental results indicate that the introduced paradigm facilitates transparent assistance by improving recovery times from adverse events associated with model misalignment.

Conference paper

Zambelli M, Cully A, Demiris Y, 2020, Multimodal representation models for prediction and control from partial information, Robotics and Autonomous Systems, Vol: 123, ISSN: 0921-8890

Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motion through the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity arising from the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improved performance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoid robot, showing high performance in multiple reconstruction, prediction and imitation tasks.

Journal article

Buizza C, Fischer T, Demiris Y, 2019, Real-time multi-person pose tracking using data assimilation, IEEE Winter Conference on Applications of Computer Vision, Publisher: IEEE

We propose a framework for the integration of data assimilation and machine learning methods in human pose estimation, with the aim of enabling any pose estimation method to be run in real time, whilst also increasing consistency and accuracy. Data assimilation and machine learning are complementary methods: the former allows us to make use of information about the underlying dynamics of a system but lacks the flexibility of a data-based model, which we can instead obtain with the latter. Our framework presents a real-time tracking module for any single or multi-person pose estimation system. Specifically, tracking is performed by a number of Kalman filters initiated for each new person appearing in a motion sequence. This permits tracking of multiple skeletons and reduces how often the computationally expensive pose estimation must be run, enabling online pose tracking. The module tracks for N frames while the pose estimates are calculated for frame (N+1). This also results in increased consistency of person identification and reduced inaccuracies due to missing joint locations and inversion of left- and right-side joints.

Conference paper
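The abstract describes Kalman filters, one per tracked person, that predict for N frames while the pose estimate for frame (N+1) is being computed. The sketch below is a minimal version for a single 2D joint. It assumes a constant-velocity motion model and hand-picked noise levels. The class name and all parameters are hypothetical, and this is not the authors' code.

```python
import numpy as np


class JointKalmanFilter:
    """Constant-velocity Kalman filter for one 2D joint; state = [x, y, vx, vy]."""

    def __init__(self, xy, dt: float = 1.0, q: float = 1e-2, r: float = 1.0):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])
        self.P = np.eye(4) * 10.0            # initial state uncertainty (assumed)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt     # position += velocity * dt
        self.H = np.eye(2, 4)                # only the joint position is measured
        self.Q = np.eye(4) * q               # process noise (assumed)
        self.R = np.eye(2) * r               # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        """Propagate the state one frame ahead; used on frames without a pose estimate."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z) -> np.ndarray:
        """Correct the state with a new pose estimate z = (x, y)."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, `predict()` would be called on every frame and `update()` only on frames where the expensive pose estimator has produced a new skeleton. For example, with N = 4, `update()` runs on every fifth frame.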

Neerincx MA, van Vught W, Henkemans OB, Oleari E, Broekens J, Peters R, Kaptein F, Demiris Y, Kiefer B, Fumagalli D, Bierman B, et al., 2019, Socio-cognitive engineering of a robotic partner for child's diabetes self-management, Frontiers in Robotics and AI, Vol: 6, Pages: 1-16, ISSN: 2296-9144

Social or humanoid robots hardly ever appear “in the wild,” aiming at pervasive and enduring human benefits such as child health. This paper presents a socio-cognitive engineering (SCE) methodology that guides the ongoing research & development for an evolving, longer-lasting human-robot partnership in practice. The SCE methodology has been applied in a large European project to develop a robotic partner that supports the daily diabetes management processes of children, aged between 7 and 14 years (i.e., Personal Assistant for a healthy Lifestyle, PAL). Four partnership functions were identified and worked out (joint objectives, agreements, experience sharing, and feedback & explanation) together with a common knowledge base and interaction design for child's prolonged disease self-management. In an iterative refinement process of three cycles, these functions, knowledge base and interactions were built, integrated, tested, refined, and extended so that the PAL robot could increasingly act as an effective partner for diabetes management. The SCE methodology helped to integrate into the human-agent/robot system: (a) theories, models, and methods from different scientific disciplines, (b) technologies from different fields, (c) varying diabetes management practices, and (d) last but not least, the diverse individual and context-dependent needs of the patients and caregivers. The resulting robotic partner proved to support the children on the three basic needs of the Self-Determination Theory: autonomy, competence, and relatedness. This paper presents the R&D methodology and the human-robot partnership framework for prolonged “blended” care of children with a chronic disease (children could use it up to 6 months; the robot in the hospitals and diabetes camps, and its avatar at home). It represents a new type of human-agent/robot system with an evolving collective intelligence. The underlying ontology and design rationale can be used

Journal article

Schettino V, Demiris Y, 2019, Inference of user-intention in remote robot wheelchair assistance using multimodal interfaces, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 4600-4606, ISSN: 2153-0858

Conference paper

Cortacero K, Fischer T, Demiris Y, 2019, RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation in Natural Environments, IEEE International Conference on Computer Vision Workshops, Publisher: Institute of Electrical and Electronics Engineers Inc.

In recent years, gaze estimation methods have made substantial progress, driven by numerous application areas including human-robot interaction, visual attention estimation and foveated rendering for virtual reality headsets. However, many gaze estimation methods assume that the subject's eyes are open; for closed eyes, these methods provide irregular gaze estimates. Here, we address this assumption by first introducing a new open-sourced dataset with annotations of the eye-openness of more than 200,000 eye images, including more than 10,000 images where the eyes are closed. We further present baseline methods that allow for blink detection using convolutional neural networks. In extensive experiments, we show that the proposed baselines perform favourably in terms of precision and recall. We further incorporate our proposed RT-BENE baselines in the recently presented RT-GENE gaze estimation framework, where they provide real-time inference of the openness of the eyes. We argue that our work will benefit both gaze estimation and blink estimation methods, and we take steps towards unifying these methods.

Conference paper

Taniguchi T, Ugur E, Ogata T, Nagai T, Demiris Y, et al., 2019, Editorial: Machine Learning Methods for High-Level Cognitive Capabilities in Robotics, Frontiers in Neurorobotics, Vol: 13, ISSN: 1662-5218

Journal article

Zhang F, Cully A, Demiris Y, 2019, Probabilistic real-time user posture tracking for personalized robot-assisted dressing, IEEE Transactions on Robotics, Vol: 35, Pages: 873-888, ISSN: 1552-3098

Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.

Journal article

Bagga S, Maurer B, Miller T, Quinlan L, Silvestri L, Wells D, Winqvist R, Zolotas M, Demiris Y, et al., 2019, instruMentor: An Interactive Robot for Musical Instrument Tutoring, Towards Autonomous Robotic Systems Conference, Publisher: Springer International Publishing, Pages: 303-315, ISSN: 0302-9743

Conference paper

Wang R, Ciliberto C, Amadori P, Demiris Y, et al., 2019, Random Expert Distillation: Imitation Learning via Expert Policy Support Estimation, Thirty-sixth International Conference on Machine Learning, Publisher: Proceedings of International Conference on Machine Learning (ICML-2019)

We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed by reinforcement learning, is indirect and may be computationally expensive. Recent generative adversarial methods based on matching the policy distribution between the expert and the agent could be unstable during training. We propose a new framework for imitation learning by estimating the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on both discrete and continuous domains, achieving comparable or better performance than the state of the art under different reinforcement learning algorithms.

Conference paper

Cully A, Demiris Y, 2019, Online knowledge level tracking with data-driven student models and collaborative filtering, IEEE Transactions on Knowledge and Data Engineering, Vol: 32, Pages: 2000-2013, ISSN: 1041-4347

Intelligent Tutoring Systems are promising tools for delivering optimal and personalised learning experiences to students. A key component for their personalisation is the student model, which infers the knowledge level of the students to balance the difficulty of the exercises. While important advances have been achieved, several challenges remain. In particular, the models should be able to track the evolution of the students' knowledge levels in real time. These evolutions are likely to follow different profiles for each student, while measuring the exact knowledge level remains difficult given the limited and noisy information provided by the interactions. This paper introduces a novel model that addresses these challenges with three contributions: 1) the model relies on Gaussian Processes to track online the evolution of the student's knowledge level over time, 2) it uses collaborative filtering to rapidly provide long-term predictions by leveraging the information from previous users, and 3) it automatically generates abstract representations of knowledge components via automatic relevance determination of covariance matrices. The model is evaluated on three datasets, including real users. The results demonstrate that the model converges to accurate predictions on average 4 times faster than the compared methods.

Journal article
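The first contribution of this model is tracking a student's knowledge level online with Gaussian Processes. That idea can be illustrated with plain GP regression over time. The sketch rests on several assumptions: a zero-mean prior, a squared-exponential kernel, and exercise outcomes already mapped to scores in [0, 1]. It omits the paper's collaborative filtering and automatic relevance determination, and its function names and hyperparameters are hypothetical.

```python
import numpy as np


def rbf(t1: np.ndarray, t2: np.ndarray, length: float = 5.0, var: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance between two 1D arrays of time steps."""
    d = t1[:, None] - t2[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_track(times: np.ndarray, scores: np.ndarray, query: np.ndarray,
             noise: float = 0.1, length: float = 5.0):
    """Posterior mean and standard deviation of the latent knowledge level at the query times."""
    K = rbf(times, times, length) + noise * np.eye(len(times))
    Ks = rbf(query, times, length)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, scores))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(rbf(query, query, length)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))
```

The posterior standard deviation stays small near observed exercises and grows towards the prior level far from them. That uncertainty is what lets a tutoring system tell how confident its current estimate is.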

Celiktutan O, Demiris Y, 2019, Inferring human knowledgeability from eye gaze in mobile learning environments, 15th European Conference on Computer Vision (ECCV), Publisher: Springer International Publishing AG, Pages: 193-209, ISSN: 0302-9743

What people look at during a visual task reflects an interplay between ocular motor functions and cognitive processes. In this paper, we study the links between eye gaze and cognitive states to investigate whether eye gaze reveals information about an individual’s knowledgeability. We focus on a mobile learning scenario where a user and a virtual agent play a quiz game using a hand-held mobile device. To the best of our knowledge, this is the first attempt to predict user’s knowledgeability from eye gaze using a noninvasive eye tracking method on mobile devices: we perform gaze estimation using the front-facing camera of mobile devices in contrast to using specialised eye tracking devices. First, we define a set of eye movement features that are discriminative for inferring user’s knowledgeability. Next, we train a model to predict users’ knowledgeability in the course of responding to a question. Using eye movement features only, we obtain a classification performance of 59.1%, on par with human performance. This has implications for (1) adapting behaviours of the virtual agent to user’s needs (e.g., virtual agent can give hints); (2) personalising quiz questions to the user’s perceived knowledgeability.

Conference paper

Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Zajc LČ, Vojír T, Bhat G, Lukežič A, Eldesokey A, Fernández G, García-Martín Á, Iglesias-Arias Á, Alatan AA, González-García A, Petrosino A, Memarmoghadam A, Vedaldi A, Muhič A, He A, Smeulders A, Perera AG, Li B, Chen B, Kim C, Xu C, Xiong C, Tian C, Luo C, Sun C, Hao C, Kim D, Mishra D, Chen D, Wang D, Wee D, Gavves E, Gundogdu E, Velasco-Salido E, Khan FS, Yang F, Zhao F, Li F, Battistone F, De Ath G, Subrahmanyam GRKS, Bastos G, Ling H, Galoogahi HK, Lee H, Li H, Zhao H, Fan H, Zhang H, Possegger H, Li H, Lu H, Zhi H, Li H, Lee H, Chang HJ, Drummond I, Valmadre J, Martin JS, Chahl J, Choi JY, Li J, Wang J, Qi J, Sung J, Johnander J, Henriques J, Choi J, van de Weijer J, Herranz JR, Martínez JM, Kittler J, Zhuang J, Gao J, Grm K, Zhang L, Wang L, Yang L, Rout L, Si L, Bertinetto L, Chu L, Che M, Maresca ME, Danelljan M, Yang MH, Abdelpakey M, Shehata M, Kang M, et al., 2019, The sixth visual object tracking VOT2018 challenge results, European Conference on Computer Vision, Publisher: Springer, Pages: 3-53, ISSN: 0302-9743

The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been introduced to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both standard short-term and the new long-term tracking subchallenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (

Conference paper

Wang R, Amadori P, Demiris Y, 2019, Real-time workload classification during driving using hyperNetworks, International Conference on Intelligent Robots and Systems (IROS 2018), Publisher: IEEE, ISSN: 2153-0866

Classifying human cognitive states from behavioral and physiological signals is a challenging problem with important applications in robotics. The difficulty stems from data variability among individual users and from sensor artifacts. In this work, we propose an end-to-end framework for real-time cognitive workload classification with mixture Hyper Long Short Term Memory Networks (m-HyperLSTM), a novel variant of HyperNetworks. Evaluating the proposed approach on an eye-gaze pattern dataset collected from simulated driving scenarios of different cognitive demands, we show that the proposed framework outperforms previous baseline methods and achieves 83.9% precision and 87.8% recall during test. We also demonstrate the merit of our proposed architecture by showing improved performance over other LSTM-based methods.

Conference paper

Zolotas M, Elsdon J, Demiris Y, 2019, Head-mounted augmented reality for explainable robotic wheelchair assistance, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

Robotic wheelchairs with built-in assistive features, such as shared control, are an emerging means of providing independent mobility to severely disabled individuals. However, patients often struggle to build a mental model of their wheelchair’s behaviour under different environmental conditions. Motivated by the desire to help users bridge this gap in perception, we propose a novel augmented reality system using a Microsoft HoloLens as a head-mounted aid for wheelchair navigation. The system displays visual feedback to the wearer as a way of explaining the underlying dynamics of the wheelchair’s shared controller and its predicted future states. To investigate the influence of different interface design options, a pilot study was also conducted. We evaluated the acceptance rate and learning curve of an immersive wheelchair training regime, revealing preliminary insights into the potential beneficial and adverse nature of different augmented reality cues for assistive navigation. In particular, we demonstrate that care should be taken in the presentation of information, with effort-reducing cues for augmented information acquisition (for example, a rear-view display) being the most appreciated.

Conference paper

Di Veroli C, Le CA, Lemaire T, Makabu E, Nur A, Ooi V, Park JY, Sanna F, Chacon R, Demiris Y et al., 2019, LibRob: An autonomous assistive librarian, Pages: 15-26, ISBN: 9783030253318

This study explores how new robotic systems can help library users efficiently locate the book they require. A survey conducted among Imperial College students revealed the absence of a time-efficient, organised method for finding the books they are looking for in the college library. The implemented solution, LibRob, is an automated assistive robot that interactively guides users to the book they are searching for, delivering a more satisfying experience. LibRob can process a search request by speech or by text and return a list of relevant books by author, subject or title. Once the user selects the book of interest, LibRob guides them to the shelf containing the book and then returns to its base station. Experimental results demonstrate that the robot reduces the time needed to find a book by 47.4% and that 80% of users were satisfied with their experience, showing that human-robot interaction can greatly improve the efficiency of basic activities within a library environment.

Book chapter

Fischer T, 2019, Perspective Taking in Robots: A Framework and Computational Model

Humans are inherently social beings who benefit from their perceptual capability to adopt another point of view. This thesis examines this capability, termed perspective taking, using a mixed forward/reverse engineering approach. While previous approaches were limited to known, artificial environments, the proposed approach results in a perceptual framework that can be used in unconstrained environments, while at the same time detailing the mechanisms that humans use to infer the world's characteristics from another viewpoint. First, the thesis explores a forward engineering approach by outlining the required perceptual components and implementing these components on a humanoid iCub robot. Prior to and during perspective taking, the iCub learns the environment and recognizes its constituent objects before approximating the gaze of surrounding humans based on their head poses. Inspired by psychological studies, two separate mechanisms are employed for the two types of perspective taking: one based on line-of-sight tracing and another based on the mental rotation of the environment. Acknowledging that human head pose is only a rough indication of a human's viewpoint, the thesis introduces a novel, automated approach for ground-truth eye gaze annotation. This approach is used to collect a new dataset, which covers a wide range of camera-subject distances, head poses, and gazes. A novel gaze estimation method trained on this dataset outperforms previous methods in close-distance scenarios and, going beyond previous methods, also enables eye gaze estimation at the large camera-subject distances commonly encountered in human-robot interaction. Finally, the thesis proposes a computational model as an instantiation of a reverse engineering approach, with the aim of understanding the underlying mechanisms of perspective taking in humans. The model contains a set of forward models as building blocks, and an attentional component to reduce the model's respo

Thesis dissertation

Choi J, Chang HJ, Fischer T, Yun S, Lee K, Jeong J, Demiris Y, Choi JY et al., 2018, Context-aware deep feature compression for high-speed visual tracking, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: Institute of Electrical and Electronics Engineers, Pages: 479-488, ISSN: 1063-6919

We propose a new context-aware, correlation-filter-based tracking framework that achieves both high computational speed and state-of-the-art performance among real-time trackers. The high computational speed stems mainly from the proposed deep feature compression, achieved by a context-aware scheme that uses multiple expert auto-encoders; a context in our framework refers to the coarse category of the tracking target according to its appearance patterns. In the pre-training phase, one expert auto-encoder is trained per category. In the tracking phase, the best expert auto-encoder is selected for a given target, and only this auto-encoder is used. To achieve high tracking performance with the compressed feature map, we introduce extrinsic denoising processes and a new orthogonality loss term for pre-training and fine-tuning of the expert auto-encoders. We validate the proposed context-aware framework through a number of experiments, in which our method achieves performance comparable to state-of-the-art trackers that cannot run in real time, while running at over 100 fps.

Conference paper

Moulin-Frier C, Fischer T, Petit M, Pointeau G, Puigbo JY, Pattacini U, Low SC, Camilleri D, Nguyen P, Hoffmann M, Chang HJ, Zambelli M, Mealier AL, Damianou A, Metta G, Prescott TJ, Demiris Y, Dominey PF, Verschure PFMJ et al., 2018, DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the Self, IEEE Transactions on Cognitive and Developmental Systems, Vol: 10, Pages: 1005-1022, ISSN: 2379-8920

This paper introduces a cognitive architecture for a humanoid robot to engage in proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot's behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
