Journal article: Candela E, Doustaly O, Parada L, et al., 2023,
Risk-aware controller for autonomous vehicles using model-based collision prediction and reinforcement learning, Artificial Intelligence, Vol: 320, ISSN: 0004-3702
Autonomous Vehicles (AVs) have the potential to save millions of lives and increase the efficiency of transportation services. However, the successful deployment of AVs requires tackling multiple challenges related to modeling and certifying safety. State-of-the-art decision-making methods usually rely on end-to-end learning or imitation learning approaches, which still pose significant safety risks. Hence the necessity of risk-aware AVs that can better predict and handle dangerous situations. Furthermore, current approaches tend to lack explainability due to their reliance on end-to-end Deep Learning, where significant causal relationships are not guaranteed to be learned from data. This paper introduces a novel risk-aware framework for training AV agents using a bespoke collision prediction model and Reinforcement Learning (RL). The collision prediction model is based on Gaussian Processes and vehicle dynamics, and is used to generate the RL state vector. Using an explicit risk model increases the post-hoc explainability of the AV agent, which is vital for reaching and certifying the high safety levels required for AVs and other safety-sensitive applications. Experimental results obtained with a simulator and state-of-the-art RL algorithms show that the risk-aware RL framework decreases average collision rates by 15%, makes AVs more robust to sudden harsh braking situations, and achieves better performance in both safety and speed when compared to a standard rule-based method (the Intelligent Driver Model). Moreover, the proposed collision prediction model outperforms other models in the literature.
Journal article: Zhang X, Demiris Y, 2023,
Visible and Infrared Image Fusion using Deep Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence
Conference paper: Zolotas M, Demiris Y, 2022,
Disentangled sequence clustering for human intention inference, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 9814-9820, ISSN: 2153-0866
Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of “intent” conditioned on the robot’s perceived state. However, these approaches typically assume task-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latent representations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction dataset collected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.
Conference paper: Chacon Quesada R, Demiris Y, 2022,
Holo-SpoK: Affordance-aware augmented reality control of legged manipulators, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 856-862
Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state of the art in this domain, in this work, we integrate a Boston Dynamics (BD) Spot® with a light-weight 7 DoF Kinova® robot arm and a Robotiq® 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtained via vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.
Journal article: Amadori PV, Fischer T, Wang R, et al., 2022,
Predicting secondary task performance: a directly actionable metric for cognitive overload detection, IEEE Transactions on Cognitive and Developmental Systems, Vol: 14, Pages: 1474-1485, ISSN: 2379-8920
In this paper, we address cognitive overload detection from unobtrusive physiological signals for users in dual-tasking scenarios. Anticipating cognitive overload is a pivotal challenge in interactive cognitive systems and could lead to safer shared-control between users and assistance systems. Our framework builds on the assumption that decision mistakes on the cognitive secondary task of dual-tasking users correspond to cognitive overload events, wherein the cognitive resources required to perform the task exceed the ones available to the users. We propose DecNet, an end-to-end sequence-to-sequence deep learning model that infers in real-time the likelihood of user mistakes on the secondary task, i.e., the practical impact of cognitive overload, from eye-gaze and head-pose data. We train and test DecNet on a dataset collected in a simulated driving setup from a cohort of 20 users on two dual-tasking decision-making scenarios, with either visual or auditory decision stimuli. DecNet anticipates cognitive overload events in both scenarios and can perform in time-constrained scenarios, anticipating cognitive overload events up to 2s before they occur. We show that DecNet’s performance gap between audio and visual scenarios is consistent with user perceived difficulty. This suggests that single modality stimulation induces higher cognitive load on users, hindering their decision-making abilities.
Journal article: Nunes UM, Demiris Y, 2022,
Robust Event-Based Vision Model Estimation by Dispersion Minimisation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 44, Pages: 9561-9573, ISSN: 0162-8828
Journal article: Zhang X, Angeloudis P, Demiris Y, 2022,
ST CrossingPose: a spatial-temporal graph convolutional network for skeleton-based pedestrian crossing intention prediction, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 20773-20782, ISSN: 1524-9050
Pedestrian crossing intention prediction is crucial for the safety of pedestrians in the context of both autonomous and conventional vehicles and has attracted widespread interest recently. Various methods have been proposed to perform pedestrian crossing intention prediction, among which the skeleton-based methods have been very popular in recent years. However, most existing studies utilize manually designed features to handle skeleton data, limiting the performance of these methods. To solve this issue, we propose to predict pedestrian crossing intention based on spatial-temporal graph convolutional networks using skeleton data (ST CrossingPose). The proposed method can learn both spatial and temporal patterns from skeleton data, thus having a good feature representation ability. Extensive experiments on a public dataset demonstrate that the proposed method achieves very competitive performance in predicting crossing intention while maintaining a fast inference speed. We also analyze the effect of several factors, e.g., size of pedestrians, time to event, and occlusion, on the proposed method.
Journal article: Zhang X, Feng Y, Angeloudis P, et al., 2022,
Monocular visual traffic surveillance: a review, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 14148-14165, ISSN: 1524-9050
To facilitate the monitoring and management of modern transportation systems, monocular visual traffic surveillance systems have been widely adopted for speed measurement, accident detection, and accident prediction. Thanks to the recent innovations in computer vision and deep learning research, the performance of visual traffic surveillance systems has been significantly improved. However, despite this success, there is a lack of survey papers that systematically review these new methods. Therefore, we conduct a systematic review of relevant studies to fill this gap and provide guidance to future studies. This paper is structured along the visual information processing pipeline that includes object detection, object tracking, and camera calibration. Moreover, we also include important applications of visual traffic surveillance systems, such as speed measurement, behavior learning, accident detection and prediction. Finally, future research directions of visual traffic surveillance systems are outlined.
Conference paper: Al-Hindawi A, Vizcaychipi M, Demiris Y, 2022,
Faster, better blink detection through curriculum learning by augmentation, ETRA '22: 2022 Symposium on Eye Tracking Research and Applications, Publisher: ACM, Pages: 1-7
Blinking is a useful biological signal that can gate gaze regression models to avoid the use of incorrect data in downstream tasks. Existing datasets are imbalanced both in class frequency and in intra-class difficulty, which we demonstrate is a barrier for curriculum learning. We thus propose a novel curriculum augmentation scheme that aims to address frequency and difficulty imbalances implicitly, which we term Curriculum Learning by Augmentation (CLbA). Using CLbA, we achieve a state-of-the-art mean Average Precision (mAP) of 0.971 using ResNet-18, up from the previous state-of-the-art mAP of 0.757 using DenseNet-121, whilst outcompeting Curriculum Learning by Bootstrapping (CLbB) by a significant margin with improved calibration. This new training scheme thus allows the use of smaller and more performant Convolutional Neural Network (CNN) backbones fulfilling Nyquist criteria to achieve a sampling frequency of 102.3Hz. This paves the way for inference of blinking in real-time applications.
Conference paper: Bin Razali MH, Demiris Y, 2022,
Using a single input to forecast human action keystates in everyday pick and place actions, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 3488-3492
We define action keystates as the start or end of an action that contains information such as the human pose and time. Existing methods that forecast the human pose use recurrent networks that input and output a sequence of poses. In this paper, we present a method tailored for everyday pick and place actions where the object of interest is known. In contrast to existing methods, ours uses an input from a single timestep to directly forecast (i) the key pose the instant the pick or place action is performed and (ii) the time it takes to get to the predicted key pose. Experimental results show that our method outperforms the state-of-the-art for key pose forecasting and is comparable for time forecasting while running at least an order of magnitude faster. Further ablative studies reveal the significance of the object of interest in enabling the total number of parameters across all existing methods to be reduced by at least 90% without any degradation in performance.
Conference paper: Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2022,
What is the patient looking at? Robust gaze-scene intersection under free-viewing conditions, 47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 2430-2434, ISSN: 1520-6149
Locating the user’s gaze in the scene, also known as Point of Regard (PoR) estimation, following gaze regression is important for many downstream tasks. Current techniques either require the user to wear and calibrate instruments, require significant pre-processing of the scene information, or place restrictions on the user’s head movements. We propose a geometrically inspired algorithm that, despite its simplicity, provides high accuracy and O(J) performance under a variety of challenging situations including sparse depth maps, high noise, and high dynamic parallax between the user and the scene camera. We demonstrate the utility of the proposed algorithm in regressing the PoR from scenes captured in the Intensive Care Unit (ICU) at Chelsea & Westminster Hospital NHS Foundation Trust.
Journal article: Zhang F, Demiris Y, 2022,
Learning garment manipulation policies toward robot-assisted dressing, Science Robotics, Vol: 7, Pages: eabm6010-eabm6010, ISSN: 2470-9476
Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to the real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameter inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.
Conference paper: Jang Y, Demiris Y, 2022,
Message passing framework for vision prediction stability in human robot interaction, IEEE International Conference on Robotics and Automation 2022, Publisher: IEEE, ISSN: 2152-4092
In Human Robot Interaction (HRI) scenarios, robot systems would benefit from an understanding of the user's state, actions and their effects on the environments to enable better interactions. While there are specialised vision algorithms for different perceptual channels, such as objects, scenes, human pose, and human actions, it is worth considering how their interaction can help improve each other's output. In computer vision, individual prediction modules for these perceptual channels frequently produce noisy outputs due to the limited datasets used for training and the compartmentalisation of the perceptual channels, often resulting in noisy or unstable prediction outcomes. To stabilise vision prediction results in HRI, this paper presents a novel message passing framework that uses the memory of individual modules to correct each other's outputs. The proposed framework is designed utilising common-sense rules of physics (such as the law of gravity) to reduce noise while introducing a pipeline that helps to effectively improve the output of each other's modules. The proposed framework aims to analyse primitive human activities such as grasping an object in a video captured from the perspective of a robot. Experimental results show that the proposed framework significantly reduces the output noise of individual modules compared to the case of running independently. This pipeline can be used to measure human reactions when interacting with a robot in various HRI scenarios.
Conference paper: Bin Razali MH, Demiris Y, 2022,
Using eye-gaze to forecast human pose in everyday pick and place actions, IEEE International Conference on Robotics and Automation
Collaborative robots that operate alongside humans require the ability to understand their intent and forecast their pose. Among the various indicators of intent, the eye gaze is particularly important as it signals action towards the gazed object. By observing a person’s gaze, one can effectively predict the object of interest and subsequently, forecast the person’s pose. We leverage this and present a method that forecasts the human pose using gaze information for everyday pick and place actions in a home environment. Our method first attends to fixations to locate the coordinates of the object of interest before inputting said coordinates to a pose forecasting network. Experiments on the MoGaze dataset show that our gaze network lowers the errors of existing pose forecasting methods and that incorporating prior in the form of textual instructions further lowers the errors by a significant amount. Furthermore, the use of eye gaze now allows a simple multilayer perceptron network to directly forecast the keypose.
Journal article: Quesada RC, Demiris Y, 2022,
Proactive robot assistance: affordance-aware augmented reality user interfaces, IEEE Robotics and Automation Magazine, Vol: 29, ISSN: 1070-9932
Assistive robots have the potential to increase the autonomy and quality of life of people with disabilities. Their applications include rehabilitation robots, smart wheelchairs, companion robots, mobile manipulators, and educational robots. However, designing an intuitive user interface (UI) for the control of assistive robots remains a challenge, as most UIs leverage traditional control interfaces, such as joysticks and keyboards, which might be challenging and even impossible for some users. Augmented reality (AR) UIs introduce more natural interactions between people and assistive robots, potentially reaching a more diverse user base.
Conference paper: Candela E, Parada L, Marques L, et al., 2022,
Transferring Multi-Agent Reinforcement Learning Policies for Autonomous Driving using Sim-to-Real, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 8814-8820, ISSN: 2153-0858
Conference paper: Nunes UM, Demiris Y, 2022,
Kinematic Structure Estimation of Arbitrary Articulated Rigid Objects for Event Cameras, Pages: 508-514, ISSN: 1050-4729
We propose a novel method that estimates the Kinematic Structure (KS) of arbitrary articulated rigid objects from event-based data. Event cameras are emerging sensors that asynchronously report brightness changes with a time resolution of microseconds, making them suitable candidates for motion-related perception. By assuming that an articulated rigid object is composed of body parts whose shape can be approximately described by a Gaussian distribution, we jointly segment the different parts by combining an adapted Bayesian inference approach and incremental event-based motion estimation. The respective KS is then generated based on the segmented parts and their respective biharmonic distance, which is estimated by building an affinity matrix of points sampled from the estimated Gaussian distributions. The method outperforms frame-based methods in sequences obtained by simulating events from video sequences and achieves a solid performance on new high-speed motion sequences, which frame-based KS estimation methods cannot handle.
Conference paper: Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2021,
Continuous non-invasive eye tracking in intensive care, 43rd Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society (IEEE EMBC), Publisher: IEEE, Pages: 1869-1873, ISSN: 1557-170X
Delirium, an acute confusional state, is a common occurrence in Intensive Care Units (ICUs). Patients who develop delirium have globally worse outcomes than those who do not and thus the diagnosis of delirium is of importance. Current diagnostic methods have several limitations leading to the suggestion of eye-tracking for its diagnosis through in-attention. To ascertain the requirements for an eye-tracking system in an adult ICU, measurements were carried out at Chelsea & Westminster Hospital NHS Foundation Trust. Clinical criteria guided empirical requirements of invasiveness and calibration methods while accuracy and precision were measured. A non-invasive system was then developed utilising a patient-facing RGB camera and a scene-facing RGBD camera. The system’s performance was measured in a replicated laboratory environment with healthy volunteers revealing an accuracy and precision that outperforms what is required while simultaneously being non-invasive and calibration-free. The system was then deployed as part of CONfuSED, a clinical feasibility study where we report aggregated data from 5 patients as well as the acceptability of the system to bedside nursing staff. To the best of our knowledge, the system is the first eye-tracking system to be deployed in an ICU for delirium monitoring.
Conference paper: Nunes UM, Demiris Y, 2021,
Live demonstration: incremental motion estimation for event-based cameras by dispersion minimisation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE Computer Society, Pages: 1322-1323, ISSN: 2160-7508
Live demonstration setup. (Left) The setup consists of a DAVIS346B event camera connected to a standard consumer laptop and undergoes some motion. (Right) The motion estimates are plotted in red and, for rotation-like motions, the angular velocities provided by the camera IMU are also plotted in blue. This plot exemplifies an event camera undergoing large rotational motions (up to ~ 1000 deg/s) around the (a) x-axis, (b) y-axis and (c) z-axis. Overall, the incremental motion estimation method follows the IMU measurements. Optionally, the resultant global optical flow can also be shown, as well as the corresponding generated events by accumulating them onto the image plane (bottom left corner).
Conference paper: Chacon-Quesada R, Demiris Y, 2021,
Augmented reality user interfaces for heterogeneous multirobot control, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 11439-11444, ISSN: 2153-0858
Recent advances in the design of head-mounted augmented reality (AR) interfaces for assistive human-robot interaction (HRI) have allowed untrained users to rapidly and fluently control single-robot platforms. In this paper, we investigate how such interfaces transfer onto multirobot architectures, as several assistive robotics applications need to be distributed among robots that are different both physically and in terms of software. As part of this investigation, we introduce a novel head-mounted AR interface for heterogeneous multirobot control. This interface generates and displays dynamic joint-affordance signifiers, i.e. signifiers that combine and show multiple actions from different robots that can be applied simultaneously to an object. We present a user study with 15 participants analysing the effects of our approach on their perceived fluency. Participants were given the task of filling-out a cup with water making use of a multirobot platform. Our results show a clear improvement in standard HRI fluency metrics when users applied dynamic joint-affordance signifiers, as opposed to a sequence of independent actions.
Conference paper: Tian Y, Balntas V, Ng T, et al., 2021,
D2D: Keypoint Extraction with Describe to Detect Approach, Pages: 223-240, ISSN: 0302-9743
In this paper, we present a novel approach that exploits the information within the descriptor space to propose keypoint locations. Detect then describe, or jointly detect and describe are two typical strategies for extracting local features. In contrast, we propose an approach that inverts this process by first describing and then detecting the keypoint locations. Describe-to-Detect (D2D) leverages successful descriptor models without the need for any additional training. Our method selects keypoints as salient locations with high information content which are defined by the descriptors rather than some independent operators. We perform experiments on multiple benchmarks including image matching, camera localisation, and 3D reconstruction. The results indicate that our method improves the matching performance of various descriptors and that it generalises across methods and tasks.
Journal article: Girbes-Juan V, Schettino V, Demiris Y, et al., 2021,
Haptic and Visual Feedback Assistance for Dual-Arm Robot Teleoperation in Surface Conditioning Tasks, IEEE Transactions on Haptics, Vol: 14, Pages: 44-56, ISSN: 1939-1412
Conference paper: Behrens JK, Nazarczuk M, Stepanova K, et al., 2021,
Embodied Reasoning for Discovering Object Properties via Manipulation, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 10139-10145, ISSN: 1050-4729
Journal article: Fischer T, Demiris Y, 2020,
Computational modelling of embodied visual perspective-taking, IEEE Transactions on Cognitive and Developmental Systems, Vol: 12, Pages: 723-732, ISSN: 2379-8920
Humans are inherently social beings that benefit from their perceptional capability to embody another point of view, typically referred to as perspective-taking. Perspective-taking is an essential feature in our daily interactions and is pivotal for human development. However, much remains unknown about the precise mechanisms that underlie perspective-taking. Here we show that formalizing perspective-taking in a computational model can detail the embodied mechanisms employed by humans in perspective-taking. The model's main building block is a set of action primitives that are passed through a forward model. The model employs a process that selects a subset of action primitives to be passed through the forward model to reduce the response time. The model demonstrates results that mimic those captured by human data, including (i) response times differences caused by the angular disparity between the perspective-taker and the other agent, (ii) the impact of task-irrelevant body posture variations in perspective-taking, and (iii) differences in the perspective-taking strategy between individuals. Our results provide support for the hypothesis that perspective-taking is a mental simulation of the physical movements that are required to match another person's visual viewpoint. Furthermore, the model provides several testable predictions, including the prediction that forced early responses lead to an egocentric bias and that a selection process introduces dependencies between two consecutive trials. Our results indicate potential links between perspective-taking and other essential perceptional and cognitive mechanisms, such as active vision and autobiographical memories.
Conference paper: Goncalves Nunes UM, Demiris Y, 2020,
Entropy minimisation framework for event-based vision model estimation, 16th European Conference on Computer Vision 2020, Publisher: Springer, Pages: 161-176
We propose a novel Entropy Minimisation (EMin) framework for event-based vision model estimation. The framework extends previous event-based motion compensation algorithms to handle models whose outputs have arbitrary dimensions. The main motivation comes from estimating motion from events directly in 3D space (e.g. events augmented with depth), without projecting them onto an image plane. This is achieved by modelling the event alignment according to candidate parameters and minimising the resultant dispersion. We provide a family of suitable entropy loss functions and an efficient approximation whose complexity is only linear with the number of events (e.g. the complexity does not depend on the number of image pixels). The framework is evaluated on several motion estimation problems, including optical flow and rotational motion. As proof of concept, we also test our framework on 6-DOF estimation by performing the optimisation directly in 3D space.
Conference paper: Zhang F, Demiris Y, 2020,
Learning grasping points for garment manipulation in robot-assisted dressing, 2020 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 9114-9120
Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a predefined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method, with a grasping and manipulation strategy which takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper-body.
Journal article: Gao Y, Chang HJ, Demiris Y, 2020,
User modelling using multimodal information for personalised dressing assistance, IEEE Access, Vol: 8, Pages: 45700-45714, ISSN: 2169-3536
Conference paper: Zolotas M, Demiris Y, 2020,
Towards explainable shared control using augmented reality, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019), Publisher: IEEE, Pages: 3020-3026
Shared control plays a pivotal role in establishing effective human-robot interactions. Traditional control-sharing methods strive to complement a human’s capabilities at safely completing a task, and thereby rely on users forming a mental model of the expected robot behaviour. However, these methods can often bewilder or frustrate users whenever their actions do not elicit the intended system response, forming a misalignment between the respective internal models of the robot and human. To resolve this model misalignment, we introduce Explainable Shared Control as a paradigm in which assistance and information feedback are jointly considered. Augmented reality is presented as an integral component of this paradigm, by visually unveiling the robot’s inner workings to human operators. Explainable Shared Control is instantiated and tested for assistive navigation in a setup involving a robotic wheelchair and a Microsoft HoloLens with add-on eye tracking. Experimental results indicate that the introduced paradigm facilitates transparent assistance by improving recovery times from adverse events associated with model misalignment.
Conference paperChacon-Quesada R, Demiris Y, 2020,
Augmented reality controlled smart wheelchair using dynamic signifiers for affordance representation, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE
The design of augmented reality interfaces for people with mobility impairments is a novel area with great potential, as well as multiple outstanding research challenges. In this paper we present an augmented reality user interface for controlling a smart wheelchair with a head-mounted display to provide assistance for mobility restricted people. Our motivation is to reduce the cognitive requirements needed to control a smart wheelchair. A key element of our platform is the ability to control the smart wheelchair using the concepts of affordances and signifiers. In addition to the technical details of our platform, we present a baseline study by evaluating our platform through user-trials of able-bodied individuals and two different affordances: 1) Door Go Through and 2) People Approach. To present these affordances to the user, we evaluated fixed symbol based signifiers versus our novel dynamic signifiers in terms of ease to understand the suggested actions and its relation with the objects. Our results show a clear preference for dynamic signifiers. In addition, we show that the task load reported by participants is lower when controlling the smart wheelchair with our augmented reality user interface compared to using the joystick, which is consistent with their qualitative answers.
Journal articleZambelli M, Cully A, Demiris Y, 2020,
Multimodal representation models for prediction and control from partial information, Robotics and Autonomous Systems, Vol: 123, ISSN: 0921-8890
Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motion through the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity given by the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improved performance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoid robot, showing high performance in multiple reconstruction, prediction and imitation tasks.
Conference paperBuizza C, Fischer T, Demiris Y, 2020,
Real-time multi-person pose tracking using data assimilation, IEEE Winter Conference on Applications of Computer Vision, Publisher: IEEE
We propose a framework for the integration of data assimilation and machine learning methods in human pose estimation, with the aim of enabling any pose estimation method to be run in real-time, whilst also increasing consistency and accuracy. Data assimilation and machine learning are complementary methods: the former allows us to make use of information about the underlying dynamics of a system but lacks the flexibility of a data-based model, which we can instead obtain with the latter. Our framework presents a real-time tracking module for any single or multi-person pose estimation system. Specifically, tracking is performed by a number of Kalman filters initiated for each new person appearing in a motion sequence. This permits tracking of multiple skeletons and reduces the frequency with which computationally expensive pose estimation has to be run, enabling online pose tracking. The module tracks for N frames while the pose estimates are calculated for frame (N+1). This also results in increased consistency of person identification and reduced inaccuracies due to missing joint locations and inversion of left- and right-side joints.
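The interleaving of cheap Kalman prediction with occasional expensive pose estimates can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it tracks a single joint coordinate with a constant-velocity model, and the names `JointKalman1D` and `track`, the noise values `q` and `r`, and the rule "correct once every `n_skip + 1` frames" are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class JointKalman1D:
    """Constant-velocity Kalman filter for one joint coordinate (hypothetical helper)."""
    pos: float            # estimated position
    vel: float = 0.0      # estimated velocity, in units per frame
    # Entries of the symmetric 2x2 state covariance [[p_pp, p_pv], [p_pv, p_vv]].
    p_pp: float = 1.0
    p_pv: float = 0.0
    p_vv: float = 1.0
    q: float = 1e-3       # process noise added per frame (assumed value)
    r: float = 1e-2       # measurement noise variance (assumed value)

    def predict(self) -> float:
        """Advance one frame without a measurement."""
        self.pos += self.vel
        # P <- F P F^T + Q with F = [[1, 1], [0, 1]] and Q = q * I.
        p_pp = self.p_pp + 2.0 * self.p_pv + self.p_vv + self.q
        p_pv = self.p_pv + self.p_vv
        p_vv = self.p_vv + self.q
        self.p_pp, self.p_pv, self.p_vv = p_pp, p_pv, p_vv
        return self.pos

    def update(self, measured_pos: float) -> float:
        """Correct the state with a position measurement (H = [1, 0])."""
        s = self.p_pp + self.r          # innovation variance
        k_pos = self.p_pp / s           # Kalman gain for position
        k_vel = self.p_pv / s           # Kalman gain for velocity
        residual = measured_pos - self.pos
        self.pos += k_pos * residual
        self.vel += k_vel * residual
        # P <- (I - K H) P
        p_pp = (1.0 - k_pos) * self.p_pp
        p_pv = (1.0 - k_pos) * self.p_pv
        p_vv = self.p_vv - k_vel * self.p_pv
        self.p_pp, self.p_pv, self.p_vv = p_pp, p_pv, p_vv
        return self.pos


def track(pose_estimates, n_skip):
    """Predict every frame; use the expensive estimate only every n_skip + 1 frames."""
    kf = JointKalman1D(pos=pose_estimates[0])
    tracked = [kf.pos]
    for t, z in enumerate(pose_estimates[1:], start=1):
        kf.predict()
        if t % (n_skip + 1) == 0:
            kf.update(z)  # in a real system z would come from the pose estimator
        tracked.append(kf.pos)
    return tracked


if __name__ == "__main__":
    ramp = [float(t) for t in range(13)]  # a joint moving one unit per frame
    print(track(ramp, n_skip=2))
```

In a multi-person setting one such filter per joint coordinate and per person would be needed; the sketch leaves out data association between detections and existing tracks.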
Conference paperCortacero K, Fischer T, Demiris Y, 2019,
RT-BENE: A Dataset and Baselines for Real-Time Blink Estimation in Natural Environments, IEEE International Conference on Computer Vision Workshops, Publisher: Institute of Electrical and Electronics Engineers Inc.
In recent years gaze estimation methods have made substantial progress, driven by the numerous application areas including human-robot interaction, visual attention estimation and foveated rendering for virtual reality headsets. However, many gaze estimation methods typically assume that the subject's eyes are open; for closed eyes, these methods provide irregular gaze estimates. Here, we address this assumption by first introducing a new open-sourced dataset with annotations of the eye-openness of more than 200,000 eye images, including more than 10,000 images where the eyes are closed. We further present baseline methods that allow for blink detection using convolutional neural networks. In extensive experiments, we show that the proposed baselines perform favourably in terms of precision and recall. We further incorporate our proposed RT-BENE baselines in the recently presented RT-GENE gaze estimation framework where it provides a real-time inference of the openness of the eyes. We argue that our work will benefit both gaze estimation and blink estimation methods, and we take steps towards unifying these methods.
Journal articleZhang F, Cully A, Demiris Y, 2019,
Probabilistic real-time user posture tracking for personalized robot-assisted dressing, IEEE Transactions on Robotics, Vol: 35, Pages: 873-888, ISSN: 1552-3098
Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.
Conference paperKristan M, Leonardis A, Matas J, et al., 2019,
The sixth visual object tracking VOT2018 challenge results, European Conference on Computer Vision, Publisher: Springer, Pages: 3-53, ISSN: 0302-9743
The Visual Object Tracking challenge VOT2018 is the sixth annual tracker benchmarking activity organized by the VOT initiative. Results of over eighty trackers are presented; many are state-of-the-art trackers published at major computer vision conferences or in journals in the recent years. The evaluation included the standard VOT and other popular methodologies for short-term tracking analysis and a “real-time” experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. A long-term tracking subchallenge has been introduced to the set of standard VOT sub-challenges. The new subchallenge focuses on long-term tracking properties, namely coping with target disappearance and reappearance. A new dataset has been compiled and a performance evaluation methodology that focuses on long-term tracking capabilities has been adopted. The VOT toolkit has been updated to support both standard short-term and the new long-term tracking subchallenges. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. The dataset, the evaluation kit and the results are publicly available at the challenge website (http://votchallenge.net).
Conference paperZolotas M, Elsdon J, Demiris Y, 2019,
Head-mounted augmented reality for explainable robotic wheelchair assistance, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866
Robotic wheelchairs with built-in assistive features, such as shared control, are an emerging means of providing independent mobility to severely disabled individuals. However, patients often struggle to build a mental model of their wheelchair’s behaviour under different environmental conditions. Motivated by the desire to help users bridge this gap in perception, we propose a novel augmented reality system using a Microsoft Hololens as a head-mounted aid for wheelchair navigation. The system displays visual feedback to the wearer as a way of explaining the underlying dynamics of the wheelchair’s shared controller and its predicted future states. To investigate the influence of different interface design options, a pilot study was also conducted. We evaluated the acceptance rate and learning curve of an immersive wheelchair training regime, revealing preliminary insights into the potential beneficial and adverse nature of different augmented reality cues for assistive navigation. In particular, we demonstrate that care should be taken in the presentation of information, with effort-reducing cues for augmented information acquisition (for example, a rear-view display) being the most appreciated.
Book chapterDi Veroli C, Le CA, Lemaire T, et al., 2019,
LibRob: An autonomous assistive librarian, Pages: 15-26, ISBN: 9783030253318
This study explores how new robotic systems can help library users efficiently locate the book they require. A survey conducted among Imperial College students has shown an absence of a time-efficient and organised method to find the books they are looking for in the college library. The solution implemented, LibRob, is an automated assistive robot that gives guidance to the users in finding the book they are searching for in an interactive manner to deliver a more satisfactory experience. LibRob is able to process a search request either by speech or by text and return a list of relevant books by author, subject or title. Once the user selects the book of interest, LibRob guides them to the shelf containing the book, then returns to its base station on completion. Experimental results demonstrate that the robot reduces the time necessary to find a book by 47.4%, and left 80% of the users satisfied with their experience, proving that human-robot interactions can greatly improve the efficiency of basic activities within a library environment.
Conference paperChoi J, Chang HJ, Fischer T, et al., 2018,
Context-aware deep feature compression for high-speed visual tracking, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: Institute of Electrical and Electronics Engineers, Pages: 479-488, ISSN: 1063-6919
We propose a new context-aware correlation filter based tracking framework to achieve both high computational speed and state-of-the-art performance among real-time trackers. The major contribution to the high computational speed lies in the proposed deep feature compression that is achieved by a context-aware scheme utilizing multiple expert auto-encoders; a context in our framework refers to the coarse category of the tracking target according to appearance patterns. In the pre-training phase, one expert auto-encoder is trained per category. In the tracking phase, the best expert auto-encoder is selected for a given target, and only this auto-encoder is used. To achieve high tracking performance with the compressed feature map, we introduce extrinsic denoising processes and a new orthogonality loss term for pre-training and fine-tuning of the expert auto-encoders. We validate the proposed context-aware framework through a number of experiments, where our method achieves a comparable performance to state-of-the-art trackers which cannot run in real-time, while running at a significantly fast speed of over 100 fps.
Journal articleMoulin-Frier C, Fischer T, Petit M, et al., 2018,
DAC-h3: A Proactive Robot Cognitive Architecture to Acquire and Express Knowledge About the World and the Self, IEEE Transactions on Cognitive and Developmental Systems, Vol: 10, Pages: 1005-1022, ISSN: 2379-8920
This paper introduces a cognitive architecture for a humanoid robot to engage in a proactive, mixed-initiative exploration and manipulation of its environment, where the initiative can originate from both the human and the robot. The framework, based on a biologically-grounded theory of the brain and mind, integrates a reactive interaction engine, a number of state-of-the-art perceptual and motor learning algorithms, as well as planning abilities and an autobiographical memory. The architecture as a whole drives the robot behavior to solve the symbol grounding problem, acquire language capabilities, execute goal-oriented behavior, and express a verbal narrative of its own experience in the world. We validate our approach in human-robot interaction experiments with the iCub humanoid robot, showing that the proposed cognitive architecture can be applied in real time within a realistic scenario and that it can be used with naive users.
Journal articleChang HJ, Fischer T, Petit M, et al., 2018,
Learning kinematic structure correspondences using multi-order similarities, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 2920-2934, ISSN: 0162-8828
We present a novel framework for finding the kinematic structure correspondences between two articulated objects in videos via hypergraph matching. In contrast to appearance and graph alignment based matching methods, which have been applied among two similar static images, the proposed method finds correspondences between two dynamic kinematic structures of heterogeneous objects in videos. Thus our method allows matching the structure of objects which have similar topologies or motions, or a combination of the two. Our main contributions are summarised as follows: (i) casting the kinematic structure correspondence problem into a hypergraph matching problem by incorporating multi-order similarities with normalising weights, (ii) introducing a structural topology similarity measure by aggregating topology constrained subgraph isomorphisms, (iii) measuring kinematic correlations between pairwise nodes, and (iv) proposing a combinatorial local motion similarity measure using geodesic distance on the Riemannian manifold. We demonstrate the robustness and accuracy of our method through a number of experiments on synthetic and real data, showing that various other recent and state of the art methods are outperformed. Our method is not limited to a specific application nor sensor, and can be used as building block in applications such as action recognition, human motion retargeting to robots, and articulated object manipulation.
Conference paperFischer T, Chang HJ, Demiris Y, 2018,
RT-GENE: Real-time eye gaze estimation in natural environments, European Conference on Computer Vision, Publisher: Springer Verlag, Pages: 339-357, ISSN: 0302-9743
In this work, we consider the problem of robust gaze estimation in natural environments. Large camera-to-subject distances and high variations in head pose and eye gaze angles are common in such environments. This leads to two main shortfalls in state-of-the-art methods for gaze estimation: hindered ground truth gaze annotation and diminished gaze estimation accuracy as image resolution decreases with distance. We first record a novel dataset of varied gaze and head pose images in a natural environment, addressing the issue of ground truth annotation by measuring head pose using a motion capture system and eye gaze using mobile eyetracking glasses. We apply semantic image inpainting to the area covered by the glasses to bridge the gap between training and testing images by removing the obtrusiveness of the glasses. We also present a new real-time algorithm involving appearance-based deep convolutional neural networks with increased capacity to cope with the diverse images in the new dataset. Experiments with this network architecture are conducted on a number of diverse eye-gaze datasets including our own, and in cross dataset evaluations. We demonstrate state-of-the-art performance in terms of estimation accuracy in all experiments, and the architecture performs well even on lower resolution images.
Conference paperNguyen P, Fischer T, Chang HJ, et al., 2018,
Transferring visuomotor learning from simulation to the real world for robotics manipulation tasks, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, Pages: 6667-6674, ISSN: 2153-0866
Hand-eye coordination is a requirement for many manipulation tasks including grasping and reaching. However, accurate hand-eye coordination has been shown to be especially difficult to achieve in complex robots like the iCub humanoid. In this work, we solve the hand-eye coordination task using a visuomotor deep neural network predictor that estimates the arm's joint configuration given a stereo image pair of the arm and the underlying head configuration. As there are various unavoidable sources of sensing error on the physical robot, we train the predictor on images obtained from simulation. The images from simulation were modified to look realistic using an image-to-image translation approach. In various experiments, we first show that the visuomotor predictor provides accurate joint estimates of the iCub's hand in simulation. We then show that the predictor can be used to obtain the systematic error of the robot's joint measurements on the physical iCub robot. We demonstrate that a calibrator can be designed to automatically compensate for this error. Finally, we validate that this enables accurate reaching of objects while circumventing manual fine-calibration of the robot.
Conference paperChacon Quesada R, Demiris Y, 2018,
Augmented reality control of smart wheelchair using eye-gaze–enabled selection of affordances, https://www.idiap.ch/workshop/iros2018/files/, IROS 2018 Workshop on Robots for Assisted Living
In this paper we present a novel augmented reality head mounted display user interface for controlling a robotic wheelchair for people with limited mobility. To lower the cognitive requirements needed to control the wheelchair, we propose integration of a smart wheelchair with an eye-tracking enabled head-mounted display. We propose a novel platform that integrates multiple user interface interaction methods for aiming at and selecting affordances derived by on-board perception capabilities such as laser-scanner readings and cameras. We demonstrate the effectiveness of the approach by evaluating our platform in two realistic scenarios: 1) Door detection, where the affordance corresponds to a Door object and the Go-Through action and 2) People detection, where the affordance corresponds to a Person and the Approach action. To the best of our knowledge, this is the first demonstration of an augmented reality head-mounted display user interface for controlling a smart wheelchair.
Conference paperFischer T, Demiris Y, 2018,
A computational model for embodied visual perspective taking: from physical movements to mental simulation, Vision Meets Cognition Workshop at CVPR 2018
To understand people and their intentions, humans have developed the ability to imagine their surroundings from another visual point of view. This cognitive ability is called perspective taking and has been shown to be essential in child development and social interactions. However, the precise cognitive mechanisms underlying perspective taking remain to be fully understood. Here we present a computational model that implements perspective taking as a mental simulation of the physical movements required to step into the other point of view. The visual percept after each mental simulation step is estimated using a set of forward models. Based on our experimental results, we propose that a visual attention mechanism explains the response times reported in human visual perspective taking experiments. The model is also able to generate several testable predictions to be explored in further neurophysiological studies.
Conference paperElsdon J, Demiris Y, 2018,
Augmented reality for feedback in a shared control spraying task, IEEE International Conference on Robotics and Automation (ICRA), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 1939-1946, ISSN: 1050-4729
Using industrial robots to spray structures has been investigated extensively; however, interesting challenges emerge when using handheld spraying robots. In previous work we have demonstrated the use of shared control of a handheld spraying robot to assist a user in a 3D spraying task. In this paper we demonstrate the use of Augmented Reality Interfaces to increase the user's progress and task awareness. We describe our solutions to challenging calibration issues between the Microsoft Hololens system and a motion capture system without the need for well defined markers or careful alignment on the part of the user. Error relative to the motion capture system was shown to be 10mm after only a 4 second calibration routine. Secondly, we outline a logical approach for visualising liquid density for an augmented reality spraying task. This system allows the user to see clearly the target regions to complete, areas that are complete and areas that have been overdosed. Finally, we conducted a user study to investigate the level of assistance that a handheld robot utilising shared control methods should provide during a spraying task. Using a handheld spraying robot with a moving spray head did not aid the user much over simply actuating the spray nozzle for them. Compared to manual control, the automatic modes significantly reduced the task load experienced by the user and significantly increased the quality of the result of the spraying task, reducing the error by 33-45%.
Journal articleCully AHR, Demiris Y, 2018,
Quality and diversity optimization: a unifying modular framework, IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026
The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms are searching for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and the Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection management that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.
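To make the idea of a collection that is both diverse and high-performing concrete, here is a minimal grid-based MAP-Elites loop. It is a sketch under stated assumptions rather than the paper's framework: it uses plain uniform selection from the archive (not the new selection mechanism the abstract mentions), two-dimensional genomes in [0, 1], and hypothetical names and parameter values (`map_elites`, `n_bins`, `sigma`, the 0.9 mutation probability).

```python
import random


def map_elites(fitness, descriptor, n_bins=10, iterations=2000, sigma=0.1, seed=0):
    """Minimal MAP-Elites on 2-D genomes in [0, 1]^2 (illustrative only)."""
    rng = random.Random(seed)
    archive = {}  # grid cell index -> (fitness, genome) of the current elite

    def try_insert(genome):
        d = descriptor(genome)                    # behaviour descriptor in [0, 1]
        cell = min(int(d * n_bins), n_bins - 1)   # discretise into a grid cell
        f = fitness(genome)
        # Keep the candidate only if the cell is empty or it beats the elite there.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, genome)

    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Uniformly select an existing elite and apply Gaussian mutation.
            _, parent = rng.choice(list(archive.values()))
            child = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in parent]
        else:
            # Occasionally sample a fresh random genome.
            child = [rng.random(), rng.random()]
        try_insert(child)
    return archive


if __name__ == "__main__":
    # Toy problem: the first gene is the solution "type", the second sets quality.
    elites = map_elites(
        fitness=lambda g: -(g[1] - 0.5) ** 2,
        descriptor=lambda g: g[0],
    )
    for cell in sorted(elites):
        print(cell, elites[cell])
```

The result is one elite per descriptor cell instead of a single optimum, which is the contrast with classic single-solution optimisation that the abstract draws.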
Journal articleFischer T, Puigbo J-Y, Camilleri D, et al., 2018,
iCub-HRI: A software framework for complex human-robot interaction scenarios on the iCub humanoid robot, Frontiers in Robotics and AI, Vol: 5, Pages: 1-9, ISSN: 2296-9144
Generating complex, human-like behaviour in a humanoid robot like the iCub requires the integration of a wide range of open source components and a scalable cognitive architecture. Hence, we present the iCub-HRI library which provides convenience wrappers for components related to perception (object recognition, agent tracking, speech recognition, touch detection), object manipulation (basic and complex motor actions) and social interaction (speech synthesis, joint attention) exposed as a C++ library with bindings for Java (allowing to use iCub-HRI within Matlab) and Python. In addition to previously integrated components, the library allows for simple extension to new components and rapid prototyping by adapting to changes in interfaces between components. We also provide a set of modules which make use of the library, such as a high-level knowledge acquisition module and an action recognition module. The proposed architecture has been successfully employed for a complex human-robot interaction scenario involving the acquisition of language capabilities, execution of goal-oriented behaviour and expression of a verbal narrative of the robot's experience in the world. Accompanying this paper is a tutorial which allows a subset of this interaction to be reproduced. The architecture is aimed at researchers familiarising themselves with the iCub ecosystem, as well as expert users, and we expect the library to be widely used in the iCub community.
Conference paperZhang F, Cully A, Demiris Y, 2017,
Personalized Robot-assisted Dressing using User Modeling in Latent Spaces, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866
Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance usually view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to the failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied on the robot's gripper caused by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.
Conference paperChoi J, Chang HJ, Yun S, et al., 2017,
Attentional correlation filter network for adaptive visual tracking, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: IEEE, ISSN: 1063-6919
We propose a new tracking framework with an attentional mechanism that chooses a subset of the associated correlation filters for increased robustness and computational efficiency. The subset of filters is adaptively selected by a deep attentional network according to the dynamic properties of the tracking target. Our contributions are manifold, and are summarised as follows: (i) Introducing the Attentional Correlation Filter Network which allows adaptive tracking of dynamic targets. (ii) Utilising an attentional network which shifts the attention to the best candidate modules, as well as predicting the estimated accuracy of currently inactive modules. (iii) Enlarging the variety of correlation filters which cover target drift, blurriness, occlusion, scale changes, and flexible aspect ratio. (iv) Validating the robustness and efficiency of the attentional mechanism for visual tracking through a number of experiments. Our method achieves similar performance to non real-time trackers, and state-of-the-art performance amongst real-time trackers.
Conference paperYoo YJ, Chang H, Yun S, et al., 2017,
Variational autoencoded regression: high dimensional regression of visual data on complex manifold, IEEE Conference on Computer Vision and Pattern Recognition, Publisher: IEEE, Pages: 2943-2952
This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimensional image responses, which is not handled by existing regression algorithms, is proposed. (ii) The proposed regression method introduces a strategy to learn the latent space as well as the encoder and decoder so that the result of the regressed response in the latent space coincides with the corresponding response in the data space. (iii) The proposed regression is embedded into a generative model, and the whole procedure is developed by the variational autoencoder framework. We demonstrate the robustness and effectiveness of our method through a number of experiments on various visual data regression problems.
Journal articleChang HJ, Demiris Y, 2017,
Highly articulated kinematic structure estimation combining motion and skeleton information, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 2165-2179, ISSN: 0162-8828
In this paper, we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view 2D image sequence. In contrast to prior motion-based methods, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology via a successive iterative merging strategy. The iterative merge process is guided by a density weighted skeleton map which is generated from a novel object boundary generation method from sparse 2D feature points. Our main contributions can be summarised as follows: (i) An unsupervised complex articulated kinematic structure estimation method that combines motion segments with skeleton information. (ii) An iterative fine-to-coarse merging strategy for adaptive motion segmentation and structural topology embedding. (iii) A skeleton estimation method based on a novel silhouette boundary generation from sparse feature points using an adaptive model selection method. (iv) A new highly articulated object dataset with ground truth annotation. We have verified the effectiveness of our proposed method in terms of computational time and estimation accuracy through rigorous experiments. Our experiments show that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.