  • Conference paper
    Zhang F, Demiris Y, 2020,

    Learning grasping points for garment manipulation in robot-assisted dressing

    , 2020 IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 9114-9120

    Assistive robots have the potential to provide tremendous support for disabled and elderly people in their daily dressing activities. Recent studies on robot-assisted dressing usually simplify the setup of the initial robot configuration by manually attaching the garments on the robot end-effector and positioning them close to the user's arm. A fundamental challenge in automating such a process for robots is computing suitable grasping points on garments that facilitate robotic manipulation. In this paper, we address this problem by introducing a supervised deep neural network to locate a predefined grasping point on the garment, using depth images for their invariance to color and texture. To reduce the amount of real data required, which is costly to collect, we leverage the power of simulation to produce large amounts of labeled data. The network is jointly trained with synthetic datasets of depth images and a limited amount of real data. We introduce a robot-assisted dressing system that combines the grasping point prediction method with a grasping and manipulation strategy that takes grasping orientation computation and robot-garment collision avoidance into account. The experimental results demonstrate that our method is capable of yielding accurate grasping point estimations. The proposed dressing system enables the Baxter robot to autonomously grasp a hospital gown hung on a rail, bring it close to the user and successfully dress the upper body.

  • Journal article
    Zambelli M, Cully A, Demiris Y, 2020,

    Multimodal representation models for prediction and control from partial information

    , Robotics and Autonomous Systems, Vol: 123, ISSN: 0921-8890

    Similar to humans, robots benefit from interacting with their environment through a number of different sensor modalities, such as vision, touch, and sound. However, learning from different sensor modalities is difficult, because the learning model must be able to handle diverse types of signals, and learn a coherent representation even when parts of the sensor inputs are missing. In this paper, a multimodal variational autoencoder is proposed to enable an iCub humanoid robot to learn representations of its sensorimotor capabilities from different sensor modalities. The proposed model is able to (1) reconstruct missing sensory modalities, (2) predict the sensorimotor state of self and the visual trajectories of other agents' actions, and (3) control the agent to imitate an observed visual trajectory. Also, the proposed multimodal variational autoencoder can capture the kinematic redundancy of the robot motion through the learned probability distribution. Training multimodal models is not trivial due to the combinatorial complexity given by the possibility of missing modalities. We propose a strategy to train multimodal models, which successfully achieves improved performance of different reconstruction models. Finally, extensive experiments have been carried out using an iCub humanoid robot, showing high performance in multiple reconstruction, prediction and imitation tasks.

  • Conference paper
    Buizza C, Fischer T, Demiris Y, 2020,

    Real-time multi-person pose tracking using data assimilation

    , IEEE Winter Conference on Applications of Computer Vision, Publisher: IEEE

    We propose a framework for the integration of data assimilation and machine learning methods in human pose estimation, with the aim of enabling any pose estimation method to be run in real-time, whilst also increasing consistency and accuracy. Data assimilation and machine learning are complementary methods: the former allows us to make use of information about the underlying dynamics of a system but lacks the flexibility of a data-based model, which we can instead obtain with the latter. Our framework presents a real-time tracking module for any single or multi-person pose estimation system. Specifically, tracking is performed by a number of Kalman filters initiated for each new person appearing in a motion sequence. This permits tracking of multiple skeletons and reduces how often computationally expensive pose estimation has to be run, enabling online pose tracking. The module tracks for N frames while the pose estimates are calculated for frame (N+1). This also results in increased consistency of person identification and reduced inaccuracies due to missing joint locations and inversion of left- and right-side joints.

  • Journal article
    Zhang F, Cully A, Demiris Y, 2019,

    Probabilistic real-time user posture tracking for personalized robot-assisted dressing

    , IEEE Transactions on Robotics, Vol: 35, Pages: 873-888, ISSN: 1552-3098

    Robotic solutions to dressing assistance have the potential to provide tremendous support for elderly and disabled people. However, unexpected user movements may lead to dressing failures or even pose a risk to the user. Tracking such user movements with vision sensors is challenging due to severe visual occlusions created by the robot and clothes. In this paper, we propose a probabilistic tracking method using Bayesian networks in latent spaces, which fuses robot end-effector positions and force information to enable cameraless and real-time estimation of the user postures during dressing. The latent spaces are created before dressing by modeling the user movements with a Gaussian process latent variable model, taking the user’s movement limitations into account. We introduce a robot-assisted dressing system that combines our tracking method with hierarchical multitask control to minimize the force between the user and the robot. The experimental results demonstrate the robustness and accuracy of our tracking method. The proposed method enables the Baxter robot to provide personalized dressing assistance in putting on a sleeveless jacket for users with (simulated) upper-body impairments.

  • Conference paper
    Chacon Quesada R, Demiris Y, 2018,

    Augmented reality control of smart wheelchair using eye-gaze–enabled selection of affordances

    , IROS 2018 Workshop on Robots for Assisted Living

    In this paper we present a novel augmented reality head-mounted display user interface for controlling a robotic wheelchair for people with limited mobility. To lower the cognitive requirements needed to control the wheelchair, we propose integration of a smart wheelchair with an eye-tracking-enabled head-mounted display. We propose a novel platform that integrates multiple user interface interaction methods for aiming at and selecting affordances derived by on-board perception capabilities such as laser-scanner readings and cameras. We demonstrate the effectiveness of the approach by evaluating our platform in two realistic scenarios: 1) Door detection, where the affordance corresponds to a Door object and the Go-Through action and 2) People detection, where the affordance corresponds to a Person and the Approach action. To the best of our knowledge, this is the first demonstration of an augmented reality head-mounted display user interface for controlling a smart wheelchair.

  • Journal article
    Cully AHR, Demiris Y, 2018,

    Quality and diversity optimization: a unifying modular framework

    , IEEE Transactions on Evolutionary Computation, Vol: 22, Pages: 245-259, ISSN: 1941-0026

    The optimization of functions to find the best solution according to one or several objectives has a central role in many engineering and research fields. Recently, a new family of optimization algorithms, named Quality-Diversity optimization, has been introduced, and contrasts with classic algorithms. Instead of searching for a single solution, Quality-Diversity algorithms search for a large collection of both diverse and high-performing solutions. The role of this collection is to cover the range of possible solution types as much as possible, and to contain the best solution for each type. The contribution of this paper is threefold. Firstly, we present a unifying framework of Quality-Diversity optimization algorithms that covers the two main algorithms of this family (Multi-dimensional Archive of Phenotypic Elites and the Novelty Search with Local Competition), and that highlights the large variety of variants that can be investigated within this family. Secondly, we propose algorithms with a new selection mechanism for Quality-Diversity algorithms that outperforms all the algorithms tested in this paper. Lastly, we present a new collection management that overcomes the erosion issues observed when using unstructured collections. These three contributions are supported by extensive experimental comparisons of Quality-Diversity algorithms on three different experimental scenarios.

  • Conference paper
    Zhang F, Cully A, Demiris Y, 2017,

    Personalized Robot-assisted Dressing using User Modeling in Latent Spaces

    , 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, ISSN: 2153-0866

    Robots have the potential to provide tremendous support to disabled and elderly people in their everyday tasks, such as dressing. Many recent studies on robotic dressing assistance view dressing as a trajectory planning problem. However, the user movements during the dressing process are rarely taken into account, which often leads to failures of the planned trajectory and may put the user at risk. The main difficulty of taking user movements into account is caused by severe occlusions created by the robot, the user, and the clothes during the dressing process, which prevent vision sensors from accurately detecting the postures of the user in real time. In this paper, we address this problem by introducing an approach that allows the robot to automatically adapt its motion according to the force applied to the robot's gripper by user movements. There are two main contributions introduced in this paper: 1) the use of a hierarchical multi-task control strategy to automatically adapt the robot motion and minimize the force applied between the user and the robot caused by user movements; 2) the online update of the dressing trajectory based on the user movement limitations modeled with the Gaussian Process Latent Variable Model in a latent space, and the density information extracted from such latent space. The combination of these two contributions leads to a personalized dressing assistance that can cope with unpredicted user movements during the dressing while constantly minimizing the force that the robot may apply on the user. The experimental results demonstrate that the proposed method allows the Baxter humanoid robot to provide personalized dressing assistance for human users with simulated upper-body impairments.

  • Conference paper
    Zambelli M, Fischer T, Petit M, Chang HJ, Cully A, Demiris Y et al., 2016,

    Towards Anchoring Self-Learned Representations to Those of Other Agents

    , Workshop on Bio-inspired Social Robot Learning in Home Scenarios, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: Institute of Electrical and Electronics Engineers (IEEE)

    In the future, robots will support humans in their everyday activities. One particular challenge that robots will face is understanding and reasoning about the actions of other agents in order to cooperate effectively with humans. We propose to tackle this using a developmental framework, where the robot incrementally acquires knowledge, and in particular 1) self-learns a mapping between motor commands and sensory consequences, 2) rapidly acquires primitives and complex actions by verbal descriptions and instructions from a human partner, 3) discovers correspondences between the robot's body and other articulated objects and agents, and 4) employs these correspondences to transfer the knowledge acquired from the robot's point of view to the viewpoint of the other agent. We show that our approach requires very little a-priori knowledge to achieve imitation learning and to find corresponding body parts of humans, and that it allows taking the perspective of another agent. This represents a step towards the emergence of a mirror-neuron-like system based on self-learned representations.

  • Journal article
    Lee K, Ognibene D, Chang H, Kim T-K, Demiris Y et al., 2015,

    STARE: Spatio-Temporal Attention Relocation for multiple structured activities detection

    , IEEE Transactions on Image Processing, Vol: 24, Pages: 5916-5927, ISSN: 1057-7149

    We present a spatio-temporal attention relocation (STARE) method, an information-theoretic approach for efficient detection of simultaneously occurring structured activities. Given multiple human activities in a scene, our method dynamically focuses on the currently most informative activity. Each activity can be detected without complete observation, as the structure of sequential actions plays an important role in making the system robust to unattended observations. For such systems, the ability to decide where and when to focus is crucial to achieving high detection performance under resource-bounded conditions. Our main contributions can be summarized as follows: 1) an information-theoretic dynamic attention relocation framework that allows the detection of multiple activities efficiently by exploiting the activity structure information and 2) a new high-resolution data set of temporally-structured concurrent activities. Our experiments on applications show that the STARE method performs efficiently while maintaining a reasonable level of accuracy.

  • Conference paper
    Kormushev P, Demiris Y, Caldwell DG, 2015,

    Kinematic-free Position Control of a 2-DOF Planar Robot Arm

  • Conference paper
    Kormushev P, Demiris Y, Caldwell DG, 2015,

    Encoderless Position Control of a Two-Link Robot Manipulator

  • Journal article
    Lee K, Su Y, Kim T-K, Demiris Y, 2013,

    A syntactic approach to robot imitation learning using probabilistic activity grammars

    , Robotics and Autonomous Systems, Vol: 61, Pages: 1323-1334, ISSN: 0921-8890

    This paper describes a syntactic approach to imitation learning that captures important task structures in the form of probabilistic activity grammars from a reasonably small number of samples under noisy conditions. We show that these learned grammars can be recursively applied to help recognize unforeseen, more complicated tasks that share underlying structures. The grammars enforce an observation to be consistent with the previously observed behaviors, which can correct unexpected, out-of-context actions due to errors of the observer and/or demonstrator. To achieve this goal, our method (1) actively searches for frequently occurring action symbols that are subsets of input samples to uncover the hierarchical structure of the demonstration, and (2) considers the uncertainties of input symbols due to imperfect low-level detectors. We evaluate the proposed method using both synthetic data and two sets of real-world humanoid robot experiments. In our Towers of Hanoi experiment, the robot learns the important constraints of the puzzle after observing demonstrators solving it. In our Dance Imitation experiment, the robot learns 3 types of dances from human demonstrations. The results suggest that under a reasonable amount of noise, our method is capable of capturing the reusable task structures and generalizing them to cope with recursions.

  • Conference paper
    Lee K, Kim TK, Demiris Y, 2012,

    Learning Action Symbols for Hierarchical Grammar Induction

    , Tsukuba, Japan, International Conference on Pattern Recognition (ICPR), Publisher: IEEE, Pages: 3778-3782

    We present an unsupervised method of learning action symbols from video data, which self-tunes the number of symbols to effectively build hierarchical activity grammars. A video stream is given as a sequence of unlabeled segments. Similar segments are incrementally grouped to form a hierarchical tree structure. The tree is cut into clusters where each cluster is used to train an action symbol. Our goal is to find a good set of clusters, i.e. symbols, where regularities are best captured in the learned representation, i.e. the induced grammar. Our method has two steps: 1) create a candidate set of symbols from initial clusters, and 2) build an activity grammar and measure model complexity and likelihood to assess the quality of the candidate set of symbols. We propose a balanced model comparison method which avoids the problem commonly found in model complexity computations where one measurement term dominates the other. Our experiments on the Towers of Hanoi and human dancing videos show that our method can discover the optimal number of action symbols effectively.

  • Conference paper
    Lee K, Kim TK, Demiris Y, 2012,

    Learning Reusable Task Components using Hierarchical Activity Grammars with Uncertainties

    , St. Paul, Minnesota, USA, Publisher: IEEE, Pages: 1994-1999

    We present a novel learning method using activity grammars capable of learning reusable task components from a reasonably small number of samples under noisy conditions. Our linguistic approach aims to extract the hierarchical structure of activities which can be recursively applied to help recognize unforeseen, more complicated tasks that share the same underlying structures. To achieve this goal, our method 1) actively searches for frequently occurring action symbols that are subsets of input samples to effectively discover the hierarchy, and 2) explicitly takes into account the uncertainty values associated with input symbols due to the noise inherent in low-level detectors. In addition to experimenting with a synthetic dataset to systematically analyze the algorithm's performance, we apply our method in a human-led imitation learning environment where a robot learns reusable components of the task from short demonstrations to correctly imitate more complicated, longer demonstrations of the same task category. The results suggest that under a reasonable amount of noise, our method is capable of capturing the reusable structures of tasks and generalizing to cope with recursions.

