241 results found
Nunes UM, Demiris Y, 2022, Robust Event-Based Vision Model Estimation by Dispersion Minimisation, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, Vol: 44, Pages: 9561-9573, ISSN: 0162-8828
Amadori PV, Fischer T, Wang R, et al., 2022, Predicting secondary task performance: a directly actionable metric for cognitive overload detection, IEEE Transactions on Cognitive and Developmental Systems, Vol: 14, Pages: 1474-1485, ISSN: 2379-8920
In this paper, we address cognitive overload detection from unobtrusive physiological signals for users in dual-tasking scenarios. Anticipating cognitive overload is a pivotal challenge in interactive cognitive systems and could lead to safer shared-control between users and assistance systems. Our framework builds on the assumption that decision mistakes on the cognitive secondary task of dual-tasking users correspond to cognitive overload events, wherein the cognitive resources required to perform the task exceed the ones available to the users. We propose DecNet, an end-to-end sequence-to-sequence deep learning model that infers in real-time the likelihood of user mistakes on the secondary task, i.e., the practical impact of cognitive overload, from eye-gaze and head-pose data. We train and test DecNet on a dataset collected in a simulated driving setup from a cohort of 20 users on two dual-tasking decision-making scenarios, with either visual or auditory decision stimuli. DecNet anticipates cognitive overload events in both scenarios and can perform in time-constrained scenarios, anticipating cognitive overload events up to 2s before they occur. We show that DecNet’s performance gap between audio and visual scenarios is consistent with user perceived difficulty. This suggests that single modality stimulation induces higher cognitive load on users, hindering their decision-making abilities.
Zhang X, Angeloudis P, Demiris Y, 2022, ST CrossingPose: a spatial-temporal graph convolutional network for skeleton-based pedestrian crossing intention prediction, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 20773-20782, ISSN: 1524-9050
Pedestrian crossing intention prediction is crucial for the safety of pedestrians in the context of both autonomous and conventional vehicles and has attracted widespread interest recently. Various methods have been proposed to perform pedestrian crossing intention prediction, among which the skeleton-based methods have been very popular in recent years. However, most existing studies utilize manually designed features to handle skeleton data, limiting the performance of these methods. To solve this issue, we propose to predict pedestrian crossing intention based on spatial-temporal graph convolutional networks using skeleton data (ST CrossingPose). The proposed method can learn both spatial and temporal patterns from skeleton data, thus having a good feature representation ability. Extensive experiments on a public dataset demonstrate that the proposed method achieves very competitive performance in predicting crossing intention while maintaining a fast inference speed. We also analyze the effect of several factors, e.g., size of pedestrians, time to event, and occlusion, on the proposed method.
To facilitate the monitoring and management of modern transportation systems, monocular visual traffic surveillance systems have been widely adopted for speed measurement, accident detection, and accident prediction. Thanks to the recent innovations in computer vision and deep learning research, the performance of visual traffic surveillance systems has been significantly improved. However, despite this success, there is a lack of survey papers that systematically review these new methods. Therefore, we conduct a systematic review of relevant studies to fill this gap and provide guidance to future studies. This paper is structured along the visual information processing pipeline that includes object detection, object tracking, and camera calibration. Moreover, we also include important applications of visual traffic surveillance systems, such as speed measurement, behavior learning, accident detection and prediction. Finally, future research directions of visual traffic surveillance systems are outlined.
Chacon Quesada R, Demiris Y, 2022, Holo-SpoK: Affordance-aware augmented reality control of legged manipulators, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 856-862
Although there is extensive research regarding legged manipulators, comparatively little focuses on their User Interfaces (UIs). Towards extending the state-of-art in this domain, in this work, we integrate a Boston Dynamics(BD) Spot® with a light-weight 7 DoF Kinova® robot arm and a Robotiq® 2F-85 gripper into a legged manipulator. Furthermore, we jointly control the robotic platform using an affordance-aware Augmented Reality (AR) Head-Mounted Display (HMD) UI developed for the Microsoft HoloLens 2. We named the combined platform Holo-SpoK. Moreover, we explain how this manipulator colocalises with the HoloLens 2 for its control through AR. In addition, we present the details of our algorithms for autonomously detecting grasp-ability affordances and for the refinement of the positions obtainedvia vision-based colocalisation. We validate the suitability of our proposed methods with multiple navigation and manipulation experiments. To the best of our knowledge, this is the first demonstration of an AR HMD UI for controlling legged manipulators.
Zolotas M, Demiris Y, 2022, Disentangled sequence clustering for human intention inference, IEEE/RSJ International Conference on Intelligent Robots and Systems, Publisher: IEEE, ISSN: 2153-0866
Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of “intent” conditioned on the robot’s perceived state. However, these approaches typically assumetask-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latentrepresentations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction datasetcollected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.
Al-Hindawi A, Vizcaychipi M, Demiris Y, 2022, Faster, Better Blink Detection through Curriculum Learning by Augmentation
Blinking is a useful biological signal that can gate gaze regression models to avoid the use of incorrect data in downstream tasks. Existing datasets are imbalanced both in frequency of class but also in intra-class difficulty which we demonstrate is a barrier for curriculum learning. We thus propose a novel curriculum augmentation scheme that aims to address frequency and difficulty imbalances implicitly which are are terming Curriculum Learning by Augmentation (CLbA). Using Curriculum Learning by Augmentation (CLbA), we achieve a state-of-The-Art performance of mean Average Precision (mAP) 0.971 using ResNet-18 up from the previous state-of-The-Art of mean Average Precision (mAP) of 0.757 using DenseNet-121 whilst outcompeting Curriculum Learning by Bootstrapping (CLbB) by a significant margin with improved calibration. This new training scheme thus allows the use of smaller and more performant Convolutional Neural Network (CNN) backbones fulfilling Nyquist criteria to achieve a sampling frequency of 102.3Hz. This paves the way for inference of blinking in real-Time applications.
Bin Razali MH, Demiris Y, 2022, Using a single input to forecast human action keystates in everyday pick and place actions, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 3488-3492
We define action keystates as the start or end of an actionthat contains information such as the human pose and time.Existing methods that forecast the human pose use recurrentnetworks that input and output a sequence of poses. In this pa-per, we present a method tailored for everyday pick and placeactions where the object of interest is known. In contrast toexisting methods, ours uses an input from a single timestep todirectly forecast (i) the key pose the instant the pick or placeaction is performed and (ii) the time it takes to get to the pre-dicted key pose. Experimental results show that our methodoutperforms the state-of-the-art for key pose forecasting andis comparable for time forecasting while running at least anorder of magnitude faster. Further ablative studies reveal thesignificance of the object of interest in enabling the total num-ber of parameters across all existing methods to be reduced byat least 90% without any degradation in performance.
Zhang F, Demiris Y, 2022, Learning garment manipulation policies toward robot-assisted dressing., Science Robotics, Vol: 7, Pages: eabm6010-eabm6010, ISSN: 2470-9476
Assistive robots have the potential to support people with disabilities in a variety of activities of daily living, such as dressing. People who have completely lost their upper limb movement functionality may benefit from robot-assisted dressing, which involves complex deformable garment manipulation. Here, we report a dressing pipeline intended for these people and experimentally validate it on a medical training manikin. The pipeline is composed of the robot grasping a hospital gown hung on a rail, fully unfolding the gown, navigating around a bed, and lifting up the user's arms in sequence to finally dress the user. To automate this pipeline, we address two fundamental challenges: first, learning manipulation policies to bring the garment from an uncertain state into a configuration that facilitates robust dressing; second, transferring the deformable object manipulation policies learned in simulation to real world to leverage cost-effective data generation. We tackle the first challenge by proposing an active pre-grasp manipulation approach that learns to isolate the garment grasping area before grasping. The approach combines prehensile and nonprehensile actions and thus alleviates grasping-only behavioral uncertainties. For the second challenge, we bridge the sim-to-real gap of deformable object policy transfer by approximating the simulator to real-world garment physics. A contrastive neural network is introduced to compare pairs of real and simulated garment observations, measure their physical similarity, and account for simulator parameters inaccuracies. The proposed method enables a dual-arm robot to put back-opening hospital gowns onto a medical manikin with a success rate of more than 90%.
Taniguchi T, Nagai T, Shimoda S, et al., 2022, Special issue on symbol emergence in robotics and cognitive systems (II), ADVANCED ROBOTICS, Vol: 36, Pages: 217-218, ISSN: 0169-1864
Kaptein F, Kiefer B, Cully A, et al., 2022, A cloud-based robot system for long-term interaction: principles, implementation, lessons learned, ACM Transactions on Human-Robot Interaction, Vol: 11, ISSN: 2573-9522
Making the transition to long-term interaction with social-robot systems has been identified as one of the main challenges in human-robot interaction. This article identifies four design principles to address this challenge and applies them in a real-world implementation: cloud-based robot control, a modular design, one common knowledge base for all applications, and hybrid artificial intelligence for decision making and reasoning. The control architecture for this robot includes a common Knowledge-base (ontologies), Data-base, “Hybrid Artificial Brain” (dialogue manager, action selection and explainable AI), Activities Centre (Timeline, Quiz, Break and Sort, Memory, Tip of the Day, ), Embodied Conversational Agent (ECA, i.e., robot and avatar), and Dashboards (for authoring and monitoring the interaction). Further, the ECA is integrated with an expandable set of (mobile) health applications. The resulting system is a Personal Assistant for a healthy Lifestyle (PAL), which supports diabetic children with self-management and educates them on health-related issues (48 children, aged 6–14, recruited via hospitals in the Netherlands and in Italy). It is capable of autonomous interaction “in the wild” for prolonged periods of time without the need for a “Wizard-of-Oz” (up until 6 months online). PAL is an exemplary system that provides personalised, stable and diverse, long-term human-robot interaction.
Jang Y, Demiris Y, 2022, Message passing framework for vision prediction stability in human robot interaction, IEEE International Conference on Robotics and Automation 2022, Publisher: IEEE, ISSN: 2152-4092
In Human Robot Interaction (HRI) scenarios, robot systems would benefit from an understanding of the user's state, actions and their effects on the environments to enable better interactions. While there are specialised vision algorithms for different perceptual channels, such as objects, scenes, human pose, and human actions, it is worth considering how their interaction can help improve each other's output. In computer vision, individual prediction modules for these perceptual channels frequently produce noisy outputs due to the limited datasets used for training and the compartmentalisation of the perceptual channels, often resulting in noisy or unstable prediction outcomes. To stabilise vision prediction results in HRI, this paper presents a novel message passing framework that uses the memory of individual modules to correct each other's outputs. The proposed framework is designed utilising common-sense rules of physics (such as the law of gravity) to reduce noise while introducing a pipeline that helps to effectively improve the output of each other's modules. The proposed framework aims to analyse primitive human activities such as grasping an object in a video captured from the perspective of a robot. Experimental results show that the proposed framework significantly reduces the output noise of individual modules compared to the case of running independently. This pipeline can be used to measure human reactions when interacting with a robot in various HRI scenarios.
Bin Razali MH, Demiris Y, 2022, Using eye-gaze to forecast human pose in everyday pick and place actions, IEEE International Conference on Robotics and Automation
Collaborative robots that operate alongside hu-mans require the ability to understand their intent and forecasttheir pose. Among the various indicators of intent, the eyegaze is particularly important as it signals action towards thegazed object. By observing a person’s gaze, one can effectivelypredict the object of interest and subsequently, forecast theperson’s pose. We leverage this and present a method thatforecasts the human pose using gaze information for everydaypick and place actions in a home environment. Our method firstattends to fixations to locate the coordinates of the object ofinterest before inputting said coordinates to a pose forecastingnetwork. Experiments on the MoGaze dataset show that ourgaze network lowers the errors of existing pose forecastingmethods and that incorporating prior in the form of textualinstructions further lowers the errors by a significant amount.Furthermore, the use of eye gaze now allows a simple multilayerperceptron network to directly forecast the keypose.
Quesada RC, Demiris Y, 2022, Proactive robot assistance: affordance-aware augmented reality user interfaces, IEEE Robotics and Automation magazine, Vol: 29, ISSN: 1070-9932
Assistive robots have the potential to increase the autonomy and quality of life of people with disabilities  . Their applications include rehabilitation robots, smart wheelchairs, companion robots, mobile manipulators, and educational robots  . However, designing an intuitive user interface (UI) for the control of assistive robots remains a challenge, as most UIs leverage traditional control interfaces, such as joysticks and keyboards, which might be challenging and even impossible for some users. Augmented reality (AR) UIs introduce more natural interactions between people and assistive robots, potentially reaching a more diverse user base.
Girbes-Juan V, Schettino V, Gracia L, et al., 2022, Combining haptics and inertial motion capture to enhance remote control of a dual-arm robot, JOURNAL ON MULTIMODAL USER INTERFACES, Vol: 16, Pages: 219-238, ISSN: 1783-7677
Taniguchi T, Nagai T, Shimoda S, et al., 2022, Special issue on Symbol Emergence in Robotics and Cognitive Systems (I) PREFACE, ADVANCED ROBOTICS, Vol: 36, Pages: 1-2, ISSN: 0169-1864
Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2022, WHAT IS THE PATIENT LOOKING AT? ROBUST GAZE-SCENE INTERSECTION UNDER FREE-VIEWING CONDITIONS, Pages: 2430-2434, ISSN: 1520-6149
Locating the user's gaze in the scene, also known as Point of Regard (PoR) estimation, following gaze regression is important for many downstream tasks. Current techniques either require the user to wear and calibrate instruments, require significant pre-processing of the scene information, or place restrictions on user's head movements. We propose a geometrically inspired algorithm that, despite its simplicity, provides high accuracy and O(J) performance under a variety of challenging situations including sparse depth maps, high noise, and high dynamic parallax between the user and the scene camera. We demonstrate the utility of the proposed algorithm in regressing the PoR from scenes captured in the Intensive Care Unit (ICU) at Chelsea & Westminster Hospital NHS Foundation Trust a
Razali H, Demiris Y, 2021, Multitask variational autoencoding of human-to-human object handover, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 7315-7320, ISSN: 2153-0858
Assistive robots that operate alongside humans require the ability to understand and replicate human behaviours during a handover. A handover is defined as a joint action between two participants in which a giver hands an object over to the receiver. In this paper, we present a method for learning human-to-human handovers observed from motion capture data. Given the giver and receiver pose from a single timestep, and the object label in the form of a word embedding, our Multitask Variational Autoencoder jointly forecasts their pose as well as the orientation of the object held by the giver at handover. Our method is in large contrast to existing works for human pose forecasting that employ deep autoregressive models requiring a sequence of inputs. Furthermore, our method is novel in that it learns both the human pose and object orientation in a joint manner. Experimental results on the publicly available Handover Orientation and Motion Capture Dataset show that our proposed method outperforms the autoregressive baselines for handover pose forecasting by approximately 20% while being on-par for object orientation prediction with a runtime that is 5x faster. a
Nunes UM, Demiris Y, 2021, Live demonstration: incremental motion estimation for event-based cameras by dispersion minimisation, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE COMPUTER SOC, Pages: 1322-1323, ISSN: 2160-7508
Live demonstration setup. (Left) The setup consists of a DAVIS346B event camera connected to a standard consumer laptop and undergoes some motion. (Right) The motion estimates are plotted in red and, for rotation-like motions, the angular velocities provided by the camera IMU are also plotted in blue. This plot exemplifies an event camera undergoing large rotational motions (up to ~ 1000 deg/s) around the (a) x-axis, (b) y-axis and (c) z-axis. Overall, the incremental motion estimation method follows the IMU measurements. Optionally, the resultant global optical flow can also be shown, as well as the corresponding generated events by accumulating them onto the image plane (bottom left corner).
Chacon-Quesada R, Demiris Y, 2021, Augmented reality eser interfaces for heterogeneous multirobot control, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Publisher: IEEE, Pages: 11439-11444, ISSN: 2153-0858
Recent advances in the design of head-mounted augmented reality (AR) interfaces for assistive human-robot interaction (HRI) have allowed untrained users to rapidly and fluently control single-robot platforms. In this paper, we investigate how such interfaces transfer onto multirobot architectures, as several assistive robotics applications need to be distributed among robots that are different both physically and in terms of software. As part of this investigation, we introduce a novel head-mounted AR interface for heterogeneous multirobot control. This interface generates and displays dynamic joint-affordance signifiers, i.e. signifiers that combine and show multiple actions from different robots that can be applied simultaneously to an object. We present a user study with 15 participants analysing the effects of our approach on their perceived fluency. Participants were given the task of filling-out a cup with water making use of a multirobot platform. Our results show a clear improvement in standard HRI fluency metrics when users applied dynamic joint-affordance signifiers, as opposed to a sequence of independent actions.
Amadori P, Fischer T, Demiris Y, 2021, HammerDrive: A task-aware driving visual attention model, IEEE Transactions on Intelligent Transportation Systems, Vol: 23, Pages: 5573-5585, ISSN: 1524-9050
We introduce HammerDrive, a novel architecture for task-aware visual attention prediction in driving. The proposed architecture is learnable from data and can reliably infer the current focus of attention of the driver in real-time, while only requiring limited and easy-to-access telemetry data from the vehicle. We build the proposed architecture on two core concepts: 1) driving can be modeled as a collection of sub-tasks (maneuvers), and 2) each sub-task affects the way a driver allocates visual attention resources, i.e., their eye gaze fixation. HammerDrive comprises two networks: a hierarchical monitoring network of forward-inverse model pairs for sub-task recognition and an ensemble network of task-dependent convolutional neural network modules for visual attention modeling. We assess the ability of HammerDrive to infer driver visual attention on data we collected from 20 experienced drivers in a virtual reality-based driving simulator experiment. We evaluate the accuracy of our monitoring network for sub-task recognition and show that it is an effective and light-weight network for reliable real-time tracking of driving maneuvers with above 90% accuracy. Our results show that HammerDrive outperforms a comparable state-of-the-art deep learning model for visual attention prediction on numerous metrics with ~13% improvement for both Kullback-Leibler divergence and similarity, and demonstrate that task-awareness is beneficial for driver visual attention prediction.
Zolotas M, Demiris Y, 2021, Disentangled sequence clustering for human intention inference, Publisher: arXiv
Equipping robots with the ability to infer human intent is a vitalprecondition for effective collaboration. Most computational approaches towardsthis objective employ probabilistic reasoning to recover a distribution of"intent" conditioned on the robot's perceived sensory state. However, theseapproaches typically assume task-specific notions of human intent (e.g.labelled goals) are known a priori. To overcome this constraint, we propose theDisentangled Sequence Clustering Variational Autoencoder (DiSCVAE), aclustering framework that can be used to learn such a distribution of intent inan unsupervised manner. The DiSCVAE leverages recent advances in unsupervisedlearning to derive a disentangled latent representation of sequential data,separating time-varying local features from time-invariant global aspects.Though unlike previous frameworks for disentanglement, the proposed variantalso infers a discrete variable to form a latent mixture model and enableclustering of global sequence concepts, e.g. intentions from observed humanbehaviour. To evaluate the DiSCVAE, we first validate its capacity to discoverclasses from unlabelled sequences using video datasets of bouncing digits and2D animations. We then report results from a real-world human-robot interactionexperiment conducted on a robotic wheelchair. Our findings glean insights intohow the inferred discrete variable coincides with human intent and thus servesto improve assistance in collaborative settings, such as shared control.
Behrens JK, Nazarczuk M, Stepanova K, et al., 2021, Embodied Reasoning for Discovering Object Properties via Manipulation, IEEE International Conference on Robotics and Automation (ICRA), Publisher: IEEE, Pages: 10139-10145, ISSN: 1050-4729
Al-Hindawi A, Vizcaychipi MP, Demiris Y, 2021, Continuous Non-Invasive Eye Tracking In Intensive Care, 43rd Annual International Conference of the IEEE-Engineering-in-Medicine-and-Biology-Society (IEEE EMBC), Publisher: IEEE, Pages: 1869-1873, ISSN: 1557-170X
Candela E, Feng Y, Mead D, et al., 2021, Fast Collision Prediction for Autonomous Vehicles using a Stochastic Dynamics Model, IEEE Intelligent Transportation Systems Conference (ITSC), Publisher: IEEE, Pages: 211-216, ISSN: 2153-0009
Girbes-Juan V, Schettino V, Demiris Y, et al., 2021, Haptic and Visual Feedback Assistance for Dual-Arm Robot Teleoperation in Surface Conditioning Tasks, IEEE TRANSACTIONS ON HAPTICS, Vol: 14, Pages: 44-56, ISSN: 1939-1412
Buizza C, Demiris Y, 2021, Rotational Adjoint Methods for Learning-Free 3D Human Pose Estimation from IMU Data, 25th International Conference on Pattern Recognition (ICPR), Publisher: IEEE COMPUTER SOC, Pages: 7868-7875, ISSN: 1051-4651
Amadori PV, Fischer T, Wang R, et al., 2020, Decision anticipation for driving assistance systems, 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), Publisher: IEEE, Pages: 1-7
Anticipating the correctness of imminent driver decisions is a crucial challenge in advanced driving assistance systems and has the potential to lead to more reliable and safer human-robot interactions. In this paper, we address the task of decision correctness prediction in a driver-in-the-loop simulated environment using unobtrusive physiological signals, namely, eye gaze and head pose. We introduce a sequence-to-sequence based deep learning model to infer the driver's likelihood of making correct/wrong decisions based on the corresponding cognitive state. We provide extensive experimental studies over multiple baseline classification models on an eye gaze pattern and head pose dataset collected from simulated driving. Our results show strong correlates between the physiological data and decision correctness, and that the proposed sequential model reliably predicts decision correctness from the driver with 80% precision and 72% recall. We also demonstrate that our sequential model performs well in scenarios where early anticipation of correctness is critical, with accurate predictions up to two seconds before a decision is performed.
Fischer T, Demiris Y, 2020, Computational modelling of embodied visual perspective-taking, IEEE Transactions on Cognitive and Developmental Systems, Vol: 12, Pages: 723-732, ISSN: 2379-8920
Humans are inherently social beings that benefit from their perceptional capability to embody another point of view, typically referred to as perspective-taking. Perspective-taking is an essential feature in our daily interactions and is pivotal for human development. However, much remains unknown about the precise mechanisms that underlie perspective-taking. Here we show that formalizing perspective-taking in a computational model can detail the embodied mechanisms employed by humans in perspective-taking. The model's main building block is a set of action primitives that are passed through a forward model. The model employs a process that selects a subset of action primitives to be passed through the forward model to reduce the response time. The model demonstrates results that mimic those captured by human data, including (i) response times differences caused by the angular disparity between the perspective-taker and the other agent, (ii) the impact of task-irrelevant body posture variations in perspective-taking, and (iii) differences in the perspective-taking strategy between individuals. Our results provide support for the hypothesis that perspective-taking is a mental simulation of the physical movements that are required to match another person's visual viewpoint. Furthermore, the model provides several testable predictions, including the prediction that forced early responses lead to an egocentric bias and that a selection process introduces dependencies between two consecutive trials. Our results indicate potential links between perspective-taking and other essential perceptional and cognitive mechanisms, such as active vision and autobiographical memories.
Goncalves Nunes UM, Demiris Y, 2020, Entropy minimisation framework for event-based vision model estimation, 16th European Conference on Computer Vision 2020, Publisher: Springer, Pages: 161-176
We propose a novel Entropy Minimisation (EMin) frame-work for event-based vision model estimation. The framework extendsprevious event-based motion compensation algorithms to handle modelswhose outputs have arbitrary dimensions. The main motivation comesfrom estimating motion from events directly in 3D space (e.g.eventsaugmented with depth), without projecting them onto an image plane.This is achieved by modelling the event alignment according to candidateparameters and minimising the resultant dispersion. We provide a familyof suitable entropy loss functions and an efficient approximation whosecomplexity is only linear with the number of events (e.g.the complexitydoes not depend on the number of image pixels). The framework is eval-uated on several motion estimation problems, including optical flow androtational motion. As proof of concept, we also test our framework on6-DOF estimation by performing the optimisation directly in 3D space.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.