Imperial College London

Professor Aldo Faisal

Faculty of Engineering, Department of Bioengineering

Professor of AI & Neuroscience

Contact

+44 (0)20 7594 6373
a.faisal
Website

Assistant

Miss Teresa Ng +44 (0)20 7594 8300

Location

4.08, Royal School of Mines, South Kensington Campus

Summary

Publications

Citation

BibTeX format

@unpublished{Festor:2022,
author = {Festor, P and Shafti, A and Harston, A and Li, M and Orlov, P and Faisal, AA},
publisher = {arXiv},
title = {MIDAS: Deep learning human action intention prediction from natural eye movement patterns},
url = {http://arxiv.org/abs/2201.09135v1},
year = {2022}
}

RIS format (EndNote, RefMan)

TY  - UNPB
AB - Eye movements have long been studied as a window into the attentional mechanisms of the human brain and made accessible as novelty style human-machine interfaces. However, not everything that we gaze upon, is something we want to interact with; this is known as the Midas Touch problem for gaze interfaces. To overcome the Midas Touch problem, present interfaces tend not to rely on natural gaze cues, but rather use dwell time or gaze gestures. Here we present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues. We run data collection experiments where 16 participants are given manipulation and inspection tasks to be performed on various objects on a table in front of them. The subjects' eye movements are recorded using wearable eye-trackers allowing the participants to freely move their head and gaze upon the scene. We use our Semantic Fovea, a convolutional neural network model to obtain the objects in the scene and their relation to gaze traces at every frame. We then evaluate the data and examine several ways to model the classification task for intention prediction. Our evaluation shows that intention prediction is not a naive result of the data, but rather relies on non-linear temporal processing of gaze cues. We model the task as a time series classification problem and design a bidirectional Long-Short-Term-Memory (LSTM) network architecture to decode intentions. Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with $91.9\%$ accuracy. Our work demonstrates the feasibility of natural gaze as a Zero-UI interface for human-machine interaction, i.e., users will only need to act naturally, and do not need to interact with the interface itself or deviate from their natural eye movement patterns.
AU - Festor,P
AU - Shafti,A
AU - Harston,A
AU - Li,M
AU - Orlov,P
AU - Faisal,AA
PB - arXiv
PY - 2022///
TI - MIDAS: Deep learning human action intention prediction from natural eye movement patterns
UR - http://arxiv.org/abs/2201.09135v1
UR - http://hdl.handle.net/10044/1/94917
ER -
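
Illustrative sketch: the abstract frames intention prediction as time-series classification of gaze cues with a bidirectional LSTM. The snippet below is a minimal, hypothetical PyTorch version of such a classifier, not the authors' released code; the feature dimension, sequence length, hidden size, and two-class output are illustrative assumptions rather than values from the paper.

import torch
import torch.nn as nn

class IntentionBiLSTM(nn.Module):
    """Bidirectional LSTM that classifies a sequence of per-frame gaze/object
    features into an intention class (e.g. manipulate vs. inspect)."""
    def __init__(self, n_features=16, hidden_size=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_features,
                            hidden_size=hidden_size,
                            batch_first=True,
                            bidirectional=True)
        # Forward and backward hidden states are concatenated, hence 2 * hidden_size.
        self.classifier = nn.Linear(2 * hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, time, n_features) sequence of gaze-cue features.
        out, _ = self.lstm(x)
        # Classify from the representation at the final time step.
        return self.classifier(out[:, -1, :])

# Usage: a batch of 8 sequences, 120 frames long, 16 features per frame.
model = IntentionBiLSTM()
logits = model(torch.randn(8, 120, 16))  # -> tensor of shape (8, 2)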