Publications

Panagakis Y, Nicolaou M, Zafeiriou S, Pantic M et al., 2015, Robust correlated and individual component analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 38, Pages: 1665-1678, ISSN: 0162-8828

Recovering correlated and individual components of two, possibly temporally misaligned, sets of data is a fundamental task in disciplines such as image, vision, and behavior computing, with application to problems such as multi-modal fusion (via correlated components), predictive analysis, and clustering (via the individual ones). Here, we study the extraction of correlated and individual components under real-world conditions, namely i) the presence of gross non-Gaussian noise and ii) temporally misaligned data. In this light, we propose a method for the Robust Correlated and Individual Component Analysis (RCICA) of two sets of data in the presence of gross, sparse errors. We furthermore extend RCICA in order to handle temporal incongruities arising in the data. To this end, two suitable optimization problems are solved. The generality of the proposed methods is demonstrated by applying them to four applications, namely i) heterogeneous face recognition, ii) multi-modal feature fusion for human behavior analysis (i.e., audio-visual prediction of interest and conflict), iii) face clustering, and iv) the temporal alignment of facial expressions. Experimental results on two synthetic and seven real-world datasets indicate the robustness and effectiveness of the proposed methods on these application domains, outperforming other state-of-the-art methods in the field.

Journal article

Antonakos E, Alabort-I-Medina J, Zafeiriou S, 2015, Active pictorial structures, CVPR 2015, Publisher: IEEE, Pages: 5435-5444, ISSN: 1063-6919

In this paper we present a novel generative deformable model motivated by Pictorial Structures (PS) and Active Appearance Models (AAMs) for object alignment in-the-wild. Inspired by the tree structure used in PS, the proposed Active Pictorial Structures (APS) model the appearance of the object using multiple graph-based pairwise normal distributions (Gaussian Markov Random Field) between the patches extracted from the regions around adjacent landmarks. We show that this formulation is more accurate than using a single multivariate distribution (Principal Component Analysis) as commonly done in the literature. APS employ a weighted inverse compositional Gauss-Newton optimization with fixed Jacobian and Hessian that achieves close to real-time performance and state-of-the-art results. Finally, APS have a spring-like graph-based deformation prior term that makes them robust to bad initializations. We present extensive experiments on the task of face alignment, showing that APS outperform current state-of-the-art methods. To the best of our knowledge, the proposed method is the first weighted inverse compositional technique that proves to be so accurate and efficient at the same time.

Conference paper

Snape P, Panagakis Y, Zafeiriou S, 2015, Automatic construction of robust spherical harmonic subspaces, CVPR 2015, Publisher: IEEE, Pages: 91-100, ISSN: 1063-6919

In this paper we propose a method to automatically recover a class specific low dimensional spherical harmonic basis from a set of in-the-wild facial images. We combine existing techniques for uncalibrated photometric stereo and low rank matrix decompositions in order to robustly recover a combined model of shape and identity. We build this basis without aid from a 3D model and show how it can be combined with recent efficient sparse facial feature localisation techniques to recover dense 3D facial shape. Unlike previous works in the area, our method is very efficient and is an order of magnitude faster to train, taking only a few minutes to build a model with over 2000 images. Furthermore, it can be used for real-time recovery of facial shape.

Conference paper

Cheng S, Marras I, Zafeiriou S, Pantic M et al., 2015, Active nonrigid ICP algorithm, 2015 11th International Conference on Automatic Face and Gesture Recognition, Publisher: IEEE

The problem of fitting a 3D facial model to a 3D mesh has received a lot of attention over the past 15-20 years. The majority of the techniques fit a general model consisting of a simple parameterisable surface or a mean 3D facial shape. The drawback of this approach is that it is rather difficult to describe the non-rigid aspect of the face using just a single facial model. One way to capture the 3D facial deformations is by means of a statistical 3D model of the face or its parts. This is particularly evident when we want to capture the deformations of the mouth region. Even though statistical models of the face are generally applied for modelling facial intensity, there are few approaches that fit a statistical model of 3D faces. In this paper, in order to capture and describe the non-rigid nature of facial surfaces, we build a part-based statistical model of the 3D facial surface and we combine it with non-rigid iterative closest point algorithms. We show that the proposed algorithm largely outperforms state-of-the-art algorithms for 3D face fitting and alignment, especially when it comes to the description of the mouth region.

Conference paper

Antonakos E, Roussos A, Zafeiriou S, 2015, A survey on mouth modeling and analysis for Sign Language recognition, FG 2015, Publisher: IEEE, Pages: 1-7

Around 70 million Deaf worldwide use Sign Languages (SLs) as their native languages. At the same time, they have limited reading/writing skills in the spoken language. This puts them at a severe disadvantage in many contexts, including education, work, usage of computers and the Internet. Automatic Sign Language Recognition (ASLR) can support the Deaf in many ways, e.g. by enabling the development of systems for Human-Computer Interaction in SL and translation between sign and spoken language. Research in ASLR usually revolves around automatic understanding of manual signs. Recently, the ASLR research community has started to appreciate the importance of non-manuals, since they are related to the lexical meaning of a sign, the syntax and the prosody. Non-manuals include body and head pose, movement of the eyebrows and the eyes, as well as blinks and squints. Arguably, the mouth is one of the most involved parts of the face in non-manuals. Mouth actions related to ASLR can be either mouthings, i.e. visual syllables with the mouth while signing, or non-verbal mouth gestures. Both are very important in ASLR. In this paper, we present the first survey on mouth non-manuals in ASLR. We start by showing why mouth motion is important in SL and the relevant techniques that exist within ASLR. Since limited research has been conducted regarding automatic analysis of mouth motion in the context of ASLR, we proceed by surveying relevant techniques from the areas of automatic mouth expression and visual speech recognition which can be applied to the task. Finally, we conclude by presenting the challenges and potentials of automatic analysis of mouth motion in the context of ASLR.

Conference paper

Zafeiriou L, Nicolaou MA, Zafeiriou S, Nikitidis S, Pantic M et al., 2015, Probabilistic slow features for behavior analysis, IEEE Transactions on Neural Networks and Learning Systems, Vol: 27, Pages: 1034-1048, ISSN: 2162-2388

A recently introduced latent feature learning technique for time-varying dynamic phenomena analysis is the so-called slow feature analysis (SFA). SFA is a deterministic component analysis technique for multidimensional sequences that, by minimizing the variance of the first-order time derivative approximation of the latent variables, finds uncorrelated projections that extract slowly varying features ordered by their temporal consistency and constancy. In this paper, we propose a number of extensions in both the deterministic and the probabilistic SFA optimization frameworks. In particular, we derive a novel deterministic SFA algorithm that is able to identify linear projections that extract the common slowest varying features of two or more sequences. In addition, we propose an expectation maximization (EM) algorithm to perform inference in a probabilistic formulation of SFA and similarly extend it in order to handle two and more time-varying data sequences. Moreover, we demonstrate that the probabilistic SFA (EM-SFA) algorithm that discovers the common slowest varying latent space of multiple sequences can be combined with dynamic time warping techniques for robust sequence time-alignment. The proposed SFA algorithms were applied for facial behavior analysis, demonstrating their usefulness and appropriateness for this task.
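
For readers unfamiliar with the deterministic SFA objective that the abstract above describes, the following is a minimal sketch of plain linear SFA in NumPy. It is not the authors' implementation and covers none of the proposed multi-sequence or probabilistic (EM) extensions; the function name `linear_sfa` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def linear_sfa(x, n_components):
    """Plain linear SFA sketch (hypothetical helper, not from the paper).

    x: array of shape (T, d), one multidimensional sequence.
    Returns (w, y) where y = (x - mean) @ w holds the slowest features,
    ordered from slowest to fastest.
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)

    # Whiten the data so that any rotation of it yields uncorrelated,
    # unit-variance projections.
    cov = xc.T @ xc / (len(xc) - 1)
    evals, evecs = np.linalg.eigh(cov)
    keep = evals > 1e-10 * evals.max()  # drop numerically null directions
    whitener = evecs[:, keep] / np.sqrt(evals[keep])
    z = xc @ whitener

    # First-order time-derivative approximation: finite differences.
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / (len(dz) - 1)

    # eigh returns eigenvalues in ascending order, so the first
    # eigenvectors minimize the second moment of the differences.
    _, dvecs = np.linalg.eigh(dcov)
    w = whitener @ dvecs[:, :n_components]
    return w, xc @ w
```

As a quick check under these assumptions, mixing a slow sinusoid with a fast one and calling `linear_sfa(mixed, 1)` should return a unit-variance feature that tracks the slow sinusoid up to sign.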

Journal article

Asthana A, Zafeiriou S, Tzimiropoulos G, Cheng S, Pantic M et al., 2015, From Pixels to Response Maps: Discriminative Image Filtering for Face Alignment in the Wild, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 37, Pages: 1312-1320, ISSN: 0162-8828

Journal article

Antonakos E, Alabort-I-Medina J, Tzimiropoulos G, Zafeiriou SP et al., 2015, Feature-based Lucas-Kanade and active appearance models, IEEE Transactions on Image Processing, Vol: 24, Pages: 2617-2632, ISSN: 1057-7149

Lucas-Kanade and active appearance models are among the most commonly used methods for image alignment and facial fitting, respectively. They both utilize nonlinear gradient descent, which is usually applied on intensity values. In this paper, we propose the employment of highly descriptive, densely sampled image features for both problems. We show that the strategy of warping the multichannel dense feature image at each iteration is more beneficial than extracting features after warping the intensity image at each iteration. Motivated by this observation, we demonstrate robust and accurate alignment and fitting performance using a variety of powerful feature descriptors. Especially with the employment of histograms of oriented gradient and scale-invariant feature transform features, our method significantly outperforms the current state-of-the-art results on in-the-wild databases.

Journal article

Liwicki S, Zafeiriou S, Pantic M, 2015, Online Kernel Slow Feature Analysis for Temporal Video Segmentation and Tracking, IEEE Transactions on Image Processing, Vol: 24, Pages: 2955-2970, ISSN: 1057-7149

Slow feature analysis (SFA) is a dimensionality reduction technique which has been linked to how visual brain cells work. In recent years, the SFA was adopted for computer vision tasks. In this paper, we propose an exact kernel SFA (KSFA) framework for positive definite and indefinite kernels in Krein space. We then formulate an online KSFA which employs a reduced set expansion. Finally, by utilizing a special kind of kernel family, we formulate exact online KSFA for which no reduced set is required. We apply the proposed system to develop a SFA-based change detection algorithm for stream data. This framework is employed for temporal video segmentation and tracking. We test our setup on synthetic and real data streams. When combined with an online learning tracking system, the proposed change detection approach improves upon tracking setups that do not utilize change detection.

Journal article

Zafeiriou S, Zhang C, Zhang Z, 2015, A survey on face detection in the wild: past, present and future, Computer Vision and Image Understanding, Vol: 138, Pages: 1-24, ISSN: 1090-235X

Face detection is one of the most studied topics in computer vision literature, not only because of the challenging nature of face as an object, but also due to the countless applications that require the application of face detection as a first step. During the past 15 years, tremendous progress has been made due to the availability of data in unconstrained capture conditions (so-called ‘in-the-wild’) through the Internet, the effort made by the community to develop publicly available benchmarks, as well as the progress in the development of robust computer vision algorithms. In this paper, we survey the recent advances in real-world face detection techniques, beginning with the seminal Viola–Jones face detector methodology. These techniques are roughly categorized into two general schemes: rigid templates, learned mainly via boosting based methods or by the application of deep neural networks, and deformable models that describe the face by its parts. Representative methods will be described in detail, along with a few additional successful methods that we briefly go through at the end. Finally, we survey the main databases used for the evaluation of face detection algorithms and recent benchmarking efforts, and discuss the future of face detection.

Journal article

Trigeorgis G, Coutinho E, Ringeval F, Marchi E, Zafeiriou S, Schuller B et al., 2015, The ICL-TUM-PASSAU approach for the MediaEval 2015 "affective impact of movies" task, ISSN: 1613-0073

In this paper we describe the Imperial College London, Technische Universität München and University of Passau (ICL+TUM+PASSAU) team approach to the MediaEval’s "Affective Impact of Movies" challenge, which consists in the automatic detection of affective (arousal and valence) and violent content in movie excerpts. In addition to the baseline features, we computed spectral and energy related acoustic features, and the probability of various objects being present in the video. Random Forests, AdaBoost and Support Vector Machines were used as classification methods. Best results show that the dataset is highly challenging for both affect and violence detection tasks, mainly because of issues in inter-rater agreement and data scarcity.

Conference paper

Coutinho E, Trigeorgis G, Zafeiriou S, Schuller B et al., 2015, Automatically estimating emotion in music with deep long-short term memory recurrent neural networks, ISSN: 1613-0073

In this paper we describe our approach for the MediaEval’s "Emotion in Music" task. Our method consists of deep Long-Short Term Memory Recurrent Neural Networks (LSTM-RNN) for dynamic Arousal and Valence regression, using acoustic and psychoacoustic features extracted from the songs that have been previously proven as effective for emotion prediction in music. Results on the challenge test demonstrate an excellent performance for Arousal estimation (r = 0.613 ± 0.278), but not for Valence (r = 0.026 ± 0.500). Issues regarding the quality of the test set annotations’ reliability and distributions are indicated as plausible justifications for these results. By using a subset of the development set that was left out for performance estimation, we could determine that the performance of our approach may be underestimated for Valence (Arousal: r = 0.596 ± 0.386; Valence: r = 0.458 ± 0.551).

Conference paper

Cheng S, Marras I, Zafeiriou S, Pantic M et al., 2015, Active Nonrigid ICP Algorithm, IEEE 11th International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, ISSN: 2326-5396

Conference paper

Bousmalis K, Zafeiriou S, Morency LP, Pantic M, Ghahramani Z et al., 2015, Variational Infinite Hidden Conditional Random Fields, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: PP, ISSN: 0162-8828

Hidden Conditional Random Fields (HCRFs) are discriminative latent variable models which have been shown to successfully learn the hidden structure of a given classification problem. An Infinite Hidden Conditional Random Field is a Hidden Conditional Random Field with a countably infinite number of hidden states, which rids us not only of the necessity to specify a priori a fixed number of hidden states available but also of the problem of overfitting. Markov chain Monte Carlo (MCMC) sampling algorithms are often employed for inference in such models. However, convergence of such algorithms is rather difficult to verify, and as the complexity of the task at hand increases the computational cost of such algorithms often becomes prohibitive. These limitations can be overcome by variational techniques. In this paper, we present a generalized framework for infinite HCRF models, and a novel variational inference approach on a model based on coupled Dirichlet Process Mixtures, the HCRF–DPM. We show that the variational HCRF–DPM is able to converge to a correct number of represented hidden states, and performs as well as the best parametric HCRFs —chosen via cross–validation— for the difficult tasks of recognizing instances of agreement, disagreement, and pain in audiovisual sequences.

Journal article

Antonakos E, Roussos A, Zafeiriou S, 2015, A Survey on Mouth Modeling and Analysis for Sign Language Recognition, IEEE 11th International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, ISSN: 2326-5396

Conference paper

Alabort-i-Medina J, Qu B, Zafeiriou S, 2015, Statistically Learned Deformable Eye Models, 13th European Conference on Computer Vision (ECCV), Publisher: Springer-Verlag Berlin, Pages: 285-295, ISSN: 0302-9743

Conference paper

Panagakis Y, Zafeiriou S, Pantic M, 2015, Audiovisual Conflict Detection in Political Debates, 13th European Conference on Computer Vision (ECCV), Publisher: Springer-Verlag Berlin, Pages: 306-314, ISSN: 0302-9743

Conference paper

Alabort-i-Medina J, Zafeiriou S, 2015, Unifying Holistic and Parts-Based Deformable Model Fitting, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 3679-3688, ISSN: 1063-6919

Conference paper

Tzimiropoulos G, Alabort-i-Medina J, Zafeiriou SP, Pantic M et al., 2014, Active Orientation Models for Face Alignment In-the-Wild, IEEE Transactions on Information Forensics and Security, Vol: 9, Pages: 2024-2034, ISSN: 1556-6013

Journal article

Zhang L, Tan T, Ross A, Zafeiriou S et al., 2014, Special issue on "Multi-biometrics and Mobile-biometrics: Recent Advances and Future Research", Image and Vision Computing, Vol: 32, Pages: 1145-1146, ISSN: 0262-8856

Journal article

Alabort-I-Medina J, Antonakos E, Booth J, Snape P, Zafeiriou SP et al., 2014, Menpo: A comprehensive platform for parametric image alignment and visual deformable models, ACM International Conference on Multimedia, Publisher: ACM, Pages: 679-682

The Menpo Project, hosted at http://www.menpo.io, is a BSD-licensed software platform providing a complete and comprehensive solution for annotating, building, fitting and evaluating deformable visual models from image data. Menpo is a powerful and flexible cross-platform framework written in Python that works on Linux, OS X and Windows. Menpo has been designed to allow for easy adaptation of Lucas-Kanade (LK) parametric image alignment techniques, and goes a step further in providing all the necessary tools for building and fitting state-of-the-art deformable models such as Active Appearance Models (AAMs), Constrained Local Models (CLMs) and regression-based methods (such as the Supervised Descent Method (SDM)). These methods are extensively used for facial point localisation although they can be applied to many other deformable objects. Menpo makes it easy to understand and evaluate these complex algorithms, providing tools for visualisation, analysis, and performance assessment. A key challenge in building deformable models is data annotation; Menpo expedites this process by providing a simple web-based annotation tool hosted at http://www.landmarker.io. The Menpo Project is thoroughly documented and provides extensive examples for all of its features. We believe the project is ideal for researchers, practitioners and students alike.

Conference paper

Antonakos E, Alabort-i-Medina J, Tzimiropoulos G, Zafeiriou S et al., 2014, HOG active appearance models, IEEE International Conference on Image Processing (ICIP) 2014, Publisher: IEEE, Pages: 224-228, ISSN: 1522-4880

We propose the combination of dense Histogram of Oriented Gradients (HOG) features with Active Appearance Models (AAMs). We employ the efficient Inverse Compositional optimization technique and show results for the task of face fitting. By taking advantage of the descriptive characteristics of HOG features, we build robust and accurate AAMs that generalize well to unseen faces with illumination, identity, pose and occlusion variations. Our experiments on challenging in-the-wild databases show that HOG AAMs significantly outperform current state-of-the-art results of discriminative methods trained on larger databases.

Conference paper

Marras I, Tzimiropoulos G, Zafeiriou S, Pantic M et al., 2014, Online learning and fusion of orientation appearance models for robust rigid object tracking, Image and Vision Computing, Vol: 32, Pages: 707-727, ISSN: 0262-8856

Journal article

Asthana A, Zafeiriou S, Cheng S, Pantic M et al., 2014, Incremental face alignment in the wild, 2014 IEEE Conference on Computer Vision and Pattern Recognition, Publisher: Institute of Electrical and Electronics Engineers, Pages: 1859-1866, ISSN: 1063-6919

The development of facial databases with an abundance of annotated facial data captured under unconstrained 'in-the-wild' conditions has made discriminative facial deformable models the de facto choice for generic facial landmark localization. Even though very good performance for facial landmark localization has been shown by many recently proposed discriminative techniques, when it comes to applications that require excellent accuracy, such as facial behaviour analysis and facial motion capture, semi-automatic person-specific or even tedious manual tracking is still the preferred choice. One way to construct a person-specific model automatically is through incremental updating of the generic model. This paper deals with the problem of updating a discriminative facial deformable model, a problem that has not been thoroughly studied in the literature. In particular, we study for the first time, to the best of our knowledge, the strategies to update a discriminative model that is trained by a cascade of regressors. We propose very efficient strategies to update the model and we show that it is possible to automatically construct robust discriminative person and imaging condition specific models 'in-the-wild' that outperform state-of-the-art generic face alignment strategies.

Conference paper

STEFANOS ZAFEIRIOU, PhD