Imperial College London

STEFANOS ZAFEIRIOU, PhD

Faculty of Engineering, Department of Computing

Professor in Machine Learning & Computer Vision
 
 
 
Contact

+44 (0)20 7594 8461 | s.zafeiriou | Website | CV

 
 
Location

375 Huxley Building, South Kensington Campus


Publications


218 results found

Booth JA, Roussos A, Ponniah A, Dunaway D, Zafeiriou S et al., 2018, Large scale 3D morphable models, International Journal of Computer Vision, Vol: 126, Pages: 233-254, ISSN: 1573-1405

We present the Large Scale Facial Model (LSFM), a 3D Morphable Model (3DMM) automatically constructed from 9663 distinct facial identities. To the best of our knowledge, LSFM is the largest-scale Morphable Model ever constructed, containing statistical information from a huge variety of the human population. To build such a large model we introduce a novel fully automated and robust Morphable Model construction pipeline, informed by an evaluation of state-of-the-art dense correspondence techniques. The dataset that LSFM is trained on includes rich demographic information about each subject, allowing for the construction of not only a global 3DMM model but also models tailored for specific age, gender or ethnicity groups. We utilize the proposed model to perform age classification from 3D shape alone and to reconstruct noisy out-of-sample data in the low-dimensional model space. Furthermore, we perform a systematic analysis of the constructed 3DMM models that showcases their quality and descriptive power. The presented extensive qualitative and quantitative evaluations reveal that the proposed 3DMM achieves state-of-the-art results, outperforming existing models by a large margin. Finally, for the benefit of the research community, we make publicly available the source code of the proposed automatic 3DMM construction pipeline, as well as the constructed global 3DMM and a variety of bespoke models tailored by age, gender and ethnicity.
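
For readers unfamiliar with 3D Morphable Models, the underlying generative model is a linear statistical model of shape; as a rough sketch in standard notation (not taken from the paper), a new facial shape instance is synthesised as

\[
\mathbf{s} \;\approx\; \bar{\mathbf{s}} + \mathbf{U}\mathbf{c} \;=\; \bar{\mathbf{s}} + \sum_{i=1}^{k} c_i \mathbf{u}_i,
\]

where \(\bar{\mathbf{s}}\) is the mean shape, the columns \(\mathbf{u}_i\) of \(\mathbf{U}\) are the principal components estimated from the densely registered training scans, and \(\mathbf{c}\) holds the per-instance coefficients; the pipeline described above estimates \(\bar{\mathbf{s}}\) and \(\mathbf{U}\), globally and per demographic group, from the 9663 registered scans.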

Journal article

Emersic Z, Stepec D, Struc V, Peer P, George A, Ahmad A, Omar E, Boult TE, Safdari R, Zhou Y, Zafeiriou S, Yaman D, Eyiokur FI, Ekenel HK et al., 2018, The Unconstrained Ear Recognition Challenge, IEEE International Joint Conference on Biometrics (IJCB), Publisher: IEEE, Pages: 715-724

Conference paper

Hovhannisyan V, Panagakis Y, Zafeiriou S, Parpas P et al., 2018, Multilevel approximate robust principal component analysis, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 536-544, ISSN: 2473-9936

Robust principal component analysis (RPCA) is currently the method of choice for recovering a low-rank matrix from sparse corruptions of unknown value and support, by decomposing the observation matrix into low-rank and sparse matrices. RPCA has many applications, including background subtraction, learning of robust subspaces from visual data, etc. Nevertheless, the SVD required in each iteration of the optimisation methods makes RPCA challenging to apply when the data are large. In this paper, we propose the first, to the best of our knowledge, multilevel approach for solving convex and non-convex RPCA models. The basic idea is to construct lower-dimensional models and perform SVD on them instead of the original high-dimensional problem. We show that the proposed approach gives a good approximate solution to the original problem for both convex and non-convex formulations, while being many times faster than the original RPCA methods on several real-world datasets.
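
As context, and as a standard formulation rather than an excerpt from the paper, the convex RPCA model whose per-iteration SVD cost motivates the multilevel approach is the Principal Component Pursuit program

\[
\min_{\mathbf{L},\,\mathbf{S}} \;\|\mathbf{L}\|_{*} + \lambda \|\mathbf{S}\|_{1} \quad \text{subject to} \quad \mathbf{X} = \mathbf{L} + \mathbf{S},
\]

where the nuclear norm \(\|\cdot\|_{*}\) promotes a low-rank \(\mathbf{L}\) and the \(\ell_1\) norm promotes a sparse \(\mathbf{S}\); each proximal step on the nuclear norm requires an SVD of a matrix the size of the observation \(\mathbf{X}\), which is the cost the lower-dimensional models are designed to reduce.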

Conference paper

Zafeiriou S, Chrysos GG, Roussos A, Ververas E, Deng J, Trigeorgis G et al., 2018, The 3D Menpo facial landmark tracking challenge, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 2503-2511, ISSN: 2473-9936

Recently, deformable face alignment has become synonymous with the task of locating a set of 2D sparse landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state-of-the-art in the task of face alignment. DCNNs exploit the large amount of high-quality annotations that have emerged over the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident in the facial boundary). That is, the annotations neither provide an estimate of the depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured "in-the-wild" and (b) to train and evaluate new methods for 3D facial landmark tracking. Finally, we report the results of the first challenge in 3D face tracking "in-the-wild".

Conference paper

Schuller BW, Steidl S, Batliner A, Marschik PB, Baumeister H, Dong F, Hantke S, Pokorny FB, Rathner EM, Bartl-Pokorny KD, Einspieler C, Zhang D, Baird A, Amiriparian S, Qian K, Ren Z, Schmitt M, Tzirakis P, Zafeiriou S et al., 2018, The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats, Pages: 122-126, ISSN: 2308-457X

© 2018 International Speech Communication Association. All rights reserved. The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Atypical Affect Sub-Challenge, four basic emotions annotated in the speech of handicapped subjects have to be classified; in the Self-Assessed Affect Sub-Challenge, valence scores given by the speakers themselves are used for a three-class classification problem; in the Crying Sub-Challenge, three types of infant vocalisations have to be told apart; and in the Heart Beats Sub-Challenge, three different types of heart beats have to be determined. We describe the Sub-Challenges, their conditions, and baseline feature extraction and classifiers, which include data-learnt (supervised) feature representations by end-to-end learning, the 'usual' ComParE and BoAW features, and deep unsupervised representation learning using the AUDEEP toolkit for the first time in the challenge series.

Conference paper

Zafeiriou S, Kotsia I, Pantic M, 2018, Unconstrained face recognition, Computer Vision: Concepts, Methodologies, Tools, and Applications, Pages: 1640-1661, ISBN: 9781522552048

The human face is the most well-researched object in computer vision, mainly because (1) it is a highly deformable object whose appearance changes dramatically under different poses, expressions, illuminations, etc., (2) the applications of face recognition are numerous and span several fields, (3) it is widely known that humans possess the ability to perform facial analysis, especially identity recognition, extremely efficiently and accurately. Although a lot of research has been conducted in the past years, the problem of face recognition using images captured in uncontrolled environments, including several illumination and/or pose variations, still remains open. This is also attributed to the existence of outliers (such as partial occlusion, cosmetics, eyeglasses, etc.) or changes due to age. In this chapter, the authors provide an overview of the existing fully automatic face recognition technologies for uncontrolled scenarios. They present the existing databases, summarize the challenges that arise in such scenarios, and conclude by presenting the opportunities that exist in the field.

Book chapter

Xue N, Deng J, Panagakis Y, Zafeiriou S et al., 2018, Informed Non-Convex Robust Principal Component Analysis with Features, 32nd AAAI Conference on Artificial Intelligence / 30th Innovative Applications of Artificial Intelligence Conference / 8th AAAI Symposium on Educational Advances in Artificial Intelligence, Publisher: Association for the Advancement of Artificial Intelligence, Pages: 4343-4349, ISSN: 2159-5399

Conference paper

Bahri M, Panagakis Y, Zafeiriou S, 2017, Robust Kronecker-decomposable component analysis for low-rank modeling, IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 3372-3381, ISSN: 1550-5499

Dictionary learning and component analysis are part of one of the most well-studied and active research fields, at the intersection of signal and image processing, computer vision, and statistical machine learning. In dictionary learning, the current methods of choice are arguably K-SVD and its variants, which learn a dictionary (i.e., a decomposition) for sparse coding via Singular Value Decomposition. In robust component analysis, leading methods derive from Principal Component Pursuit (PCP), which recovers a low-rank matrix from sparse corruptions of unknown magnitude and support. However, K-SVD is sensitive to the presence of noise and outliers in the training set. Additionally, PCP does not provide a dictionary that respects the structure of the data (e.g., images), and requires expensive SVD computations when solved by convex relaxation. In this paper, we introduce a new robust decomposition of images by combining ideas from sparse dictionary learning and PCP. We propose a novel Kronecker-decomposable component analysis which is robust to gross corruption, can be used for low-rank modeling, and leverages separability to solve significantly smaller problems. We design an efficient learning algorithm by drawing links with a restricted form of tensor factorization. The effectiveness of the proposed approach is demonstrated on real-world applications, namely background subtraction and image denoising, by performing a thorough comparison with the current state of the art.
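
As a rough illustration of the separability idea (our notation, not necessarily the paper's exact model), a Kronecker-structured decomposition factors each observed image so that only two small dictionaries act on its rows and columns:

\[
\mathbf{X}_i \;\approx\; \mathbf{A}\,\mathbf{R}_i\,\mathbf{B}^{\top} + \mathbf{E}_i,
\qquad \text{equivalently} \qquad
\mathrm{vec}(\mathbf{X}_i) \;\approx\; (\mathbf{B}\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{R}_i) + \mathrm{vec}(\mathbf{E}_i),
\]

where \(\mathbf{R}_i\) is a low-rank core, \(\mathbf{E}_i\) collects sparse gross corruptions, and the factors \(\mathbf{A}\), \(\mathbf{B}\) are far smaller than a dictionary over vectorised images, which is what makes the resulting subproblems cheap to solve.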

Conference paper

Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, Zafeiriou S et al., 2017, End-to-end multimodal emotion recognition using deep neural networks, IEEE Journal of Selected Topics in Signal Processing, Vol: 11, Pages: 1301-1309, ISSN: 1932-4553

Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this purpose, we utilize a convolutional neural network (CNN) to extract features from the speech, while for the visual modality a deep residual network of 50 layers is used. In addition to the importance of feature extraction, a machine learning algorithm needs also to be insensitive to outliers while being able to model the context. To tackle this problem, long short-term memory networks are utilized. The system is then trained in an end-to-end fashion where, by also taking advantage of the correlations between the streams, we manage to significantly outperform, in terms of the concordance correlation coefficient, traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.
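
The evaluation metric mentioned above, the concordance correlation coefficient (CCC), has a standard definition that is easy to compute; the following short Python sketch is independent of the authors' code:

import numpy as np

def concordance_cc(pred, gold):
    """Concordance correlation coefficient between two 1-D sequences."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    mean_p, mean_g = pred.mean(), gold.mean()
    var_p, var_g = pred.var(), gold.var()          # population variances
    cov = np.mean((pred - mean_p) * (gold - mean_g))
    # CCC = 2*cov / (var_pred + var_gold + (mean_pred - mean_gold)^2)
    return 2.0 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

# Example: perfect agreement gives 1.0; biased or anti-correlated predictions give less.
print(concordance_cc([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # 1.0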

Journal article

Sagonas C, Panagakis Y, Leidinger A, Zafeiriou S et al., 2017, Robust joint and individual variance explained, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 5739-5748

Discovering the common (joint) and individual subspaces is crucial for the analysis of multiple data sets, including multi-view and multi-modal data. Several statistical machine learning methods have been developed for discovering the common features across multiple data sets. The most well studied family of methods is that of Canonical Correlation Analysis (CCA) and its variants. Even though CCA is a powerful tool, it has several drawbacks that render its application challenging for computer vision applications. That is, it discovers only common features and not individual ones, and it is sensitive to gross errors present in visual data. Recently, efforts have been made in order to develop methods that discover individual and common components. Nevertheless, these methods are mainly applicable to two sets of data. In this paper, we investigate the use of a recently proposed statistical method, the so-called Joint and Individual Variance Explained (JIVE) method, for the recovery of joint and individual components in an arbitrary number of data sets. Since JIVE is not robust to gross errors, we propose alternatives, which are both robust to non-Gaussian noise of large magnitude, as well as able to automatically find the rank of the individual components. We demonstrate the effectiveness of the proposed approach on two computer vision applications, namely facial expression synthesis and face age progression in-the-wild.
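
For orientation, in the standard JIVE notation (not reproduced from the paper), the decomposition recovered for each of the K data sets is

\[
\mathbf{X}_k \;=\; \mathbf{J}_k + \mathbf{A}_k + \mathbf{E}_k, \qquad k = 1,\dots,K,
\]

where the joint parts \(\mathbf{J}_k\) share a common low-dimensional subspace across all data sets, each \(\mathbf{A}_k\) is an individual low-rank component specific to data set k, and \(\mathbf{E}_k\) is the residual; the robust variants proposed here treat \(\mathbf{E}_k\) as sparse, non-Gaussian corruption of possibly large magnitude rather than small Gaussian noise.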

Conference paper

Wang M, Panagakis Y, Snape P, Zafeiriou S et al., 2017, Learning the multilinear structure of visual data, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 6053-6061

Statistical decomposition methods are of paramount importance in discovering the modes of variation of visual data. Probably the most prominent linear decomposition method is Principal Component Analysis (PCA), which discovers a single mode of variation in the data. However, in practice, visual data exhibit several modes of variation. For instance, the appearance of faces varies in identity, expression, pose, etc. To extract these modes of variation from visual data, several supervised methods, such as TensorFaces, that rely on multilinear (tensor) decomposition (e.g., Higher Order SVD) have been developed. The main drawback of such methods is that they require both labels regarding the modes of variation and the same number of samples under all modes of variation (e.g., the same face under different expressions, poses, etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose the first general multilinear method, to the best of our knowledge, that discovers the multilinear structure of visual data in an unsupervised setting, that is, without the presence of labels. We demonstrate the applicability of the proposed method in two applications, namely Shape from Shading (SfS) and expression transfer.
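
The multilinear (Tucker/HOSVD) structure referred to above can be written, in standard notation rather than the paper's, as

\[
\mathcal{X} \;\approx\; \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \mathbf{U}^{(3)},
\]

where \(\mathcal{G}\) is a core tensor, \(\times_m\) denotes the mode-m product, and each factor matrix \(\mathbf{U}^{(m)}\) captures one mode of variation (e.g., identity, expression, illumination); supervised methods such as TensorFaces require the data to be organised along these modes with labels, whereas the method proposed here recovers the factors without such supervision.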

Conference paper

Booth J, Antonakos E, Ploumpis S, Trigeorgis G, Panagakis Y, Zafeiriou S et al., 2017, 3D Face Morphable Models "In-the-Wild", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 5464-5473, ISSN: 1063-6919

3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (“in-the-wild”). In this paper, we propose the first, to the best of our knowledge, “in-the-wild” 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an “in-the-wild” texture model. We show that the employment of such an “in-the-wild” texture model greatly simplifies the fitting procedure, because there is no need to optimise with regards to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard “in-the-wild” facial databases.
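
At a high level, and as a generic analysis-by-synthesis cost rather than the paper's exact objective, fitting such a model to an image amounts to solving

\[
\min_{\mathbf{p},\,\mathbf{c},\,\boldsymbol{\lambda}} \;\big\| \mathbf{F}\big(\mathcal{W}(\mathbf{c},\mathbf{p})\big) - \mathbf{T}(\boldsymbol{\lambda}) \big\|^{2} \;+\; \text{regularisation on } \mathbf{c}, \boldsymbol{\lambda},
\]

where \(\mathcal{W}(\mathbf{c},\mathbf{p})\) projects the 3D shape instance with parameters \(\mathbf{c}\) through a camera with parameters \(\mathbf{p}\), \(\mathbf{F}(\cdot)\) samples the image at the projected vertex locations, and \(\mathbf{T}(\boldsymbol{\lambda})\) is the statistical texture model; because the texture model is itself built from “in-the-wild” images, no separate illumination parameters need to be optimised.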

Conference paper

Guler RA, Trigeorgis G, Antonakos E, Snape P, Zafeiriou S, Kokkinos I et al., 2017, DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2614-2623, ISSN: 1063-6919

In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks "in-the-wild". We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly-accurate quantized regression architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.
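
A minimal Python sketch of the quantized-regression idea (illustrative only; the function and variable names are ours, not the authors'): each continuous template coordinate is split into a discrete bin, which the network predicts by classification, and a small within-bin residual, which it predicts by regression.

import numpy as np

def quantize(u, num_bins):
    """Split coordinates u in [0, 1) into a bin index (classification target)
    and an in-bin residual (regression target)."""
    u = np.asarray(u, dtype=float)
    k = np.minimum((u * num_bins).astype(int), num_bins - 1)
    r = u * num_bins - k
    return k, r

def dequantize(k, r, num_bins):
    """Recombine bin index and residual into the continuous coordinate."""
    return (k + r) / num_bins

u = np.array([0.03, 0.42, 0.97])
k, r = quantize(u, num_bins=10)
assert np.allclose(dequantize(k, r, num_bins=10), u)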

Conference paper

Papaioannou A, Antonakos E, Zafeiriou S, 2017, Complex Representations for Learning Statistical Shape Priors, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1180-1184, ISSN: 2076-1465

Conference paper

Moschoglou S, Nicolaou M, Panagakis Y, Zafeiriou S et al., 2017, Initializing Probabilistic Linear Discriminant Analysis, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1175-1179, ISSN: 2076-1465

Conference paper

Xue N, Papamakarios G, Bahri M, Panagakis Y, Zafeiriou S et al., 2017, Robust low-rank tensor modelling using Tucker and CP decomposition, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1185-1189, ISSN: 2076-1465

Conference paper

Xue N, Panagakis Y, Zafeiriou S, 2017, Side Information in Robust Principal Component Analysis: Algorithms and Applications, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 4327-4335, ISSN: 1550-5499

Conference paper

Chrysos GG, Zafeiriou S, 2017, PD2T: Person-specific Detection, Deformable Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828

Journal article

Chrysos GG, Zafeiriou S, 2017, Deep face deblurring, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2015-2024, ISSN: 2160-7508

Blind deblurring is a long-studied task; however, the outcomes of generic methods are not effective on real-world blurred images. Domain-specific methods for deblurring targeted object categories, e.g. text or faces, frequently outperform their generic counterparts, hence they are attracting an increasing amount of attention. In this work, we develop such a domain-specific method to tackle deblurring of human faces, henceforth referred to as face deblurring. Studying faces is of tremendous significance in computer vision; however, face deblurring has yet to demonstrate convincing results. This can be partly attributed to the combination of (i) poor texture and (ii) highly structured shape, which render the contour/gradient priors that are typically used sub-optimal. In our work, instead of making assumptions about the prior, we adopt a learning approach by inserting weak supervision that exploits the well-documented structure of the face. Namely, we utilise a deep network to perform the deblurring and employ a face alignment technique to pre-process each face. We additionally sidestep the deep network's requirement for thousands of training samples by introducing an efficient framework that allows the generation of a large dataset. We utilised this framework to create 2MF2, a dataset of over two million frames. We conducted experiments with real-world blurred facial images and report that our method returns a result close to the sharp natural latent image.

Conference paper

Kollias D, Nicolaou MA, Kotsia I, Zhao G, Zafeiriou Set al., 2017, Recognition of affect in the wild using deep neural networks, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1972-1979, ISSN: 2160-7508

In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers, thus exploiting the invariant properties of convolutional features while also modelling the temporal dynamics that arise in human behaviour via the recurrent layers. Various pre-trained networks are used as starting structures which are subsequently appropriately fine-tuned to the Aff-Wild database. The obtained results show promise for the utilization of deep architectures for the visual analysis of human behaviour in terms of continuous emotion dimensions and analysis of different types of affect.
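
A minimal sketch of the CNN-plus-RNN design described above (our own toy layer sizes, not the authors' architecture or hyper-parameters): per-frame convolutional features are fed to a recurrent layer that regresses valence and arousal at every time step.

import torch
import torch.nn as nn

class CnnRnnAffect(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # Small per-frame feature extractor (a stand-in for a pre-trained backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Recurrent layer modelling temporal dynamics across frames.
        self.rnn = nn.GRU(32, hidden, batch_first=True)
        # Two continuous outputs per frame: valence and arousal.
        self.head = nn.Linear(hidden, 2)

    def forward(self, clips):                               # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)    # (batch*time, 32)
        out, _ = self.rnn(feats.view(b, t, -1))             # (batch, time, hidden)
        return self.head(out)                               # (batch, time, 2)

# Example: CnnRnnAffect()(torch.randn(2, 8, 3, 64, 64)) has shape (2, 8, 2).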

Conference paper

Zafeiriou S, Trigeorgis G, Chrysos G, Deng J, Shen J et al., 2017, The Menpo Facial Landmark Localisation Challenge: A step towards the solution, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2116-2125, ISSN: 2160-7508

Conference paper

Deng J, Zhou Y, Zafeiriou S, 2017, Marginal loss for deep face recognition, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2006-2014, ISSN: 2160-7508

Convolutional neural networks have significantly boosted the performance of face recognition in recent years due to their high capacity for learning discriminative features. In order to enhance the discriminative power of the deeply learned features, we propose a new supervision signal named marginal loss for deep face recognition. Specifically, the marginal loss simultaneously minimises the intra-class variances and maximises the inter-class distances by focusing on the marginal samples. With the joint supervision of softmax loss and marginal loss, we can easily train robust CNNs to obtain more discriminative deep features. Extensive experiments on several relevant face recognition benchmarks, Labelled Faces in the Wild (LFW), YouTube Faces (YTF), Cross-Age Celebrity Dataset (CACD), Age Database (AgeDB) and MegaFace Challenge, prove the effectiveness of the proposed marginal loss.
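
As a rough, generic illustration of the idea rather than the paper's exact loss (the function below and its thresholds are ours), a pairwise margin term penalises same-identity pairs that are farther apart than a threshold and different-identity pairs that are closer than it, so only the "marginal" pairs near the boundary contribute:

import numpy as np

def pairwise_margin_loss(embeddings, labels, threshold=1.0, tolerance=0.1):
    """Illustrative margin loss over all pairs of L2-normalised embeddings."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    total, count = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            dist = np.linalg.norm(x[i] - x[j])
            same = 1.0 if labels[i] == labels[j] else -1.0
            # Hinge around the threshold: same-identity pairs are pulled inside it,
            # different-identity pairs are pushed outside it; pairs already well
            # placed contribute zero.
            total += max(0.0, tolerance + same * (dist - threshold))
            count += 1
    return total / max(count, 1)

In practice such a term would be added to the softmax (cross-entropy) loss, matching the joint supervision described in the abstract.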

Conference paper

Moschoglou S, Papaioannou A, Sagonas C, Deng J, Kotsia I, Zafeiriou S et al., 2017, AgeDB: the first manually collected, in-the-wild age database, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1997-2005, ISSN: 2160-7508

Over the last few years, increased interest has arisen with respect to age-related tasks in the Computer Vision community. As a result, several "in-the-wild" databases annotated with respect to the age attribute became available in the literature. Nevertheless, one major drawback of these databases is that they are semi-automatically collected and annotated and thus they contain noisy labels. Therefore, the algorithms that are evaluated on such databases are prone to noisy estimates. In order to overcome such drawbacks, we present in this paper the first, to the best of our knowledge, manually collected "in-the-wild" age database, dubbed AgeDB, containing images annotated with noise-free labels that are accurate to the year. As demonstrated by a series of experiments utilizing state-of-the-art algorithms, this unique property renders AgeDB suitable when performing experiments on age-invariant face verification, age estimation and face age progression "in-the-wild".

Conference paper

Zafeiriou S, Kollias D, Nicolaou MA, Papaioannou A, Zhao G, Kotsia I et al., 2017, Aff-wild: valence and arousal 'in-the-wild' challenge, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1980-1987, ISSN: 2160-7508

The Affect-in-the-Wild (Aff-Wild) Challenge proposes a new comprehensive benchmark for assessing the performance of facial affect/behaviour analysis/understanding 'in-the-wild'. The Aff-Wild benchmark contains about 300 videos (over 2,000 minutes of data) annotated with regards to valence and arousal, all captured 'in-the-wild' (the main source being YouTube videos). The paper presents the database description, the experimental setup, the baseline method used for the Challenge and, finally, a summary of the performance of the different methods submitted to the Affect-in-the-Wild Challenge for valence and arousal estimation. The challenge demonstrates that meticulously designed deep neural networks can achieve very good performance when trained with in-the-wild data.

Conference paper

Zafeiriou L, Zafeiriou S, Pantic M, 2017, Deep Analysis of Facial Behavioral Dynamics, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1988-1996, ISSN: 2160-7508

Modelling facial dynamics, as well as recovering latent dimensions that correspond to facial dynamics, is of paramount importance for many tasks relevant to facial behaviour analysis. Currently, analysis of facial dynamics is performed by applying linear techniques, mainly on sparse facial tracks. In this paper, we propose the first, to the best of our knowledge, methodology for extracting low-dimensional latent dimensions that correspond to facial dynamics (i.e., the motion of facial parts). To this end, we develop appropriate unsupervised and supervised deep autoencoder architectures, which are able to extract features that correspond to the facial dynamics. We demonstrate the usefulness of the proposed approach on various facial behaviour datasets.

Conference paper

Schuller B, Steidl S, Batliner A, Bergelson E, Krajewski J, Janott C, Amatuni A, Casillas M, Seidl A, Soderstrom M, Warlaumont AS, Hidalgo G, Schnieder S, Heiser C, Hohenhorst W, Herzog M, Schmitt M, Qian K, Zhang Y, Trigeorgis G, Tzirakis P, Zafeiriou S et al., 2017, The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, cold & snoring, INTERSPEECH 2017, Pages: 3442-3446, ISSN: 2308-457X

Copyright © 2017 ISCA. The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: In the Addressee sub-challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold sub-challenge, speech under a cold has to be told apart from 'healthy' speech; and in the Snoring sub-challenge, four different types of snoring have to be classified. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt feature representations by end-to-end learning with convolutional and recurrent neural networks, and bag-of-audio-words, for the first time in the challenge series.

Conference paper

Zafeiriou L, Panagakis Y, Pantic M, Zafeiriou S et al., 2017, Nonnegative Decompositions for Dynamic Visual Data Analysis, IEEE Transactions on Image Processing, Vol: 26, Pages: 5603-5617, ISSN: 1057-7149

The analysis of high-dimensional, possibly temporally misaligned, and time-varying visual data is a fundamental task in disciplines such as image, vision, and behavior computing. In this paper, we focus on dynamic facial behavior analysis and in particular on the analysis of facial expressions. Distinct from previous approaches, where sets of facial landmarks are used for face representation, raw pixel intensities are exploited for: 1) unsupervised analysis of the temporal phases of facial expressions and facial action units (AUs) and 2) temporal alignment of a certain facial behavior displayed by two different persons. To this end, the slow features nonnegative matrix factorization (SFNMF) is proposed in order to learn slowly varying parts-based representations of time-varying sequences, capturing the underlying dynamics of temporal phenomena such as facial expressions. Moreover, the SFNMF is extended in order to handle two temporally misaligned data sequences depicting the same visual phenomena. To do so, dynamic time warping is incorporated into the SFNMF, allowing the temporal alignment of the data sets onto the subspace spanned by the estimated nonnegative shared latent features among the two visual sequences. Extensive experimental results on two video databases demonstrate the effectiveness of the proposed methods in: 1) unsupervised detection of the temporal phases of posed and spontaneous facial events and 2) temporal alignment of facial expressions, outperforming by a large margin the state-of-the-art methods they are compared against.
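
A compact way to see the slowness idea (our sketch of a generic objective; the paper's exact formulation may differ) is NMF with a temporal-smoothness penalty on consecutive encodings:

\[
\min_{\mathbf{W}\ge 0,\;\mathbf{H}\ge 0} \;\|\mathbf{X} - \mathbf{W}\mathbf{H}\|_{F}^{2} \;+\; \lambda \sum_{t=2}^{T} \|\mathbf{h}_{t} - \mathbf{h}_{t-1}\|_{2}^{2},
\]

where the columns of \(\mathbf{X}\) are consecutive frames and \(\mathbf{h}_t\) their nonnegative encodings, so the learned parts in \(\mathbf{W}\) are forced to vary slowly over time, mirroring the gradual onset-apex-offset dynamics of facial expressions.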

Journal article

Marras I, Nikitidis S, Zafeiriou S, Pantic M et al., 2017, A joint discriminative generative model for deformable model construction and classification, 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, Pages: 127-134, ISSN: 2326-5396

Discriminative classification models have been successfully applied to various computer vision tasks such as object and face detection and recognition. However, deformations can change an object's coordinate space and perturb robust similarity measurement, which is the essence of all classification algorithms. The common approach to dealing with deformations is either to seek deformation-invariant features or to develop models that describe object deformations. However, the former approach requires a huge amount of data and a good amount of engineering to be properly trained, while the latter requires considerable human effort in the form of carefully annotated data. In this paper, we propose a method that, with minimal human intervention, jointly learns a generative deformable model, using only a simple shape model of the object and images automatically downloaded from the Internet, and extracts features appropriate for classification. The proposed algorithm is applied to various classification problems such as "in-the-wild" face recognition, gender classification and eyeglasses detection, on data retrieved by querying a web image search engine. We demonstrate that it not only outperforms other automatic methods by large margins, but also performs comparably to supervised methods trained on thousands of manually annotated samples.

Conference paper

Sagonas C, Panagakis Y, Arunkumar S, Ratha N, Zafeiriou S et al., 2017, Back to the future: A fully automatic method for robust age progression, International Conference on Pattern Recognition (ICPR) 2016, Publisher: IEEE

It has been shown that a significant age difference between a probe and gallery face image can decrease the matching accuracy. If the face images can be normalized in age, there can be a huge impact on face verification accuracy, and thus many novel applications, such as matching driver's license, passport and visa images with images of the real person, can be effectively implemented. Face progression can address this issue by generating a face image for a specific age. Many researchers have attempted to address this problem, focusing on predicting older faces from a younger face. In this paper, we propose a novel method for robust and automatic face progression in totally unconstrained conditions. Our method takes into account that faces belonging to the same age group share age patterns such as wrinkles, while faces across different age groups share some common patterns such as expressions and skin colors. Given training images of K different age groups, the proposed method learns to recover K low-rank age components and one low-rank common component. The components extracted in the learning phase are used to progress an input face to both younger and older ages in a bidirectional fashion. Using standard datasets, we demonstrate that the proposed progression method outperforms state-of-the-art age progression methods and also improves matching accuracy in a face verification protocol that includes age progression.
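
Schematically, in our own simplified notation for the description above, the learning phase decomposes the training data of the K age groups as

\[
\mathbf{X}_{k} \;=\; \mathbf{C} + \mathbf{A}_{k} + \mathbf{E}_{k}, \qquad k = 1,\dots,K,
\]

with a single low-rank common component \(\mathbf{C}\) shared across all age groups (e.g., expressions and skin colour), a low-rank age component \(\mathbf{A}_{k}\) per group (e.g., wrinkles), and a residual \(\mathbf{E}_{k}\); progressing an input face to age group k then amounts, roughly, to keeping its common part and swapping in the age component \(\mathbf{A}_{k}\).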

Conference paper

Gligorijevic V, Panagakis Y, Zafeiriou S, 2017, Fusion and Community Detection in Multi-layer Graphs, 2016 23rd International Conference on Pattern Recognition (ICPR), Publisher: IEEE

Relational data arising in many domains can be represented by networks (or graphs) with nodes capturing entities and edges representing relationships between these entities. Community detection in networks has become one of the most important problems, having a broad range of applications. Until recently, the vast majority of papers have focused on discovering community structures in a single network. However, with the emergence of multi-view network data in many real-world applications and, consequently, with the advent of multi-layer graph representation, community detection in multi-layer graphs has become a new challenge. Multi-layer graphs provide complementary views of connectivity patterns of the same set of vertices. Fusion of the network layers is expected to achieve better clustering performance. In this paper, we propose two novel methods, coined as WSSNMTF (Weighted Simultaneous Symmetric Non-Negative Matrix Tri-Factorization) and NG-WSSNMTF (Natural Gradient WSSNMTF), for fusion and clustering of multi-layer graphs. Both methods are robust with respect to missing edges and noise. We compare the performance of the proposed methods with two baseline methods, as well as with three state-of-the-art methods, on synthetic and three real-world datasets. The experimental results indicate superior performance of the proposed methods.
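
In rough form, with notation guided by the method's name rather than taken from the paper, a weighted simultaneous symmetric non-negative tri-factorization of L adjacency matrices sharing a single cluster-indicator factor is

\[
\min_{\mathbf{H}\ge 0,\;\{\mathbf{S}^{(l)}\ge 0\}} \;\sum_{l=1}^{L} \big\| \mathbf{W}^{(l)} \odot \big(\mathbf{A}^{(l)} - \mathbf{H}\,\mathbf{S}^{(l)}\,\mathbf{H}^{\top}\big) \big\|_{F}^{2},
\]

where \(\mathbf{A}^{(l)}\) is the adjacency matrix of layer l, the shared non-negative factor \(\mathbf{H}\) assigns vertices to communities consistently across layers, each \(\mathbf{S}^{(l)}\) captures layer-specific block interactions, and the masks \(\mathbf{W}^{(l)}\) down-weight missing edges.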

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
