Imperial College London

STEFANOS ZAFEIRIOU, PhD

Faculty of Engineering, Department of Computing

Reader in Machine Learning and Computer Vision
 
 
 

Contact

 

+44 (0)20 7594 8461 · s.zafeiriou

 
 

Location

 

375 Huxley Building, South Kensington Campus


Publications


220 results found

Schuller BW, Steidl S, Batliner A, Marschik PB, Baumeister H, Dong F, Hantke S, Pokorny FB, Rathner EM, Bartl-Pokorny KD, Einspieler C, Zhang D, Baird A, Amiriparian S, Qian K, Ren Z, Schmitt M, Tzirakis P, Zafeiriou S et al., 2018, The INTERSPEECH 2018 computational paralinguistics challenge: Atypical & self-assessed affect, crying & heart beats, The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats, Pages: 122-126, ISSN: 2308-457X

The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Atypical Affect Sub-Challenge, four basic emotions annotated in the speech of handicapped subjects have to be classified; in the Self-Assessed Affect Sub-Challenge, valence scores given by the speakers themselves are used for a three-class classification problem; in the Crying Sub-Challenge, three types of infant vocalisations have to be told apart; and in the Heart Beats Sub-Challenge, three different types of heart beats have to be determined. We describe the Sub-Challenges, their conditions, and baseline feature extraction and classifiers, which include data-learnt (supervised) feature representations by end-to-end learning, the 'usual' ComParE and BoAW features, and deep unsupervised representation learning using the AUDEEP toolkit for the first time in the challenge series.

Conference paper

Zafeiriou S, Kotsia I, Pantic M, 2018, Unconstrained face recognition, Computer Vision: Concepts, Methodologies, Tools, and Applications, Pages: 1640-1661, ISBN: 9781522552048

The human face is the most well-researched object in computer vision, mainly because (1) it is a highly deformable object whose appearance changes dramatically under different poses, expressions, illuminations, etc., (2) the applications of face recognition are numerous and span several fields, and (3) it is widely known that humans possess the ability to perform facial analysis, especially identity recognition, extremely efficiently and accurately. Although a lot of research has been conducted in past years, the problem of face recognition using images captured in uncontrolled environments, including several illumination and/or pose variations, still remains open. This is also attributed to the existence of outliers (such as partial occlusion, cosmetics, eyeglasses, etc.) or changes due to age. In this chapter, the authors provide an overview of the existing fully automatic face recognition technologies for uncontrolled scenarios. They present the existing databases, summarize the challenges that arise in such scenarios, and conclude by presenting the opportunities that exist in the field.

Book chapter

Xue N, Deng J, Panagakis Y, Zafeiriou S et al., 2018, Informed non-convex robust principal component analysis with features, Pages: 4343-4349

We revisit the problem of robust principal component analysis with features acting as prior side information. To this aim, a novel, elegant, non-convex optimization approach is proposed to decompose a given observation matrix into a low-rank core and the corresponding sparse residual. Rigorous theoretical analysis of the proposed algorithm results in exact recovery guarantees with low computational complexity. Aptly designed synthetic experiments demonstrate that our method is the first to wholly harness the power of non-convexity over convexity in terms of both recoverability and speed. That is, the proposed non-convex approach is more accurate and faster compared to the best available algorithms for the problem under study. Two real-world applications, namely image classification and face denoising, further exemplify the practical superiority of the proposed method.

Conference paper
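
For orientation, the convex reference model that this line of work on side information builds on recovers a low-rank component expressed through feature dictionaries plus a sparse residual. The paper's non-convex program differs (it factorizes the low-rank term to avoid SVDs), so the following is only a schematic statement with assumed notation (M the observation, X and Y the feature matrices):

```latex
% Convex RPCA with feature side information (schematic; notation assumed).
% The non-convex variant the abstract describes replaces L by a product of
% thin factors U V^T, avoiding repeated SVD computations.
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{s.t.} \quad M = X L Y^{\top} + S
```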

Bahri M, Panagakis Y, Zafeiriou S, 2017, Robust Kronecker-decomposable component analysis for low-rank modeling, IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 3372-3381, ISSN: 1550-5499

Dictionary learning and component analysis are part of one of the most well-studied and active research fields, at the intersection of signal and image processing, computer vision, and statistical machine learning. In dictionary learning, the current methods of choice are arguably K-SVD and its variants, which learn a dictionary (i.e., a decomposition) for sparse coding via Singular Value Decomposition. In robust component analysis, leading methods derive from Principal Component Pursuit (PCP), which recovers a low-rank matrix from sparse corruptions of unknown magnitude and support. However, K-SVD is sensitive to the presence of noise and outliers in the training set. Additionally, PCP does not provide a dictionary that respects the structure of the data (e.g., images), and requires expensive SVD computations when solved by convex relaxation. In this paper, we introduce a new robust decomposition of images by combining ideas from sparse dictionary learning and PCP. We propose a novel Kronecker-decomposable component analysis which is robust to gross corruption, can be used for low-rank modeling, and leverages separability to solve significantly smaller problems. We design an efficient learning algorithm by drawing links with a restricted form of tensor factorization. The effectiveness of the proposed approach is demonstrated on real-world applications, namely background subtraction and image denoising, by performing a thorough comparison with the current state of the art.

Conference paper
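
As background to the comparison the abstract draws, the Principal Component Pursuit baseline can be solved with the classic inexact augmented Lagrange multiplier scheme. The sketch below is that generic convex baseline, not the paper's Kronecker-decomposable method; the parameter defaults follow common practice and are assumptions:

```python
import numpy as np

def shrink(X, tau):
    """Soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular-value thresholding (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def rpca_pcp(M, lam=None, mu=None, n_iter=200, tol=1e-7):
    """Principal Component Pursuit: M ~ L (low-rank) + S (sparse),
    via the inexact augmented Lagrange multiplier method."""
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))          # standard PCP weight
    mu = mu or 0.25 * m * n / (np.abs(M).sum() + 1e-12)
    norm_M = np.linalg.norm(M)
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)   # update low-rank part
        S = shrink(M - L + Y / mu, lam / mu)       # update sparse part
        R = M - L - S                              # primal residual
        Y += mu * R                                # dual ascent
        if np.linalg.norm(R) / norm_M < tol:
            break
    return L, S
```

The expensive step is exactly the SVD inside `svd_shrink`, which is the cost the abstract's separable, Kronecker-structured decomposition is designed to shrink.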

Tzirakis P, Trigeorgis G, Nicolaou MA, Schuller BW, Zafeiriou S et al., 2017, End-to-end multimodal emotion recognition using deep neural networks, IEEE Journal of Selected Topics in Signal Processing, Vol: 11, Pages: 1301-1309, ISSN: 1932-4553

Automatic affect recognition is a challenging task due to the various modalities emotions can be expressed with. Applications can be found in many domains, including multimedia retrieval and human-computer interaction. In recent years, deep neural networks have been used with great success in determining emotional states. Inspired by this success, we propose an emotion recognition system using auditory and visual modalities. To capture the emotional content for various styles of speaking, robust features need to be extracted. To this end, we utilize a convolutional neural network (CNN) to extract features from the speech, while for the visual modality a deep residual network of 50 layers is used. In addition to the importance of feature extraction, a machine learning algorithm also needs to be insensitive to outliers while being able to model the context. To tackle this problem, long short-term memory networks are utilized. The system is then trained in an end-to-end fashion whereby, also taking advantage of the correlations between the streams, we manage to significantly outperform, in terms of concordance correlation coefficient, traditional approaches based on auditory and visual handcrafted features for the prediction of spontaneous and natural emotions on the RECOLA database of the AVEC 2016 research challenge on emotion recognition.

Journal article
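
A minimal sketch of the kind of architecture the abstract describes (a convolutional audio branch, a 50-layer residual visual branch, and an LSTM over the fused per-frame features). All layer sizes and the two-unit valence/arousal head are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class MultimodalEmotionNet(nn.Module):
    """Sketch of a CNN (audio) + ResNet-50 (video) + LSTM pipeline."""
    def __init__(self, hidden=256):
        super().__init__()
        # Audio branch: 1-D convolutions over the raw waveform of each segment.
        self.audio = nn.Sequential(
            nn.Conv1d(1, 40, kernel_size=80, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())        # -> (B*T, 40)
        # Visual branch: ResNet-50 backbone with the classifier removed.
        vis = resnet50(weights=None)
        vis.fc = nn.Identity()                            # -> (B*T, 2048)
        self.visual = vis
        # Temporal model over the fused per-frame features.
        self.rnn = nn.LSTM(40 + 2048, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)                  # valence, arousal

    def forward(self, wav, frames):
        # wav: (B, T, 1, n_samples); frames: (B, T, 3, 224, 224)
        B, T = wav.shape[:2]
        a = self.audio(wav.flatten(0, 1)).view(B, T, -1)
        v = self.visual(frames.flatten(0, 1)).view(B, T, -1)
        h, _ = self.rnn(torch.cat([a, v], dim=-1))
        return self.head(h)                               # (B, T, 2)
```

Training end-to-end, as the abstract emphasises, means the branch features and the temporal model are optimised jointly against the concordance correlation objective rather than fixed handcrafted descriptors.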

Guler RA, Trigeorgis G, Antonakos E, Snape P, Zafeiriou S, Kokkinos I et al., 2017, DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 2614-2623, ISSN: 1063-6919

In this paper we propose to learn a mapping from image pixels into a dense template grid through a fully convolutional network. We formulate this task as a regression problem and train our network by leveraging upon manually annotated facial landmarks "in-the-wild". We use such landmarks to establish a dense correspondence field between a three-dimensional object template and the input image, which then serves as the ground-truth for training our regression system. We show that we can combine ideas from semantic segmentation with regression networks, yielding a highly-accurate quantized regression architecture. Our system, called DenseReg, allows us to estimate dense image-to-template correspondences in a fully convolutional manner. As such our network can provide useful correspondence information as a stand-alone system, while when used as an initialization for Statistical Deformable Models we obtain landmark localization results that largely outperform the current state-of-the-art on the challenging 300W benchmark. We thoroughly evaluate our method on a host of facial analysis tasks, and demonstrate its use for other correspondence estimation tasks, such as the human body and the human ear. DenseReg code is made available at http://alpguler.com/DenseReg.html along with supplementary materials.

Conference paper
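
The "quantized regression" idea can be pictured as a per-pixel classification into template bins followed by a small within-bin residual regression. The head below is a hedged sketch of that pattern under assumed names and shapes, not the released DenseReg code:

```python
import torch
import torch.nn as nn

class QuantizedRegressionHead(nn.Module):
    """Illustrative quantized-regression head: classify each pixel into one
    of K template bins, then regress a small within-bin residual."""
    def __init__(self, in_ch, k_bins):
        super().__init__()
        self.k = k_bins
        self.cls = nn.Conv2d(in_ch, k_bins, 1)      # discrete bin per pixel
        self.res = nn.Conv2d(in_ch, k_bins, 1)      # residual per candidate bin

    def forward(self, feats):
        logits = self.cls(feats)                     # (B, K, H, W)
        residual = self.res(feats)                   # (B, K, H, W)
        bin_idx = logits.argmax(1, keepdim=True)     # (B, 1, H, W)
        # Continuous coordinate = bin centre + residual of the chosen bin.
        centre = (bin_idx.float() + 0.5) / self.k
        r = residual.gather(1, bin_idx)
        return logits, centre + r                    # logits for CE; coord in [0,1]
```

Splitting a continuous regression target into a coarse classification plus a fine residual is what makes the dense correspondence both trainable and accurate in this style of architecture.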

Booth J, Antonakos E, Ploumpis S, Trigeorgis G, Panagakis Y, Zafeiriou S et al., 2017, 3D Face Morphable Models "In-the-Wild", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 5464-5473, ISSN: 1063-6919

3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions ("in-the-wild"). In this paper, we propose the first, to the best of our knowledge, "in-the-wild" 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an "in-the-wild" texture model. We show that the employment of such an "in-the-wild" texture model greatly simplifies the fitting procedure, because there is no need to optimise with regards to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard "in-the-wild" facial databases.

Conference paper

Chrysos G, Zafeiriou SP, 2017, PD2T: Person-specific Detection, Deformable Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828

Face detection/alignment has reached a satisfactory state in static images captured under arbitrary conditions. Such methods typically perform (joint) fitting independently for each frame and are used in commercial applications; however, in the majority of real-world scenarios the dynamic scenes are of interest. Hence, we argue that generic fitting per frame is suboptimal (it discards the informative correlation of sequential frames) and propose to learn person-specific statistics from the video to improve the generic results. To that end, we introduce a meticulously studied pipeline, which we name PD²T, that performs person-specific detection and landmark localisation. We carry out extensive experimentation with a diverse set of i) generic fitting results, ii) different objects (human faces, animal faces) that illustrate the powerful properties of our proposed pipeline and experimentally verify that PD²T outperforms all the compared methods.

Journal article

Papaioannou A, Antonakos E, Zafeiriou S, 2017, Complex Representations for Learning Statistical Shape Priors, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1180-1184, ISSN: 2076-1465

Conference paper

Moschoglou S, Nicolaou M, Panagakis Y, Zafeiriou S et al., 2017, Initializing Probabilistic Linear Discriminant Analysis, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1175-1179, ISSN: 2076-1465

Conference paper

Xue N, Papamakarios G, Bahri M, Panagakis Y, Zafeiriou S et al., 2017, Robust low-rank tensor modelling using Tucker and CP decomposition, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1185-1189, ISSN: 2076-1465

Conference paper

Xue N, Panagakis Y, Zafeiriou S, 2017, Side Information in Robust Principal Component Analysis: Algorithms and Applications, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 4327-4335, ISSN: 1550-5499

Conference paper

Chrysos GG, Zafeiriou S, 2017, Deep face deblurring, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2015-2024, ISSN: 2160-7508

Blind deblurring is a long-studied task; however, the outcomes of generic methods are not effective on real-world blurred images. Domain-specific methods for deblurring targeted object categories, e.g. text or faces, frequently outperform their generic counterparts, hence they are attracting an increasing amount of attention. In this work, we develop such a domain-specific method to tackle the deblurring of human faces, henceforth referred to as face deblurring. Studying faces is of tremendous significance in computer vision; however, face deblurring has yet to demonstrate convincing results. This can be partly attributed to the combination of i) poor texture and ii) highly structured shape, which render the contour/gradient priors (that are typically used) sub-optimal. In our work, instead of making assumptions over the prior, we adopt a learning approach by inserting weak supervision that exploits the well-documented structure of the face. Namely, we utilise a deep network to perform the deblurring and employ a face alignment technique to pre-process each face. We additionally surpass the deep network's requirement for thousands of training samples by introducing an efficient framework that allows the generation of a large dataset. We utilised this framework to create 2MF2, a dataset of over two million frames. We conducted experiments with real-world blurred facial images and report that our method returns a result close to the sharp natural latent image.

Conference paper
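
The data-generation framework is the part most readers may want to reproduce: blurring sharp face crops with random kernels yields (blurred, sharp) training pairs at scale. A minimal sketch under an assumed linear motion-blur model (the actual blur model used for 2MF2 may differ):

```python
import numpy as np
from scipy.ndimage import convolve, rotate

def motion_kernel(length=9, angle=0.0):
    """Simple linear motion-blur kernel (an assumed blur model)."""
    k = np.zeros((length, length))
    k[length // 2, :] = 1.0                   # horizontal line of mass
    k = rotate(k, angle, reshape=False, order=1)
    return k / k.sum()                        # normalise to preserve brightness

def make_training_pair(sharp, rng):
    """Blur a sharp face crop with a random kernel -> (blurred, sharp) pair."""
    k = motion_kernel(length=int(rng.integers(5, 15)),
                      angle=float(rng.uniform(0, 180)))
    blurred = convolve(sharp, k, mode='reflect')
    return blurred, sharp

rng = np.random.default_rng(0)
sharp = rng.random((128, 128))                # stand-in for a grayscale face crop
blurred, target = make_training_pair(sharp, rng)
```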

Kollias D, Nicolaou MA, Kotsia I, Zhao G, Zafeiriou S et al., 2017, Recognition of affect in the wild using deep neural networks, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1972-1979, ISSN: 2160-7508

In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers, thus exploiting the invariant properties of convolutional features while also modelling the temporal dynamics that arise in human behaviour via the recurrent layers. Various pre-trained networks are used as starting structures and are subsequently fine-tuned to the Aff-Wild database. The obtained results show promise for the utilization of deep architectures for the visual analysis of human behaviour in terms of continuous emotion dimensions and the analysis of different types of affect.

Conference paper

Zafeiriou S, Trigeorgis G, Chrysos G, Deng J, Shen J et al., 2017, The Menpo Facial Landmark Localisation Challenge: A step towards the solution, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2116-2125, ISSN: 2160-7508

Conference paper

Zafeiriou S, Kollias D, Nicolaou MA, Papaioannou A, Zhao G, Kotsia I et al., 2017, Aff-wild: valence and arousal 'in-the-wild' challenge, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1980-1987, ISSN: 2160-7508

The Affect-in-the-Wild (Aff-Wild) Challenge proposes a new comprehensive benchmark for assessing the performance of facial affect/behaviour analysis/understanding 'in-the-wild'. The Aff-Wild benchmark contains about 300 videos (over 2,000 minutes of data) annotated with regards to valence and arousal, all captured 'in-the-wild' (the main source being YouTube videos). The paper presents the database description, the experimental set-up, the baseline method used for the Challenge and, finally, a summary of the performance of the different methods submitted to the Affect-in-the-Wild Challenge for valence and arousal estimation. The challenge demonstrates that meticulously designed deep neural networks can achieve very good performance when trained with in-the-wild data.

Conference paper

Deng J, Zhou Y, Zafeiriou S, 2017, Marginal loss for deep face recognition, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 2006-2014, ISSN: 2160-7508

Convolutional neural networks have significantly boosted the performance of face recognition in recent years due to their high capacity for learning discriminative features. In order to enhance the discriminative power of the deeply learned features, we propose a new supervision signal named marginal loss for deep face recognition. Specifically, the marginal loss simultaneously minimises the intra-class variances and maximises the inter-class distances by focusing on the marginal samples. With the joint supervision of softmax loss and marginal loss, we can easily train robust CNNs to obtain more discriminative deep features. Extensive experiments on several relevant face recognition benchmarks, Labelled Faces in the Wild (LFW), YouTube Faces (YTF), Cross-Age Celebrity Dataset (CACD), Age Database (AgeDB) and MegaFace Challenge, prove the effectiveness of the proposed marginal loss.

Conference paper
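
A hedged sketch of the pairwise idea in the abstract: on unit-normalised deep features, hinge-penalise same-identity pairs that fall outside a distance threshold and different-identity pairs that fall inside it, so only "marginal" pairs contribute. The threshold `theta` and margin `xi` values are illustrative, and the published formula may differ in details (e.g. squared distances):

```python
import torch
import torch.nn.functional as F

def marginal_loss(emb, labels, theta=1.2, xi=0.3):
    """Pairwise margin loss sketch on unit-normalised embeddings.
    emb: (n, d) deep features; labels: (n,) identity ids."""
    x = F.normalize(emb, dim=1)                  # unit-norm deep features
    d = torch.cdist(x, x)                        # pairwise Euclidean distances
    same = labels[:, None].eq(labels[None, :]).float()
    y = 2.0 * same - 1.0                         # +1 same identity, -1 different
    # Hinge on the signed margin: only violating ("marginal") pairs contribute.
    loss = F.relu(xi - y * (theta - d))
    loss = loss - torch.diag(torch.diag(loss))   # drop self-pairs
    n = emb.size(0)
    return loss.sum() / (n * n - n)
```

In practice, as the abstract notes, such a term is added to the softmax loss rather than used alone, so the network keeps its classification signal while the pairwise term shapes the embedding geometry.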

Moschoglou S, Papaioannou A, Sagonas C, Deng J, Kotsia I, Zafeiriou S et al., 2017, AgeDB: the first manually collected, in-the-wild age database, 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1997-2005, ISSN: 2160-7508

Over the last few years, increased interest has arisen with respect to age-related tasks in the Computer Vision community. As a result, several "in-the-wild" databases annotated with respect to the age attribute became available in the literature. Nevertheless, one major drawback of these databases is that they are semi-automatically collected and annotated and thus they contain noisy labels. Therefore, the algorithms that are evaluated on such databases are prone to noisy estimates. In order to overcome such drawbacks, we present in this paper the first, to the best of our knowledge, manually collected "in-the-wild" age database, dubbed AgeDB, containing images annotated with accurate-to-the-year, noise-free labels. As demonstrated by a series of experiments utilizing state-of-the-art algorithms, this unique property renders AgeDB suitable when performing experiments on age-invariant face verification, age estimation and face age progression "in-the-wild".

Conference paper

Zafeiriou L, Zafeiriou S, Pantic M, 2017, Deep Analysis of Facial Behavioral Dynamics, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Publisher: IEEE, Pages: 1988-1996, ISSN: 2160-7508

Modelling of facial dynamics, as well as recovering latent dimensions that correspond to facial dynamics, is of paramount importance for many tasks relevant to facial behaviour analysis. Currently, analysis of facial dynamics is performed by applying linear techniques, mainly on sparse facial tracks. In this paper, we propose the first, to the best of our knowledge, methodology for extracting low-dimensional latent dimensions that correspond to facial dynamics (i.e., motion of facial parts). To this end, we develop appropriate unsupervised and supervised deep autoencoder architectures, which are able to extract features that correspond to the facial dynamics. We demonstrate the usefulness of the proposed approach on various facial behaviour datasets.

Conference paper

Schuller B, Steidl S, Batliner A, Bergelson E, Krajewski J, Janott C, Amatuni A, Casillas M, Seidl A, Soderstrom M, Warlaumont AS, Hidalgo G, Schnieder S, Heiser C, Hohenhorst W, Herzog M, Schmitt M, Qian K, Zhang Y, Trigeorgis G, Tzirakis P, Zafeiriou S et al., 2017, The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, cold & snoring, INTERSPEECH 2017, Pages: 3442-3446, ISSN: 2308-457X

The INTERSPEECH 2017 Computational Paralinguistics Challenge addresses three different problems for the first time in a research competition under well-defined conditions: in the Addressee sub-challenge, it has to be determined whether speech produced by an adult is directed towards another adult or towards a child; in the Cold sub-challenge, speech under cold has to be told apart from 'healthy' speech; and in the Snoring sub-challenge, four different types of snoring have to be classified. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, which include data-learnt feature representations by end-to-end learning with convolutional and recurrent neural networks, and bag-of-audio-words for the first time in the challenge series.

Conference paper

Zafeiriou L, Panagakis Y, Pantic M, Zafeiriou S et al., 2017, Nonnegative Decompositions for Dynamic Visual Data Analysis, IEEE Transactions on Image Processing, Vol: 26, Pages: 5603-5617, ISSN: 1057-7149

The analysis of high-dimensional, possibly temporally misaligned, and time-varying visual data is a fundamental task in disciplines such as image, vision, and behavior computing. In this paper, we focus on dynamic facial behavior analysis and in particular on the analysis of facial expressions. Distinct from previous approaches, where sets of facial landmarks are used for face representation, raw pixel intensities are exploited for: 1) unsupervised analysis of the temporal phases of facial expressions and facial action units (AUs) and 2) temporal alignment of a certain facial behavior displayed by two different persons. To this end, the slow features nonnegative matrix factorization (SFNMF) is proposed in order to learn slowly varying parts-based representations of time-varying sequences, capturing the underlying dynamics of temporal phenomena such as facial expressions. Moreover, the SFNMF is extended in order to handle two temporally misaligned data sequences depicting the same visual phenomena. To do so, dynamic time warping is incorporated into the SFNMF, allowing the temporal alignment of the data sets onto the subspace spanned by the estimated nonnegative shared latent features of the two visual sequences. Extensive experimental results on two video databases demonstrate the effectiveness of the proposed methods in: 1) unsupervised detection of the temporal phases of posed and spontaneous facial events and 2) temporal alignment of facial expressions, outperforming the state-of-the-art methods they are compared to by a large margin.

Journal article
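
Schematically, SFNMF can be read as an NMF data term plus a slowness penalty on consecutive latent codes; the notation below is assumed rather than taken from the paper:

```latex
% Schematic SFNMF objective: NMF reconstruction plus temporal slowness on H.
% The quadratic difference term equals tr(H L H^T) for L the Laplacian of
% the chain graph over frames, which is how graph-regularised NMF solvers
% typically handle it.
\min_{W \ge 0,\; H \ge 0}\;
  \|X - W H\|_{F}^{2}
  \;+\; \lambda \sum_{t=2}^{T} \|\mathbf{h}_{t} - \mathbf{h}_{t-1}\|_{2}^{2}
```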

Marras I, Nikitidis S, Zafeiriou S, Pantic M et al., 2017, A joint discriminative generative model for deformable model construction and classification, 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, Pages: 127-134, ISSN: 2326-5396

Discriminative classification models have been successfully applied to various computer vision tasks such as object and face detection and recognition. However, deformations can change an object's coordinate space and perturb robust similarity measurement, which is the essence of all classification algorithms. The common approach to dealing with deformations is either to seek deformation-invariant features or to develop models that describe the objects' deformations. However, the former approach requires a huge amount of data and a good amount of engineering to be properly trained, while the latter requires considerable human effort in the form of carefully annotated data. In this paper, we propose a method that jointly learns, with minimal human intervention, a generative deformable model using only a simple shape model of the object and images automatically downloaded from the Internet, and also extracts features appropriate for classification. The proposed algorithm is applied to various classification problems such as "in-the-wild" face recognition, gender classification and eyeglasses detection on data retrieved by querying a web image search engine. We demonstrate that not only does it outperform other automatic methods by large margins, but it also performs comparably with supervised methods trained on thousands of manually annotated data.

Conference paper

Trigeorgis G, Nicolaou M, Zafeiriou S, Schuller B et al., 2017, Deep canonical time warping for simultaneous alignment and representation learning of sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 1128-1138, ISSN: 2160-9292

Machine learning algorithms for the analysis of time-series often depend on the assumption that the utilised data are temporally aligned. Any temporal discrepancies arising in the data are certain to lead to ill-generalisable models, which in turn fail to correctly capture the properties of the task at hand. The temporal alignment of time-series is thus a crucial challenge manifesting in a multitude of applications. Nevertheless, the vast majority of algorithms oriented towards temporal alignment are either applied directly on the observation space or simply utilise linear projections, thus failing to capture complex, hierarchical non-linear representations that may prove beneficial, especially when dealing with multi-modal data (e.g., visual and acoustic information). To this end, we present Deep Canonical Time Warping (DCTW), a method that automatically learns non-linear representations of multiple time-series that are (i) maximally correlated in a shared subspace, and (ii) temporally aligned. Furthermore, we extend DCTW to a supervised setting, where during training, available labels can be utilised towards enhancing the alignment process. By means of experiments on four datasets, we show that the representations learnt significantly outperform state-of-the-art methods in temporal alignment, elegantly handling scenarios with heterogeneous feature sets, such as the temporal alignment of acoustic and visual information.

Journal article
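
DCTW couples representation learning with the classic dynamic time warping alignment; the sketch below implements only the alignment half (standard DTW with a backtracked path), with the deep, maximally correlated encoders left abstract:

```python
import numpy as np

def dtw(a, b):
    """Classic dynamic time warping between feature sequences a (n, d)
    and b (m, d). Returns the alignment cost and the warping path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        i, j = (i - 1, j - 1) if step == 0 else (i - 1, j) if step == 1 else (i, j - 1)
    return D[n, m], path[::-1]
```

In DCTW the two sequences would first be passed through learned non-linear encoders, and the encoders and the alignment are optimised in alternation; here that step is omitted.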

Sagonas C, Panagakis Y, Arunkumar S, Ratha N, Zafeiriou S et al., 2017, Back to the future: A fully automatic method for robust age progression, International Conference on Pattern Recognition (ICPR) 2016, Publisher: IEEE

It has been shown that a significant age difference between a probe and gallery face image can decrease the matching accuracy. If face images can be normalized in age, there can be a huge impact on face verification accuracy, and thus many novel applications, such as matching driver's license, passport and visa images with the real person's images, can be effectively implemented. Face progression can address this issue by generating a face image for a specific age. Many researchers have attempted to address this problem by focusing on predicting older faces from a younger face. In this paper, we propose a novel method for robust and automatic face progression in totally unconstrained conditions. Our method takes into account that faces belonging to the same age-group share age patterns, such as wrinkles, while faces across different age-groups share some common patterns, such as expressions and skin colors. Given training images of K different age-groups, the proposed method learns to recover K low-rank age components and one low-rank common component. These components, extracted during the learning phase, are used to progress an input face to both younger and older ages in a bidirectional fashion. Using standard datasets, we demonstrate that the proposed progression method outperforms state-of-the-art age progression methods and also improves matching accuracy in a face verification protocol that includes age progression.

Conference paper

Gligorijevic V, Panagakis Y, Zafeiriou S, 2017, Fusion and Community Detection in Multi-layer Graphs, 2016 23rd International Conference on Pattern Recognition (ICPR), Publisher: IEEE

Relational data arising in many domains can be represented by networks (or graphs) with nodes capturing entities and edges representing relationships between these entities. Community detection in networks has become one of the most important problems having a broad range of applications. Until recently, the vast majority of papers have focused on discovering community structures in a single network. However, with the emergence of multi-view network data in many real-world applications and consequently with the advent of multi-layer graph representation, community detection in multi-layer graphs has become a new challenge. Multi-layer graphs provide complementary views of connectivity patterns of the same set of vertices. Fusion of the network layers is expected to achieve better clustering performance. In this paper, we propose two novel methods, coined as WSSNMTF (Weighted Simultaneous Symmetric Non-Negative Matrix Tri-Factorization) and NG-WSSNMTF (Natural Gradient WSSNMTF), for fusion and clustering of multi-layer graphs. Both methods are robust with respect to missing edges and noise. We compare the performance of the proposed methods with two baseline methods, as well as with three state-of-the-art methods on synthetic and three real-world datasets. The experimental results indicate superior performance of the proposed methods.

Conference paper
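
The shape of the tri-factorization objective can be sketched as follows, with a shared non-negative cluster indicator S across layers and per-layer weight masks handling missing edges; the notation is assumed, not copied from the paper:

```latex
% Schematic weighted symmetric tri-factorization over graph layers
% A^{(1)},...,A^{(L)}: S is the shared cluster indicator, H^{(l)} the
% per-layer interaction matrix, and W^{(l)} a binary mask that zeroes
% out missing edges ("odot" is the elementwise product).
\min_{S \ge 0,\; H^{(l)} \ge 0}\;
  \sum_{l=1}^{L}
  \bigl\| W^{(l)} \odot \bigl( A^{(l)} - S\, H^{(l)} S^{\top} \bigr) \bigr\|_{F}^{2}
```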

Booth JA, Roussos A, Ponniah A, Dunaway D, Zafeiriou S et al., 2017, Large scale 3D morphable models, International Journal of Computer Vision, Vol: 126, Pages: 233-254, ISSN: 1573-1405

We present large scale facial model (LSFM)—a 3D Morphable Model (3DMM) automatically constructed from 9663 distinct facial identities. To the best of our knowledge LSFM is the largest-scale Morphable Model ever constructed, containing statistical information from a huge variety of the human population. To build such a large model we introduce a novel fully automated and robust Morphable Model construction pipeline, informed by an evaluation of state-of-the-art dense correspondence techniques. The dataset that LSFM is trained on includes rich demographic information about each subject, allowing for the construction of not only a global 3DMM model but also models tailored for specific age, gender or ethnicity groups. We utilize the proposed model to perform age classification from 3D shape alone and to reconstruct noisy out-of-sample data in the low-dimensional model space. Furthermore, we perform a systematic analysis of the constructed 3DMM models that showcases their quality and descriptive power. The presented extensive qualitative and quantitative evaluations reveal that the proposed 3DMM achieves state-of-the-art results, outperforming existing models by a large margin. Finally, for the benefit of the research community, we make publicly available the source code of the proposed automatic 3DMM construction pipeline, as well as the constructed global 3DMM and a variety of bespoke models tailored by age, gender and ethnicity.

Journal article
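
The statistical core of a 3DMM is a PCA basis over registered meshes. Below is a minimal sketch of building and sampling such a model, assuming the meshes are already in dense correspondence (the hard step that the paper's pipeline automates); all sizes are illustrative:

```python
import numpy as np

def build_3dmm(meshes, n_components=100):
    """Minimal PCA shape model from registered meshes of shape
    (N, n_vertices * 3), i.e. meshes already in dense correspondence."""
    mean = meshes.mean(axis=0)
    X = meshes - mean
    # Thin SVD of the centred data gives the principal shape basis.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:n_components]                      # (k, n_vertices * 3)
    stdev = s[:n_components] / np.sqrt(len(meshes) - 1)
    return mean, basis, stdev

def sample_face(mean, basis, stdev, rng):
    """Draw a random plausible face: mean + Gaussian-weighted basis."""
    params = rng.standard_normal(len(stdev)) * stdev
    return mean + params @ basis
```

Fitting bespoke models by age, gender or ethnicity, as the abstract describes, amounts to running the same construction on the corresponding demographic subset of the registered meshes.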

Fabris A, Nicolaou MA, Kotsia I, Zafeiriou S et al., 2017, Dynamic probabilistic linear discriminant analysis for video classification, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 2781-2785, ISSN: 1520-6149

Conference paper


