Chrysos GG, Favaro P, Zafeiriou S, 2019, Motion deblurring of faces, International Journal of Computer Vision, Vol: 127, Pages: 801-823, ISSN: 0920-5691
Face analysis lies at the heart of computer vision with remarkable progress in the past decades. Face recognition and tracking are tackled by building invariance to fundamental modes of variation such as illumination, 3D pose. A much less standing mode of variation is motion deblurring, which however presents substantial challenges in face analysis. Recent approaches either make oversimplifying assumptions, e.g. in cases of joint optimization with other tasks, or fail to preserve the highly structured shape/identity information. We introduce a two-step architecture tailored to the challenges of motion deblurring: the first step restores the low frequencies; the second restores the high frequencies, while ensuring that the outputs span the natural images manifold. Both steps are implemented with a supervised data-driven method; to train those we devise a method for creating realistic motion blur by averaging a variable number of frames. The averaged images originate from the 2 MF2 dataset with 19 million facial frames, which we introduce for the task. Considering deblurring as an intermediate step, we conduct a thorough experimentation on high-level face analysis tasks, i.e. landmark localization and face verification, on blurred images. The experimental evaluation demonstrates the superiority of our method.
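The blur-synthesis idea in the abstract above (averaging a variable number of frames) can be illustrated with a minimal sketch. The function name `average_frames`, the toy video, and the 3 to 5 frame range are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def average_frames(frames, start, count):
    """Approximate motion blur by averaging `count` consecutive frames."""
    window = np.asarray(frames[start:start + count], dtype=np.float64)
    if window.shape[0] != count:
        raise ValueError("not enough frames for the requested window")
    return window.mean(axis=0)

# Hypothetical toy video: 10 frames of 4x4 pixels with one bright pixel moving right.
frames = np.zeros((10, 4, 4))
for t in range(10):
    frames[t, 1, t % 4] = 1.0

# Draw a variable window length (assumed range: 3 to 5 frames).
rng = np.random.default_rng(0)
count = int(rng.integers(3, 6))
blurred = average_frames(frames, start=0, count=count)
```

Averaging preserves total intensity while smearing the moving pixel along its path, which is the effect a longer exposure would have.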
Kollias D, Tzirakis P, Nicolaou MA, et al., 2019, Deep Affect Prediction in-the-Wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond, International Journal of Computer Vision, Vol: 127, Pages: 907-929, ISSN: 0920-5691
Gligorijevic V, Panagakis Y, Zafeiriou S, 2019, Non-Negative Matrix Factorizations for Multiplex Network Analysis, Publisher: IEEE Computer Society
Hovhannisyan V, Panagakis I, Parpas P, et al., 2019, Fast multilevel algorithms for compressive principal component pursuit, SIAM Journal on Imaging Sciences, Vol: 12, Pages: 624-649, ISSN: 1936-4954
Recovering a low-rank matrix from highly corrupted measurements arises in compressed sensing of structured high-dimensional signals (e.g., videos and hyperspectral images among others). Robust principal component analysis (RPCA), solved via principal component pursuit (PCP), recovers a low-rank matrix from sparse corruptions that are of unknown value and support by decomposing the observation matrix into two terms: a low-rank matrix and a sparse one, accounting for sparse noise and outliers. In the more general setting, where only a fraction of the data matrix has been observed, low-rank matrix recovery is achieved by solving the compressive principal component pursuit (CPCP). Both PCP and CPCP are well-studied convex programs, and numerous iterative algorithms have been proposed for their optimisation. Nevertheless, these algorithms involve singular value decomposition (SVD) at each iteration, which renders their applicability challenging in the case of massive data. In this paper, we propose a multilevel approach for the solution of PCP and CPCP problems. The core principle behind our algorithm is to apply SVD in models of lower-dimensionality than the original one and then lift its solution to the original problem dimension. Hence, our methods rely on the assumption that the low rank component can be represented in a lower dimensional space. We show that the proposed algorithms are easy to implement, converge at the same rate but with much lower iteration cost. Numerical experiments on numerous synthetic and real problems indicate that the proposed multilevel algorithms are several times faster than their original counterparts, namely PCP and CPCP.
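To make the per-iteration SVD cost mentioned above concrete, the following sketch shows the two proximal operators that standard iterative PCP solvers typically alternate between: entrywise soft thresholding for the sparse term and singular value thresholding for the low-rank term. This is a generic illustration, not the multilevel algorithm proposed in the paper, and the function names are assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise shrinkage: proximal operator of tau * (l1 norm)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def singular_value_threshold(m, tau):
    """Shrink singular values: proximal operator of tau * (nuclear norm).

    The full SVD computed here is the step that dominates the cost of each
    iteration when the matrix is large.
    """
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt

# Small illustrative inputs.
shrunk = soft_threshold(np.array([3.0, -0.5, -2.0]), 1.0)
low_rank = singular_value_threshold(np.diag([3.0, 1.0]), 2.0)
```

In the second call the smaller singular value falls below the threshold and is zeroed, so the result has rank one.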
Deng J, Guo J, Xue N, et al., Arcface: additive angular margin loss for deep face recognition, CVPR 2019, Publisher: IEEE
One of the main challenges in feature learning using Deep Convolutional Neural Networks (DCNNs) for large-scale face recognition is the design of appropriate loss functions that enhance discriminative power. Centre loss penalises the distance between the deep features and their corresponding class centres in the Euclidean space to achieve intra-class compactness. SphereFace assumes that the linear transformation matrix in the last fully connected layer can be used as a representation of the class centres in an angular space and penalises the angles between the deep features and their corresponding weights in a multiplicative way. Recently, a popular line of research is to incorporate margins in well-established loss functions in order to maximise face class separability. In this paper, we propose an Additive Angular Margin Loss (ArcFace) to obtain highly discriminative features for face recognition. The proposed ArcFace has a clear geometric interpretation due to the exact correspondence to the geodesic distance on the hypersphere. We present arguably the most extensive experimental evaluation of all the recent state-of-the-art face recognition methods on over 10 face recognition benchmarks including a new large-scale image database with trillion level of pairs and a large-scale video dataset. We show that ArcFace consistently outperforms the state-of-the-art and can be easily implemented with negligible computational overhead. We release all refined training data, training codes, pre-trained models and training logs, which will help reproduce the results in this paper.
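The additive angular margin described above can be sketched in a few lines of NumPy: features and class weights are L2-normalised so that their inner product is the cosine of the angle between them, and a margin is added to the angle of the target class before rescaling. The function name, the array layout, and the default `scale` and `margin` values are assumptions for illustration rather than the released implementation.

```python
import numpy as np

def arcface_logits(features, weights, labels, scale=64.0, margin=0.5):
    """Return scaled cosine logits with an additive angular margin on the target class.

    features: (N, d) embeddings; weights: (d, C) class weights; labels: (N,) class ids.
    Simplification: angles where theta + margin exceeds pi are not treated specially.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(f @ w, -1.0, 1.0)      # cosine of the angle to every class centre
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    theta[rows, labels] += margin        # penalise only the ground-truth class
    return scale * np.cos(theta)

# Toy example: two samples that coincide exactly with their class centres.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
weights = np.eye(2)
labels = np.array([0, 1])
logits = arcface_logits(features, weights, labels)
```

These logits would then be passed to an ordinary softmax cross-entropy loss. Even for a perfectly aligned sample the target logit is reduced by the margin, which is what pushes classes apart during training.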
Ploumpis S, Wang H, Pears N, et al., Combining 3D morphable models: a large-scale face-and-head model, CVPR 2019, Publisher: IEEE
Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for representing the 3D surfaces of an object class. In this context, we identify an interesting question that has previously not received research attention: is it possible to combine two or more 3DMMs that (a) are built using different templates that perhaps only partly overlap, (b) have different representation capabilities and (c) are built from different datasets that may not be publicly available? In answering this question, we make two contributions. First, we propose two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use the Gaussian Process framework to blend covariance matrices from multiple models. Second, as an example application of our approach, we build a new face-and-head shape model that combines the variability and facial detail of the LSFM with the full head modelling of the LYHM. The resulting combined shape model achieves state-of-the-art performance and outperforms existing head models by a large margin. Finally, as an application experiment, we reconstruct full head representations from single, unconstrained images by utilizing our proposed large-scale model in conjunction with the FaceWarehouse blendshapes for handling expressions.
Gecer B, Ploumpis S, Kotsia I, et al., GANFIT: generative adversarial network fitting for high fidelity 3D face reconstruction, CVPR 2019, Publisher: IEEE
In the past few years a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the quality of the facial texture reconstruction of the state-of-the-art methods is still not capable of modelling textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Models (3DMMs) fitting approaches making use of non-linear optimization to find the optimal latent parameters that best reconstruct the test image but under a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.
Deng J, Zhou Y, Kotsia I, et al., Dense 3D face decoding over 2500FPS: joint texture and shape convolutional mesh decoders, CVPR 2019, Publisher: IEEE
3D Morphable Models (3DMMs) are statistical models that represent facial texture and shape variations using a set of linear bases and, more particularly, Principal Component Analysis (PCA). 3DMMs were used as statistical priors for reconstructing 3D faces from images by solving non-linear least square optimization problems. Recently, 3DMMs were used as generative models for training non-linear mappings (i.e., regressors) from image to the parameters of the models via Deep Convolutional Neural Networks (DCNNs). Nevertheless, all of the above methods use either fully connected layers or 2D convolutions on parametric unwrapped UV spaces leading to large networks with many parameters. In this paper, we present the first, to the best of our knowledge, non-linear 3DMMs by learning joint texture and shape auto-encoders using direct mesh convolutions. We demonstrate how these auto-encoders can be used to train very light-weight models that perform Coloured Mesh Decoding (CMD) in-the-wild at a speed of over 2500 FPS.
Nicolaou MA, Zafeiriou S, Kotsia I, et al., 2019, Editorial of special issue on human behaviour analysis "in-the-wild", IEEE Transactions on Affective Computing, Vol: 10, Pages: 4-6, ISSN: 1949-3045
The papers in this special section focus on human face and body image analysis, one of the most researched objects. One of the main reasons behind this popularity lies in the numerous applications of automatic face and body gesture analysis algorithms, that span several fields such as Human-Computer and Human-Robot Interaction (facial expression/body gesture recognition for automatic analysis of affect), medicine and healthcare (detection of emotional and cognitive disorders), as well as biometrics (face recognition, gait recognition). The papers in this section focus on recent efforts towards catalysing progress in automatic analysis of human behaviour in uncontrolled, “in-the-wild” conditions. We summarize research efforts towards the development of research methodologies, database collections and benchmarks, as well as algorithms and systems for machine analysis of human behaviour, focusing on facial expressions, body gestures, speech, as well as various other sensors. We are delighted that the special issue includes authors both from academia as well as the industry.
Deng J, Xue N, Cheng S, et al., Side information for face completion: a robust PCA approach, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828
Robust principal component analysis (RPCA) is a powerful method for learning low-rank feature representation of various visual data. However, for certain types as well as significant amount of error corruption, it fails to yield satisfactory results; a drawback that can be alleviated by exploiting domain-dependent prior knowledge or information. In this paper, we propose two models for the RPCA that take into account such side information, even in the presence of missing values. We apply this framework to the task of UV completion which is widely used in pose-invariant face recognition. Moreover, we construct a generative adversarial network (GAN) to extract side information as well as subspaces. These subspaces not only assist in the recovery but also speed up the process in case of large-scale data. We quantitatively and qualitatively evaluate the proposed approaches through both synthetic data and five real-world datasets to verify their effectiveness.
Wang M, Shu Z, Cheng S, et al., 2019, An adversarial neuro-tensorial approach for learning disentangled representations, International Journal of Computer Vision, ISSN: 0920-5691
Several factors contribute to the appearance of an object in a visual scene, including pose, illumination, and deformation, among others. Each factor accounts for a source of variability in the data, while the multiplicative interactions of these factors emulate the entangled variability, giving rise to the rich structure of visual object appearance. Disentangling such unobserved factors from visual data is a challenging task, especially when the data have been captured in uncontrolled recording conditions (also referred to as “in-the-wild”) and label information is not available. In this paper, we propose a pseudo-supervised deep learning method for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model, where the multiplicative interactions of multiple latent factors of variation are explicitly modelled by means of multilinear (tensor) structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expressions and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity and pose.
Kollias D, Cheng S, Pantic M, et al., 2019, Photorealistic facial synthesis in the dimensional affect space, European Conference on Computer Vision, Publisher: Springer, Pages: 475-491, ISSN: 0302-9743
This paper presents a novel approach for synthesizing facial affect, which is based on our annotating 600,000 frames of the 4DFAB database in terms of valence and arousal. The input of this approach is a pair of these emotional state descriptors and a neutral 2D image of a person to whom the corresponding affect will be synthesized. Given this target pair, a set of 3D facial meshes is selected, which is used to build a blendshape model and generate the new facial affect. To synthesize the affect on the 2D neutral image, 3DMM fitting is performed and the reconstructed face is deformed to generate the target facial expressions. Last, the new face is rendered into the original image. Both qualitative and quantitative experimental studies illustrate the generation of realistic images, when the neutral image is sampled from a variety of well known databases, such as the Aff-Wild, AFEW, Multi-PIE, AFEW-VA, BU-3DFE, Bosphorus.
Deng J, Cheng S, Xue N, et al., 2018, UV-GAN: Adversarial facial UV map completion for pose-invariant face recognition, 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 7093-7102, ISSN: 1063-6919
Recently proposed robust 3D face alignment methods establish either dense or sparse correspondence between a 3D face model and a 2D facial image. The use of these methods presents new challenges as well as opportunities for facial texture analysis. In particular, by sampling the image using the fitted model, a facial UV can be created. Unfortunately, due to self-occlusion, such a UV map is always incomplete. In this paper, we propose a framework for training Deep Convolutional Neural Network (DCNN) to complete the facial UV map extracted from in-the-wild images. To this end, we first gather complete UV maps by fitting a 3D Morphable Model (3DMM) to various multiview image and video datasets, as well as leveraging on a new 3D dataset with over 3,000 identities. Second, we devise a meticulously designed architecture that combines local and global adversarial DCNNs to learn an identity-preserving facial UV completion model. We demonstrate that by attaching the completed UV to the fitted mesh and generating instances of arbitrary poses, we can increase pose variations for training deep face recognition/verification models, and minimise pose discrepancy during testing, which lead to better performance. Experiments on both controlled and in-the-wild UV datasets prove the effectiveness of our adversarial UV completion model. We achieve state-of-the-art verification accuracy, 94.05%, under the CFP frontal-profile protocol only by combining pose augmentation during training and pose discrepancy reduction during testing. We will release the first in-the-wild UV dataset (we refer as WildUV) that comprises of complete facial UV maps from 1,892 identities for research purposes.
Cheng S, Kotsia I, Pantic M, et al., 2018, 4DFAB: A large scale 4D database for facial expression analysis and biometric applications, 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE, Pages: 5117-5126, ISSN: 1063-6919
The progress we are currently witnessing in many computer vision applications, including automatic face analysis, would not be made possible without tremendous efforts in collecting and annotating large scale visual databases. To this end, we propose 4DFAB, a new large scale database of dynamic high-resolution 3D faces (over 1,800,000 3D meshes). 4DFAB contains recordings of 180 subjects captured in four different sessions spanning over a five-year period. It contains 4D videos of subjects displaying both spontaneous and posed facial behaviours. The database can be used for both face and facial expression recognition, as well as behavioural biometrics. It can also be used to learn very powerful blendshapes for parametrising facial behaviour. In this paper, we conduct several experiments and demonstrate the usefulness of the database for various applications. The database will be made publicly available for research purposes.
Moschoglou S, Ververas E, Panagakis Y, et al., 2018, Multi-attribute robust component analysis for facial UV maps, IEEE Journal of Selected Topics in Signal Processing, Vol: 12, Pages: 1324-1337, ISSN: 1932-4553
The collection of large-scale three-dimensional (3-D) face models has led to significant progress in the field of 3-D face alignment “in-the-wild,” with several methods being proposed toward establishing sparse or dense 3-D correspondences between a given 2-D facial image and a 3-D face model. Utilizing 3-D face alignment improves 2-D face alignment in many ways, such as alleviating issues with artifacts and warping effects in texture images. However, the utilization of 3-D face models introduces a new set of challenges for researchers. Since facial images are commonly captured in arbitrary recording conditions, a considerable amount of missing information and gross outliers is observed (e.g., due to self-occlusion, subjects wearing eye-glasses, and so on). To this end, in this paper we propose the Multi-Attribute Robust Component Analysis (MA-RCA), a novel technique that is suitable for facial UV maps containing a considerable amount of missing information and outliers, while additionally, elegantly incorporates knowledge from various available attributes, such as age and identity. We evaluate the proposed method on problems such as UV denoising, UV completion, facial expression synthesis, and age progression, where MA-RCA outperforms compared techniques.
Deng J, Roussos A, Chrysos G, et al., 2018, The Menpo benchmark for multi-pose 2D and 3D facial landmark localisation and tracking, International Journal of Computer Vision, ISSN: 0920-5691
In this article, we present the Menpo 2D and Menpo 3D benchmarks, two new datasets for multi-pose 2D and 3D facial landmark localisation and tracking. In contrast to the previous benchmarks such as 300W and 300VW, the proposed benchmarks contain facial images in both semi-frontal and profile pose. We introduce an elaborate semi-automatic methodology for providing high-quality annotations for both the Menpo 2D and Menpo 3D benchmarks. In Menpo 2D benchmark, different visible landmark configurations are designed for semi-frontal and profile faces, thus making the 2D face alignment full-pose. In Menpo 3D benchmark, a united landmark configuration is designed for both semi-frontal and profile faces based on the correspondence with a 3D face model, thus making face alignment not only full-pose but also corresponding to the real-world 3D space. Based on the considerable number of annotated images, we organised Menpo 2D Challenge and Menpo 3D Challenge for face alignment under large pose variations in conjunction with CVPR 2017 and ICCV 2017, respectively. The results of these challenges demonstrate that recent deep learning architectures, when trained with the abundant data, lead to excellent results. We also provide a very simple, yet effective solution, named Cascade Multi-view Hourglass Model, to 2D and 3D face alignment. In our method, we take advantage of all 2D and 3D facial landmark annotations in a joint way. We not only capitalise on the correspondences between the semi-frontal and profile 2D facial landmarks but also employ joint supervision from both 2D and 3D facial landmarks. Finally, we discuss future directions on the topic of face alignment.
Bahri M, Panagakis Y, Zafeiriou SP, 2018, Robust Kronecker component analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828
Dictionary learning and component analysis models are fundamental for learning compact representations relevant to a given task. The model complexity is encoded by means of structure, such as sparsity, low-rankness, or nonnegativity. Unfortunately, approaches like K-SVD that learn dictionaries for sparse coding via Singular Value Decomposition (SVD) are hard to scale, and fragile in the presence of outliers. Conversely, robust component analysis methods such as the Robust Principal Component Analysis (RPCA) are able to recover low-complexity representations from data corrupted with noise of unknown magnitude and support, but do not provide a dictionary that respects the structure of the data, and also involve expensive computations. In this paper, we propose a novel Kronecker-decomposable component analysis model, coined as Robust Kronecker Component Analysis (RKCA), that combines ideas from sparse dictionary learning and robust component analysis. RKCA has several appealing properties, including robustness to gross corruption; it can be used for low-rank modeling, and leverages separability to solve significantly smaller problems. We design an efficient learning algorithm by drawing links with tensor factorizations, and analyze its optimality and low-rankness properties. The effectiveness of the proposed approach is demonstrated on real-world applications, namely background subtraction and image denoising and completion, by performing a thorough comparison with the current state of the art.
Wang M, Panagakis Y, Snape P, et al., 2018, Disentangling the modes of variation in unlabelled data, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 2682-2695, ISSN: 0162-8828
Statistical methods are of paramount importance in discovering the modes of variation in visual data. The Principal Component Analysis (PCA) is probably the most prominent method for extracting a single mode of variation in the data. However, in practice, several factors contribute to the appearance of visual objects including pose, illumination, and deformation, to mention a few. To extract these modes of variation from visual data, several supervised methods, such as the TensorFaces relying on multilinear (tensor) decomposition, have been developed. The main drawback of such methods is that they require both labels regarding the modes of variation and the same number of samples under all modes of variation (e.g., the same face under different expressions, poses etc.). Therefore, their applicability is limited to well-organised data, usually captured in well-controlled conditions. In this paper, we propose a novel general multilinear matrix decomposition method that discovers the multilinear structure of possibly incomplete sets of visual data in an unsupervised setting (i.e., without the presence of labels). We also propose extensions of the method with sparsity and low-rank constraints in order to handle noisy data, captured in unconstrained conditions. Besides that, a graph-regularised variant of the method is also developed in order to exploit available geometric or label information for some modes of variation. We demonstrate the applicability of the proposed method in several computer vision tasks, including Shape from Shading (SfS) (in the wild and with occlusion removal), expression transfer, and estimation of surface normals from images captured in the wild.
Sagonas, Ververas, Panagakis, et al., 2018, Recovering joint and individual components in facial data, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 2668-2681, ISSN: 0162-8828
A set of images depicting faces with different expressions or in various ages consists of components that are shared across all images (i.e., joint components) imparting to the depicted object the properties of human faces as well as individual components that are related to different expressions or age groups. Discovering the common (joint) and individual components in facial images is crucial for applications such as facial expression transfer and age progression. The problem is rather challenging when dealing with images captured in unconstrained conditions in the presence of sparse non-Gaussian errors of large magnitude (i.e., sparse gross errors or outliers) and contain missing data. In this paper, we investigate the use of a method recently introduced in statistics, the so-called Joint and Individual Variance Explained (JIVE) method, for the robust recovery of joint and individual components in visual facial data consisting of an arbitrary number of views. Since the JIVE is not robust to sparse gross errors, we propose alternatives, which are (1) robust to sparse gross, non-Gaussian noise, (2) able to automatically find the individual components rank, and (3) can handle missing data. We demonstrate the effectiveness of the proposed methods to several computer vision applications, namely facial expression synthesis and 2D and 3D face age progression ‘in-the-wild’.
Kollias D, Zafeiriou S, 2018, Training deep neural networks with different datasets In-the-wild: The emotion recognition paradigm, 2018 International Joint Conference on Neural Networks (IJCNN), Publisher: IEEE, ISSN: 2161-4407
A novel procedure is presented in this paper, for training a deep convolutional and recurrent neural network, taking into account both the available training data set and some information extracted from similar networks trained with other relevant data sets. This information is included in an extended loss function used for the network training, so that the network can have an improved performance when applied to the other data sets, without forgetting the learned knowledge from the original data set. Facial expression and emotion recognition in-the-wild is the test bed application that is used to demonstrate the improved performance achieved using the proposed approach. In this framework, we provide an experimental study on categorical emotion recognition using datasets from a very recent related emotion recognition challenge.
Booth J, Roussos A, Ververas E, et al., 2018, 3D Reconstruction of "In-the-Wild" Faces in Images and Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 40, Pages: 2638-2652, ISSN: 0162-8828
Chrysos GG, Antonakos E, Zafeiriou S, 2018, IPST: Incremental Pictorial Structures for Model-Free Tracking of Deformable Objects, IEEE Transactions on Image Processing, Vol: 27, Pages: 3529-3540, ISSN: 1057-7149
Kampouris C, Zafeiriou S, Ghosh A, 2018, Diffuse-specular separation using binary spherical gradient illumination, Eurographics Symposium on Rendering (EGSR) 2018, Publisher: The Eurographics Association, ISSN: 1727-3463
We introduce a novel method for view-independent diffuse-specular separation of albedo and photometric normals without requiring polarization using binary spherical gradient illumination. The key idea is that with binary gradient illumination, a dielectric surface oriented towards the dark hemisphere exhibits pure diffuse reflectance while a surface oriented towards the bright hemisphere exhibits both diffuse and specular reflectance. We exploit this observation to formulate diffuse-specular separation based on color-space analysis of a surface’s response to binary spherical gradients and their complements. The method does not impose restrictions on viewpoints and requires fewer photographs for multiview acquisition than polarized spherical gradient illumination. We further demonstrate an efficient two-shot capture using spectral multiplexing of the illumination that enables diffuse-specular separation of albedo and heuristic separation of photometric normals.
Songsri-in K, Trigeorgis G, Zafeiriou S, 2018, Deep & Deformable: Convolutional Mixtures of Deformable Part-based Models, 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG), Publisher: IEEE, Pages: 218-225, ISSN: 2326-5396
Zhou Y, Deng J, Zafeiriou S, 2018, Improve accurate pose alignment and action localization by dense pose estimation, 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG), Publisher: IEEE, Pages: 480-484, ISSN: 2326-5396
In this work we explore the use of shape-based representations as an auxiliary source of supervision for pose estimation and action recognition. We show that shape-based representations can act as a source of 'privileged information' that complements and extends the pure landmark-level annotations. We explore 2D shape-based supervision signals, such as Support Vector Shape. Our experiments show that shape-based supervision signals substantially improve pose alignment accuracy in the form of a cascade architecture. We outperform state-of-the-art methods on the MPII and LSP datasets, while using substantially shallower networks. For action localization in untrimmed videos, our method introduces additional classification signals based on the structured segment networks (SSN) and further improves the performance. To be specific, dense human pose and landmark localization signals are involved in the detection process. We apply our network to all frames of the videos alongside the output from SSN to further improve detection accuracy, especially for pose-related and sparsely annotated videos. The method in general achieves state-of-the-art performance on the Activity Detection Task of the ActivityNet Challenge 2017 test set and shows remarkable improvement on pose-related and sparsely annotated categories, e.g. sports.
Booth J, Roussos A, Ververas E, et al., 2018, 3D Reconstruction of "In-the-Wild" Faces in Images and Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN: 0162-8828
3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral as well as expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions ("in-the-wild"). In this paper, we propose the first "in-the-wild" 3DMM by combining a statistical model of facial identity and expression shape with an "in-the-wild" texture model. We show that such an approach allows for the development of a greatly simplified fitting procedure for images and videos, as there is no need to optimise with regards to the illumination parameters. We have collected three new databases that combine "in-the-wild" images and video with ground truth 3D facial geometry, the first of their kind, and report extensive quantitative evaluations using them that demonstrate our method is state-of-the-art.
Emersic Z, Stepec D, Struc V, et al., 2018, The Unconstrained Ear Recognition Challenge, IEEE International Joint Conference on Biometrics (IJCB), Publisher: IEEE, Pages: 715-724
Hovhannisyan V, Panagakis Y, Zafeiriou S, et al., 2018, Multilevel approximate robust principal component analysis, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 536-544, ISSN: 2473-9936
Robust principal component analysis (RPCA) is currently the method of choice for recovering a low-rank matrix from sparse corruptions that are of unknown value and support by decomposing the observation matrix into low-rank and sparse matrices. RPCA has many applications including background subtraction, learning of robust subspaces from visual data, etc. Nevertheless, the application of SVD in each iteration of optimisation methods renders the application of RPCA challenging in cases when data is large. In this paper, we propose the first, to the best of our knowledge, multilevel approach for solving convex and non-convex RPCA models. The basic idea is to construct lower dimensional models and perform SVD on them instead of the original high dimensional problem. We show that the proposed approach gives a good approximate solution to the original problem for both convex and non-convex formulations, while being many times faster than original RPCA methods in several real world datasets.
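For orientation, the low-rank plus sparse decomposition described in this abstract can be sketched with the standard convex formulation (principal component pursuit) solved by ADMM. This is a minimal baseline sketch, not the multilevel method the paper proposes: the function names, the default weight `lam = 1/sqrt(max(m, n))`, and the step-size heuristic for `mu` are assumptions taken from common practice rather than from the abstract. It does show the full SVD inside every iteration, which is the cost the abstract identifies as the bottleneck for large data.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: proximal operator of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca_pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S by minimising
    ||L||_* + lam * ||S||_1 subject to L + S = M (ADMM iterations).

    lam and mu defaults are common heuristics (assumptions of this sketch).
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M)

    for _ in range(max_iter):
        # One full SVD per iteration: the expensive step for large matrices.
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

Calling `rpca_pcp(M)` on a matrix built as a rank-3 product plus a few large, sparsely placed corruptions returns estimates `L` and `S` of the two parts. A multilevel scheme of the kind the abstract describes would replace the SVD of the full matrix with SVDs of lower-dimensional models; that step is deliberately left out here because the abstract does not specify how those models are constructed.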
Zafeiriou S, Chrysos GG, Roussos A, et al., 2018, The 3D Menpo facial landmark tracking challenge, 16th IEEE International Conference on Computer Vision (ICCV), Publisher: IEEE, Pages: 2503-2511, ISSN: 2473-9936
Recently, deformable face alignment has become synonymous with the task of locating a set of sparse 2D landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state of the art in face alignment. DCNNs exploit the large amounts of high-quality annotations that have emerged over the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident at the facial boundary). That is, the annotations neither provide an estimate of the depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured "in-the-wild" and (b) to train and evaluate new methods for 3D face landmark tracking. Finally, we report the results of the first challenge in 3D face tracking "in-the-wild".
Schuller BW, Steidl S, Batliner A, et al., 2018, The INTERSPEECH 2018 computational paralinguistics challenge: Atypical & self-assessed affect, crying & Heart beats, The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats, Pages: 122-126, ISSN: 2308-457X
© 2018 International Speech Communication Association. All rights reserved. The INTERSPEECH 2018 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Atypical Affect Sub-Challenge, four basic emotions annotated in the speech of handicapped subjects have to be classified; in the Self-Assessed Affect Sub-Challenge, valence scores given by the speakers themselves are used for a three-class classification problem; in the Crying Sub-Challenge, three types of infant vocalisations have to be told apart; and in the Heart Beats Sub-Challenge, three different types of heart beats have to be determined. We describe the Sub-Challenges, their conditions, and baseline feature extraction and classifiers, which include data-learnt (supervised) feature representations by end-to-end learning, the 'usual' ComParE and BoAW features, and deep unsupervised representation learning using the AUDEEP toolkit for the first time in the challenge series.