Publications
1107 results found
Hantke S, Olenyi T, Hausner C, et al., 2019, Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform, INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, Vol: 16, Pages: 427-436, ISSN: 1476-8186
- Citations: 2
Pokorny FB, Fiser M, Graf F, et al., 2019, Sound and the City: Current Perspectives on Acoustic Geo-Sensing in Urban Environment, ACTA ACUSTICA UNITED WITH ACUSTICA, Vol: 105, Pages: 766-778, ISSN: 1610-1928
- Citations: 4
Kollias D, Tzirakis P, Nicolaou MA, et al., 2019, Deep Affect Prediction in-the-Wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond, INTERNATIONAL JOURNAL OF COMPUTER VISION, Vol: 127, Pages: 907-929, ISSN: 0920-5691
- Citations: 120
Zhang Z, Han J, Coutinho E, et al., 2019, Dynamic difficulty awareness training for continuous emotion prediction, IEEE Transactions on Multimedia, Vol: 21, Pages: 1289-1301, ISSN: 1941-0077
Time-continuous emotion prediction has become an increasingly compelling task in machine learning, and considerable effort has been made to advance the performance of these systems. Nonetheless, the main focus has been the development of more sophisticated models and the incorporation of different expressive modalities (e.g., speech, face, and physiology). In this paper, motivated by the benefit of difficulty awareness in human learning, we propose a novel machine learning framework, Dynamic Difficulty Awareness Training (DDAT), which sheds fresh light on the research by directly exploiting the difficulties in learning to boost the machine learning process. The DDAT framework consists of two stages: information retrieval and information exploitation. In the first stage, we use the reconstruction error of the input features or the annotation uncertainty to estimate the difficulty of learning specific information. The obtained difficulty level is then used in tandem with the original features to update the model input in a second learning stage, with the expectation that the model learns to focus on high-difficulty regions of the learning process. We perform extensive experiments on a benchmark database (RECOLA) to evaluate the effectiveness of the proposed framework. The experimental results show that our approach outperforms related baselines as well as other well-established time-continuous emotion prediction systems, which suggests that dynamically integrating the difficulty information into neural networks can help enhance the learning process.
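The two-stage idea in the DDAT abstract above can be sketched in a few lines: stage one estimates per-frame difficulty as the reconstruction error of an autoencoder (here a toy linear one), and stage two appends that score to the original features before the downstream model sees them. The encoder/decoder matrices and all dimensions below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_difficulty(features, encoder, decoder):
    """Stage 1: per-frame reconstruction error as a proxy for learning difficulty."""
    recon = features @ encoder @ decoder
    return np.linalg.norm(features - recon, axis=1, keepdims=True)

def ddat_augment(features, encoder, decoder):
    """Stage 2: append the difficulty score to the original features, so the
    second-stage model can focus on high-difficulty regions."""
    difficulty = reconstruction_difficulty(features, encoder, decoder)
    return np.hstack([features, difficulty])

frames = rng.normal(size=(100, 16))     # 100 frames, 16 acoustic features (toy data)
enc = rng.normal(size=(16, 4)) * 0.1    # lossy bottleneck, so the error is informative
dec = rng.normal(size=(4, 16)) * 0.1
augmented = ddat_augment(frames, enc, dec)  # each frame gains one difficulty column
```

In the paper the difficulty signal can also come from annotation uncertainty rather than reconstruction error; the concatenation step is the same either way.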
Han J, Zhang Z, Cummins N, et al., 2019, Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives, IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, Vol: 14, Pages: 68-81, ISSN: 1556-603X
Pandit V, Amiriparian S, Schmitt M, et al., 2019, Big Data Multimedia Mining: Feature Extraction Facing Volume, Velocity, and Variety, Publisher: Wiley
Kim JY, Liu C, Calvo RA, et al., 2019, A comparison of online automatic speech recognition systems and the nonverbal responses to unintelligible speech, Publisher: arXiv
Automatic Speech Recognition (ASR) systems have proliferated over the recent years to the point that free platforms such as YouTube now provide speech recognition services. Given the wide selection of ASR systems, we contribute to the field of automatic speech recognition by comparing the relative performance of two sets of manual transcriptions and five sets of automatic transcriptions (Google Cloud, IBM Watson, Microsoft Azure, Trint, and YouTube) to help researchers to select accurate transcription services. In addition, we identify nonverbal behaviors that are associated with unintelligible speech, as indicated by high word error rates. We show that manual transcriptions remain superior to current automatic transcriptions. Amongst the automatic transcription services, YouTube offers the most accurate transcription service. For non-verbal behavioral involvement, we provide evidence that the variability of smile intensities from the listener is high (low) when the speaker is clear (unintelligible). These findings are derived from videoconferencing interactions between student doctors and simulated patients; therefore, we contribute towards both the ASR literature and the healthcare communication skills teaching community.
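The comparison above ranks transcription services by word error rate (WER). A minimal sketch of how WER is typically computed, using the standard word-level Levenshtein distance (not the authors' actual evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the patient feels fine", "the patient feel fine")` is 0.25: one substitution over four reference words.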
Qian K, Schmitt M, Janott C, et al., 2019, A Bag of Wavelet Features for Snore Sound Classification, ANNALS OF BIOMEDICAL ENGINEERING, Vol: 47, Pages: 1000-1011, ISSN: 0090-6964
- Citations: 20
Schuller B, 2019, Microexpressions: A Chance for Computers to Beat Humans at Detecting Hidden Emotions?, COMPUTER, Vol: 52, Pages: 4-5, ISSN: 0018-9162
Zhang Y, Michi A, Wagner J, et al., 2019, A Generic Human-Machine Annotation Framework Based on Dynamic Cooperative Learning., IEEE Trans Cybern
Obtaining meaningful annotations is tedious work, incurring considerable cost and time. Dynamic active learning and cooperative learning are recently proposed approaches for reducing the human effort of annotating data with subjective phenomena. In this paper, we introduce a novel generic annotation framework that aims to achieve the optimal trade-off between label reliability and cost reduction by making efficient use of the human and machine work force. To this end, we use dropout to assess model uncertainty and thereby decide which instances can be automatically labelled by the machine and which require human inspection. In addition, we propose an early stopping criterion based on inter-rater agreement, in order to focus human resources on those ambiguous instances that are difficult to label. In contrast to existing algorithms, the new confidence measures are applicable not only to binary classification tasks but also to regression problems. The proposed method is evaluated on the benchmark datasets for non-native English prosody estimation provided in the INTERSPEECH Computational Paralinguistics Challenge. As a result, the novel dynamic cooperative learning algorithm yields a Spearman's correlation coefficient of 0.424, compared to 0.413 with passive learning, while reducing the amount of human annotation by 74%.
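The dropout-based routing described above can be illustrated with a Monte-Carlo-dropout sketch: the same input is run through several stochastic forward passes, and the spread of the predictions decides whether the machine keeps the label or defers to a human annotator. The linear model, threshold, and dimensions are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predict(x, w, n_samples=50, drop_p=0.5):
    """Monte-Carlo dropout: repeat stochastic forward passes of a toy linear
    model and use the spread of predictions as a model-uncertainty estimate."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(w.shape[0]) >= drop_p     # randomly drop weights
        preds.append(x @ (w * mask) / (1 - drop_p))  # rescale to keep expectation
    preds = np.array(preds)
    return preds.mean(), preds.std()

def route_instance(x, w, threshold):
    """Machine labels confident instances; uncertain ones go to a human."""
    mean, std = mc_dropout_predict(x, w)
    return ("machine", mean) if std < threshold else ("human", None)

x = rng.normal(size=8)   # one instance with 8 features (toy data)
w = rng.normal(size=8)   # toy model weights
```

A production version would apply the same logic to a trained network's dropout layers; only the confidence-versus-threshold routing is the point here.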
Schuller BW, 2019, IEEE Transactions on Affective Computing-On Novelty and Valence, IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, Vol: 10, Pages: 1-2, ISSN: 1949-3045
Xu X, Deng J, Coutinho E, et al., 2019, Connecting Subspace Learning and Extreme Learning Machine in Speech Emotion Recognition, IEEE Transactions on Multimedia, ISSN: 1520-9210
Speech Emotion Recognition (SER) is a powerful tool for endowing computers with the capacity to process information about the affective states of users in human-machine interaction. Recent research has shown the effectiveness of graph-embedding-based subspace learning and of the extreme learning machine applied to SER, but both techniques still have drawbacks that limit their application. Regarding subspace learning, the change from linearity to nonlinearity is usually achieved through kernelisation, while extreme learning machines take label information into consideration only at the output layer. To overcome these drawbacks, this paper leverages the extreme learning machine for dimensionality reduction and proposes a novel framework combining spectral-regression-based subspace learning and the extreme learning machine. The proposed framework contains three stages: data mapping, graph decomposition, and regression. At the data mapping stage, various mapping strategies provide different views of the samples. At the graph decomposition stage, specifically designed embedding graphs provide a possibility to better represent the structure of the data by generating virtual coordinates. Finally, at the regression stage, dimension-reduced mappings are achieved by connecting the virtual coordinates and the data mapping. Using this framework, we propose several novel dimensionality reduction algorithms, apply them to SER tasks, and compare their performance to relevant state-of-the-art methods. Our results on several paralinguistic corpora show that our proposed techniques lead to significant improvements.
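The three stages named in the abstract above (data mapping, graph decomposition, regression) can be sketched roughly as follows, with an ELM-style random nonlinear projection for the mapping, one-hot class indicators standing in for the embedding-graph "virtual coordinates", and ridge regression connecting the two. All dimensions and the simplified graph step are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def elm_mapping(x, w_in, bias):
    """Stage 1 (data mapping): ELM-style random nonlinear projection."""
    return np.tanh(x @ w_in + bias)

def virtual_coordinates(labels, n_classes):
    """Stage 2 (graph decomposition), greatly simplified: one-hot class
    indicators stand in for the embedding-graph eigenvectors."""
    return np.eye(n_classes)[labels]

def spectral_regression(h, y, reg=1e-2):
    """Stage 3: ridge regression connecting the hidden mapping to the
    virtual coordinates, yielding a dimension-reducing projection."""
    d = h.shape[1]
    return np.linalg.solve(h.T @ h + reg * np.eye(d), h.T @ y)

x = rng.normal(size=(60, 20))            # 60 utterances, 20 features (toy data)
labels = rng.integers(0, 4, size=60)     # 4 emotion classes
w_in = rng.normal(size=(20, 50))         # random ELM input weights, untrained
b = rng.normal(size=50)
h = elm_mapping(x, w_in, b)
proj = spectral_regression(h, virtual_coordinates(labels, 4))
reduced = h @ proj                       # label-aware 4-dimensional embedding
```

The ridge term keeps the normal equations well-conditioned; the paper's embedding graphs would replace the one-hot shortcut with structure-aware coordinates.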
Grabowski K, Rynkiewicz A, Lassalle A, et al., 2019, Emotional expression in psychiatric conditions: New technology for clinicians, PSYCHIATRY AND CLINICAL NEUROSCIENCES, Vol: 73, Pages: 50-62, ISSN: 1323-1316
- Citations: 39
Demir F, Sengur A, Lu H, et al., 2019, COMPACT BILINEAR DEEP FEATURES FOR ENVIRONMENTAL SOUND RECOGNITION, International Conference on Artificial Intelligence and Data Processing (IDAP), Publisher: IEEE
Amiriparian S, Schmitt M, Hantke S, et al., 2019, Humans inside: Cooperative big multimedia data mining, Intelligent Systems Reference Library, Pages: 235-257
Deep learning techniques such as convolutional neural networks, autoencoders, and deep belief networks require a large amount of training data to achieve optimal performance. Multimedia resources available on social media represent a wealth of data to satisfy this need. However, a prohibitive amount of effort is required to acquire, label, and process such data. In this book chapter, we offer a threefold approach to tackle these issues: (1) we introduce a complex network analyser system for large-scale big data collection from online social media platforms, (2) we show the suitability of intelligent crowdsourcing and active learning approaches for effective labelling of large-scale data, and (3) we apply machine learning algorithms for extracting and learning meaningful representations from the collected data. From YouTube, the world's largest video sharing website, we have collected three databases containing a total of 25 classes, for which we have gathered thousands of videos from a range of acoustic environments and human speech and vocalisation types. We show that, using the unique combination of our big data extraction and annotation systems with machine learning techniques, it is possible to create new real-world databases from social multimedia in a short amount of time.
Schuller B, Weninger F, Zhang Y, et al., 2019, Affective and behavioural computing: Lessons learnt from the First Computational Paralinguistics Challenge, COMPUTER SPEECH AND LANGUAGE, Vol: 53, Pages: 156-180, ISSN: 0885-2308
Janott C, Rohrmeier C, Schmitt M, et al., 2019, Snoring - An Acoustic Definition, 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Publisher: IEEE, Pages: 3653-3657, ISSN: 1557-170X
- Citations: 4
Al Futaisi ND, Zhang Z, Cristia A, et al., 2019, VCMNet: Weakly Supervised Learning for Automatic Infant Vocalisation Maturity Analysis, 21st ACM International Conference on Multimodal Interaction (ICMI), Publisher: ASSOC COMPUTING MACHINERY, Pages: 205-209
- Citations: 4
Han J, Zhang Z, Ren Z, et al., 2019, IMPLICIT FUSION BY JOINT AUDIOVISUAL TRAINING FOR EMOTION RECOGNITION IN MONO MODALITY, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5861-5865, ISSN: 1520-6149
- Citations: 16
Tzirakis P, Nicolaou MA, Schuller B, et al., 2019, Time-series Clustering with Jointly Learning Deep Representations, Clusters and Temporal Boundaries, 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, Pages: 438-442, ISSN: 2326-5396
- Citations: 5
Zhao Z, Bao Z, Zhao Y, et al., 2019, Exploring Deep Spectrum Representations via Attention-Based Recurrent and Convolutional Neural Networks for Speech Emotion Recognition, IEEE ACCESS, Vol: 7, Pages: 97515-97525, ISSN: 2169-3536
- Citations: 64
Rudovic OO, Zhang M, Schuller B, et al., 2019, Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach, ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, Pages: 6-15
- Citations: 14
Zhang Z, Wu B, Schuller B, 2019, ATTENTION-AUGMENTED END-TO-END MULTI-TASK LEARNING FOR EMOTION PREDICTION FROM SPEECH, 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Pages: 6705-6709, ISSN: 1520-6149
- Citations: 45
Rizos G, Schuller B, 2019, MODELLING SAMPLE INFORMATIVENESS FOR DEEP AFFECTIVE COMPUTING, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 3482-3486, ISSN: 1520-6149
- Citations: 4
Xu X, Deng J, Cummins N, et al., 2019, Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition, Interspeech Conference, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 949-953, ISSN: 2308-457X
- Citations: 7
Guo Y, Zhao Z, Ma Y, et al., 2019, Speech Augmentation via Speaker-Specific Noise in Unseen Environment, Interspeech Conference, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 1781-1785, ISSN: 2308-457X
Ringeval F, Schuller B, Valstar M, et al., 2019, AVEC'19: Audio/Visual Emotion Challenge and Workshop, 27th ACM International Conference on Multimedia (MM), Publisher: ASSOC COMPUTING MACHINERY, Pages: 2718-2719
- Citations: 4
Pandit V, Schmitt M, Cummins N, et al., 2019, I know how you feel now, and here's why!: Demystifying Time-continuous High Resolution Text-based Affect Predictions In the Wild, 32nd IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS), Publisher: IEEE, Pages: 465-470, ISSN: 2372-9198
- Citations: 1
Ren Z, Kong Q, Han J, et al., 2019, ATTENTION-BASED ATROUS CONVOLUTIONAL NEURAL NETWORKS: VISUALISATION AND UNDERSTANDING PERSPECTIVES OF ACOUSTIC SCENES, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 56-60, ISSN: 1520-6149
- Citations: 41
Tzirakis P, Zafeiriou S, Schuller B, 2019, Real-world automatic continuous affect recognition from audiovisual signals, MULTIMODAL BEHAVIOR ANALYSIS IN THE WILD: ADVANCES AND CHALLENGES, Editors: AlamedaPineda, Ricci, Sebe, Publisher: ACADEMIC PRESS LTD-ELSEVIER SCIENCE LTD, Pages: 387-406, ISBN: 978-0-12-814601-9
- Citations: 8
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.