Publications
1107 results found
Hantke S, Olenyi T, Hausner C, et al., 2019, Large-scale Data Collection and Analysis via a Gamified Intelligent Crowdsourcing Platform, INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, Vol: 16, Pages: 427-436, ISSN: 1476-8186
- Citations: 2
Pokorny FB, Fiser M, Graf F, et al., 2019, Sound and the City: Current Perspectives on Acoustic Geo-Sensing in Urban Environment, ACTA ACUSTICA UNITED WITH ACUSTICA, Vol: 105, Pages: 766-778, ISSN: 1610-1928
- Citations: 4
Kollias D, Tzirakis P, Nicolaou MA, et al., 2019, Deep Affect Prediction in-the-Wild: Aff-Wild Database and Challenge, Deep Architectures, and Beyond, INTERNATIONAL JOURNAL OF COMPUTER VISION, Vol: 127, Pages: 907-929, ISSN: 0920-5691
- Citations: 120
Zhang Z, Han J, Coutinho E, et al., 2019, Dynamic difficulty awareness training for continuous emotion prediction, IEEE Transactions on Multimedia, Vol: 21, Pages: 1289-1301, ISSN: 1941-0077
Time-continuous emotion prediction has become an increasingly compelling task in machine learning, and considerable effort has been made to advance the performance of these systems. Nonetheless, the main focus has been the development of more sophisticated models and the incorporation of different expressive modalities (e.g., speech, face, and physiology). In this paper, motivated by the benefit of difficulty awareness in human learning, we propose a novel machine learning framework, Dynamic Difficulty Awareness Training (DDAT), which sheds fresh light on the research by directly exploiting the difficulties in learning to boost the machine learning process. The DDAT framework consists of two stages: information retrieval and information exploitation. In the first stage, we use the reconstruction error of the input features or the annotation uncertainty to estimate the difficulty of learning specific information. The obtained difficulty level is then used in tandem with the original features to update the model input in a second learning stage, with the expectation that the model learns to focus on high-difficulty regions of the learning process. We perform extensive experiments on a benchmark database (RECOLA) to evaluate the effectiveness of the proposed framework. The experimental results show that our approach outperforms related baselines as well as other well-established time-continuous emotion prediction systems, which suggests that dynamically integrating the difficulty information into neural networks can help enhance the learning process.
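The two-stage idea in the DDAT abstract above can be sketched in a few lines: stage one estimates per-frame difficulty as the reconstruction error of an autoencoder (here a toy linear one), and stage two appends that score to the original features before the downstream model sees them. The encoder/decoder matrices and all dimensions below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_difficulty(features, encoder, decoder):
    """Stage 1: per-frame reconstruction error as a proxy for learning difficulty."""
    recon = features @ encoder @ decoder
    return np.linalg.norm(features - recon, axis=1, keepdims=True)

def ddat_augment(features, encoder, decoder):
    """Stage 2: append the difficulty score to the original features, so the
    second-stage model can focus on high-difficulty regions."""
    difficulty = reconstruction_difficulty(features, encoder, decoder)
    return np.hstack([features, difficulty])

frames = rng.normal(size=(100, 16))     # 100 frames, 16 acoustic features (toy data)
enc = rng.normal(size=(16, 4)) * 0.1    # lossy bottleneck, so the error is informative
dec = rng.normal(size=(4, 16)) * 0.1
augmented = ddat_augment(frames, enc, dec)  # each frame gains one difficulty column
```

In the paper the difficulty signal can also come from annotation uncertainty rather than reconstruction error; the concatenation step is the same either way.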
Han J, Zhang Z, Cummins N, et al., 2019, Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives, IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE, Vol: 14, Pages: 68-81, ISSN: 1556-603X
Pandit V, Amiriparian S, Schmitt M, et al., 2019, Big Data Multimedia Mining: Feature Extraction Facing Volume, Velocity, and Variety, Publisher: Wiley
Kim JY, Liu C, Calvo RA, et al., 2019, A comparison of online automatic speech recognition systems and the nonverbal responses to unintelligible speech, Publisher: arXiv
Automatic Speech Recognition (ASR) systems have proliferated over the recent years to the point that free platforms such as YouTube now provide speech recognition services. Given the wide selection of ASR systems, we contribute to the field of automatic speech recognition by comparing the relative performance of two sets of manual transcriptions and five sets of automatic transcriptions (Google Cloud, IBM Watson, Microsoft Azure, Trint, and YouTube) to help researchers to select accurate transcription services. In addition, we identify nonverbal behaviors that are associated with unintelligible speech, as indicated by high word error rates. We show that manual transcriptions remain superior to current automatic transcriptions. Amongst the automatic transcription services, YouTube offers the most accurate transcription service. For non-verbal behavioral involvement, we provide evidence that the variability of smile intensities from the listener is high (low) when the speaker is clear (unintelligible). These findings are derived from videoconferencing interactions between student doctors and simulated patients; therefore, we contribute towards both the ASR literature and the healthcare communication skills teaching community.
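The comparison above ranks transcription services by word error rate (WER). A minimal sketch of how WER is typically computed, using the standard word-level Levenshtein distance (not the authors' actual evaluation code):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the patient feels fine", "the patient feel fine")` is 0.25: one substitution over four reference words.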
Qian K, Schmitt M, Janott C, et al., 2019, A Bag of Wavelet Features for Snore Sound Classification, ANNALS OF BIOMEDICAL ENGINEERING, Vol: 47, Pages: 1000-1011, ISSN: 0090-6964
- Citations: 20
Schuller B, 2019, Microexpressions: A Chance for Computers to Beat Humans at Detecting Hidden Emotions?, COMPUTER, Vol: 52, Pages: 4-5, ISSN: 0018-9162
Zhang Y, Michi A, Wagner J, et al., 2019, A Generic Human-Machine Annotation Framework Based on Dynamic Cooperative Learning., IEEE Trans Cybern
Obtaining meaningful annotations is tedious work, incurring considerable cost and time. Dynamic active learning and cooperative learning are recently proposed approaches for reducing the human effort of annotating data with subjective phenomena. In this paper, we introduce a novel generic annotation framework that aims to achieve the optimal trade-off between label reliability and cost reduction by making efficient use of the human and machine work force. To this end, we use dropout to assess model uncertainty and thereby decide which instances can be automatically labelled by the machine and which require human inspection. In addition, we propose an early stopping criterion based on inter-rater agreement, in order to focus human resources on those ambiguous instances that are difficult to label. In contrast to existing algorithms, the new confidence measures are applicable not only to binary classification tasks but also to regression problems. The proposed method is evaluated on the benchmark datasets for non-native English prosody estimation provided in the INTERSPEECH Computational Paralinguistics Challenge. As a result, the novel dynamic cooperative learning algorithm yields a Spearman's correlation coefficient of 0.424, compared to 0.413 with passive learning, while reducing the amount of human annotation by 74%.
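The dropout-based routing described above can be illustrated with a Monte-Carlo-dropout sketch: the same input is run through several stochastic forward passes, and the spread of the predictions decides whether the machine keeps the label or defers to a human annotator. The linear model, threshold, and dimensions are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_predict(x, w, n_samples=50, drop_p=0.5):
    """Monte-Carlo dropout: repeat stochastic forward passes of a toy linear
    model and use the spread of predictions as a model-uncertainty estimate."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(w.shape[0]) >= drop_p     # randomly drop weights
        preds.append(x @ (w * mask) / (1 - drop_p))  # rescale to keep expectation
    preds = np.array(preds)
    return preds.mean(), preds.std()

def route_instance(x, w, threshold):
    """Machine labels confident instances; uncertain ones go to a human."""
    mean, std = mc_dropout_predict(x, w)
    return ("machine", mean) if std < threshold else ("human", None)

x = rng.normal(size=8)   # one instance with 8 features (toy data)
w = rng.normal(size=8)   # toy model weights
```

A production version would apply the same logic to a trained network's dropout layers; only the confidence-versus-threshold routing is the point here.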
Schuller BW, 2019, IEEE Transactions on Affective Computing-On Novelty and Valence, IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, Vol: 10, Pages: 1-2, ISSN: 1949-3045
Xu X, Deng J, Coutinho E, et al., 2019, Connecting Subspace Learning and Extreme Learning Machine in Speech Emotion Recognition, IEEE Transactions on Multimedia, ISSN: 1520-9210
Speech Emotion Recognition (SER) is a powerful tool for endowing computers with the capacity to process information about the affective states of users in human-machine interaction. Recent research has shown the effectiveness of graph-embedding-based subspace learning and of the extreme learning machine applied to SER, but both techniques still have drawbacks that limit their application. Regarding subspace learning, the change from linearity to nonlinearity is usually achieved through kernelisation, while extreme learning machines take label information into consideration only at the output layer. To overcome these drawbacks, this paper leverages the extreme learning machine for dimensionality reduction and proposes a novel framework combining spectral-regression-based subspace learning and the extreme learning machine. The proposed framework contains three stages: data mapping, graph decomposition, and regression. At the data mapping stage, various mapping strategies provide different views of the samples. At the graph decomposition stage, specifically designed embedding graphs provide a possibility to better represent the structure of the data by generating virtual coordinates. Finally, at the regression stage, dimension-reduced mappings are achieved by connecting the virtual coordinates and the data mapping. Using this framework, we propose several novel dimensionality reduction algorithms, apply them to SER tasks, and compare their performance to relevant state-of-the-art methods. Our results on several paralinguistic corpora show that our proposed techniques lead to significant improvements.
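The three stages named in the abstract above (data mapping, graph decomposition, regression) can be sketched roughly as follows, with an ELM-style random nonlinear projection for the mapping, one-hot class indicators standing in for the embedding-graph "virtual coordinates", and ridge regression connecting the two. All dimensions and the simplified graph step are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def elm_mapping(x, w_in, bias):
    """Stage 1 (data mapping): ELM-style random nonlinear projection."""
    return np.tanh(x @ w_in + bias)

def virtual_coordinates(labels, n_classes):
    """Stage 2 (graph decomposition), greatly simplified: one-hot class
    indicators stand in for the embedding-graph eigenvectors."""
    return np.eye(n_classes)[labels]

def spectral_regression(h, y, reg=1e-2):
    """Stage 3: ridge regression connecting the hidden mapping to the
    virtual coordinates, yielding a dimension-reducing projection."""
    d = h.shape[1]
    return np.linalg.solve(h.T @ h + reg * np.eye(d), h.T @ y)

x = rng.normal(size=(60, 20))            # 60 utterances, 20 features (toy data)
labels = rng.integers(0, 4, size=60)     # 4 emotion classes
w_in = rng.normal(size=(20, 50))         # random ELM input weights, untrained
b = rng.normal(size=50)
h = elm_mapping(x, w_in, b)
proj = spectral_regression(h, virtual_coordinates(labels, 4))
reduced = h @ proj                       # label-aware 4-dimensional embedding
```

The ridge term keeps the normal equations well-conditioned; the paper's embedding graphs would replace the one-hot shortcut with structure-aware coordinates.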
Grabowski K, Rynkiewicz A, Lassalle A, et al., 2019, Emotional expression in psychiatric conditions: New technology for clinicians, PSYCHIATRY AND CLINICAL NEUROSCIENCES, Vol: 73, Pages: 50-62, ISSN: 1323-1316
- Citations: 39
Demir F, Sengur A, Lu H, et al., 2019, COMPACT BILINEAR DEEP FEATURES FOR ENVIRONMENTAL SOUND RECOGNITION, International Conference on Artificial Intelligence and Data Processing (IDAP), Publisher: IEEE
Amiriparian S, Schmitt M, Hantke S, et al., 2019, Humans inside: Cooperative big multimedia data mining, Intelligent Systems Reference Library, Pages: 235-257
Deep learning techniques such as convolutional neural networks, autoencoders, and deep belief networks require a large amount of training data to achieve optimal performance. Multimedia resources available on social media represent a wealth of data to satisfy this need. However, a prohibitive amount of effort is required to acquire, label, and process such data. In this book chapter, we offer a threefold approach to tackle these issues: (1) we introduce a complex network analyser system for large-scale big data collection from online social media platforms, (2) we show the suitability of intelligent crowdsourcing and active learning approaches for effective labelling of large-scale data, and (3) we apply machine learning algorithms for extracting and learning meaningful representations from the collected data. From YouTube, the world's largest video sharing website, we have collected three databases containing a total of 25 classes, for which we have gathered thousands of videos from a range of acoustic environments and human speech and vocalisation types. We show that, using the unique combination of our big data extraction and annotation systems with machine learning techniques, it is possible to create new real-world databases from social multimedia in a short amount of time.
Schuller B, Weninger F, Zhang Y, et al., 2019, Affective and behavioural computing: Lessons learnt from the First Computational Paralinguistics Challenge, COMPUTER SPEECH AND LANGUAGE, Vol: 53, Pages: 156-180, ISSN: 0885-2308
Janott C, Rohrmeier C, Schmitt M, et al., 2019, Snoring - An Acoustic Definition, 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Publisher: IEEE, Pages: 3653-3657, ISSN: 1557-170X
- Citations: 4
Al Futaisi ND, Zhang Z, Cristia A, et al., 2019, VCMNet: Weakly Supervised Learning for Automatic Infant Vocalisation Maturity Analysis, 21st ACM International Conference on Multimodal Interaction (ICMI), Publisher: ASSOC COMPUTING MACHINERY, Pages: 205-209
- Citations: 4
Han J, Zhang Z, Ren Z, et al., 2019, IMPLICIT FUSION BY JOINT AUDIOVISUAL TRAINING FOR EMOTION RECOGNITION IN MONO MODALITY, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5861-5865, ISSN: 1520-6149
- Citations: 16
Tzirakis P, Nicolaou MA, Schuller B, et al., 2019, Time-series Clustering with Jointly Learning Deep Representations, Clusters and Temporal Boundaries, 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Publisher: IEEE, Pages: 438-442, ISSN: 2326-5396
- Citations: 5
Zhao Z, Bao Z, Zhao Y, et al., 2019, Exploring Deep Spectrum Representations via Attention-Based Recurrent and Convolutional Neural Networks for Speech Emotion Recognition, IEEE ACCESS, Vol: 7, Pages: 97515-97525, ISSN: 2169-3536
- Citations: 64
Rudovic OO, Zhang M, Schuller B, et al., 2019, Multi-modal Active Learning From Human Data: A Deep Reinforcement Learning Approach, ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, Pages: 6-15
- Citations: 14
Zhang Z, Wu B, Schuller B, 2019, ATTENTION-AUGMENTED END-TO-END MULTI-TASK LEARNING FOR EMOTION PREDICTION FROM SPEECH, 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Pages: 6705-6709, ISSN: 1520-6149
- Citations: 45
Rizos G, Schuller B, 2019, MODELLING SAMPLE INFORMATIVENESS FOR DEEP AFFECTIVE COMPUTING, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 3482-3486, ISSN: 1520-6149
- Citations: 4
Xu X, Deng J, Cummins N, et al., 2019, Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition, Interspeech Conference, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 949-953, ISSN: 2308-457X
- Citations: 7
Guo Y, Zhao Z, Ma Y, et al., 2019, Speech Augmentation via Speaker-Specific Noise in Unseen Environment, Interspeech Conference, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 1781-1785, ISSN: 2308-457X
Ringeval F, Schuller B, Valstar M, et al., 2019, AVEC'19: Audio/Visual Emotion Challenge and Workshop, 27th ACM International Conference on Multimedia (MM), Publisher: ASSOC COMPUTING MACHINERY, Pages: 2718-2719
- Citations: 4
Pandit V, Schmitt M, Cummins N, et al., 2019, I know how you feel now, and here's why!: Demystifying Time-continuous High Resolution Text-based Affect Predictions In the Wild, 32nd IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS), Publisher: IEEE, Pages: 465-470, ISSN: 2372-9198
- Citations: 1
Ren Z, Kong Q, Han J, et al., 2019, ATTENTION-BASED ATROUS CONVOLUTIONAL NEURAL NETWORKS: VISUALISATION AND UNDERSTANDING PERSPECTIVES OF ACOUSTIC SCENES, 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 56-60, ISSN: 1520-6149
- Citations: 41
Tzirakis P, Zafeiriou S, Schuller B, 2019, Real-world automatic continuous affect recognition from audiovisual signals, MULTIMODAL BEHAVIOR ANALYSIS IN THE WILD: ADVANCES AND CHALLENGES, Editors: AlamedaPineda, Ricci, Sebe, Publisher: ACADEMIC PRESS LTD-ELSEVIER SCIENCE LTD, Pages: 387-406, ISBN: 978-0-12-814601-9
- Citations: 8
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.