Imperial College London

Professor Bjoern Schuller

Faculty of Engineering, Department of Computing

Professor of Artificial Intelligence

Contact

+44 (0)20 7594 8357
bjoern.schuller
Website

Location

574 Huxley Building, South Kensington Campus


Publications


928 results found

Kossaifi J, Walecki R, Panagakis Y, Shen J, Schmitt M, Ringeval F, Han J, Pandit V, Toisoul A, Schuller BW, Star K, Hajiyev E, Pantic M et al., 2021, SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol: 43, Pages: 1022-1040, ISSN: 0162-8828

Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever, as digital devices have become an indispensable part of everyday life. Accurately annotated real-world data are the crux of devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people coming from six cultures, 50% female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal, and (dis)liking intensity estimation.

Journal article

Cheng J, Liang R, Liang Z, Zhao L, Huang C, Schuller B et al., 2021, A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 29, Pages: 41-53, ISSN: 2329-9290

Journal article

Han J, Zhang Z, Pantic M, Schuller B et al., 2021, Internet of emotional people: Towards continual affective computing cross cultures via audiovisual signals, Future Generation Computer Systems: The International Journal of eScience, Vol: 114, Pages: 294-306, ISSN: 0167-739X

Journal article

Pandit V, Schmitt M, Cummins N, Schuller B et al., 2020, I see it in your eyes: Training the shallowest-possible CNN to recognise emotions and pain from muted web-assisted in-the-wild video-chats in real-time, Information Processing & Management, Vol: 57, ISSN: 0306-4573

Journal article

Zhang Z, Metaxas DN, Lee H-Y, Schuller BW et al., 2020, Guest Editorial Special Issue on Adversarial Learning in Computational Intelligence, IEEE Transactions on Emerging Topics in Computational Intelligence, Vol: 4, Pages: 414-416

Journal article

Dong F, Qian K, Ren Z, Baird A, Li X, Dai Z, Dong B, Metze F, Yamamoto Y, Schuller BW et al., 2020, Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS - The Heart Sounds Shenzhen Corpus, IEEE Journal of Biomedical and Health Informatics, Vol: 24, Pages: 2082-2092, ISSN: 2168-2194

Journal article

Han J, Zhang Z, Ren Z, Schuller B et al., 2020, Exploring Perception Uncertainty for Emotion Recognition in Dyadic Conversation and Music Listening, Cognitive Computation, ISSN: 1866-9956

Journal article

Amiriparian S, Cummins N, Gerczuk M, Pugachevskiy S, Ottl S, Schuller B et al., 2020, "Are You Playing a Shooter Again?!" Deep Representation Learning for Audio-Based Video Game Genre Recognition, IEEE Transactions on Games, Vol: 12, Pages: 145-154, ISSN: 2475-1502

Journal article

Parada-Cabaleiro E, Costantini G, Batliner A, Schmitt M, Schuller BW et al., 2020, DEMoS: an Italian emotional speech corpus - elicitation methods, machine learning, and perception, Language Resources and Evaluation, Vol: 54, Pages: 341-383, ISSN: 1574-020X

Journal article

Schuller DM, Schuller BW, 2020, A Review on Five Recent and Near-Future Developments in Computational Processing of Emotion in the Human Voice, Emotion Review, Vol: 13, Pages: 44-50, ISSN: 1754-0739

Journal article

Kaklauskas A, Zavadskas EK, Schuller B, Lepkova N, Dzemyda G, Sliogeriene J, Kurasova O et al., 2020, Customized ViNeRS Method for Video Neuro-Advertising of Green Housing, International Journal of Environmental Research and Public Health, Vol: 17

Journal article

Wu P, Sun X, Zhao Z, Wang H, Pan S, Schuller B et al., 2020, Classification of Lung Nodules Based on Deep Residual Networks and Migration Learning, Computational Intelligence and Neuroscience, Vol: 2020, ISSN: 1687-5265

Journal article

Pokorny FB, Bartl-Pokorny KD, Zhang D, Marschik PB, Schuller D, Schuller BW et al., 2020, Efficient Collection and Representation of Preverbal Data in Typical and Atypical Development, Journal of Nonverbal Behavior, Vol: 44, Pages: 419-436, ISSN: 0191-5886

Journal article

Zhao Z, Bao Z, Zhang Z, Deng J, Cummins N, Wang H, Tao J, Schuller B et al., 2020, Automatic Assessment of Depression From Speech via a Hierarchical Attention Transfer Network and Attention Autoencoders, IEEE Journal of Selected Topics in Signal Processing, Vol: 14, Pages: 423-434, ISSN: 1932-4553

Journal article

Deng J, Schuller B, Eyben F, Schuller D, Zhang Z, Francois H, Oh E et al., 2020, Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration, Neural Computing & Applications, Vol: 32, Pages: 1095-1107, ISSN: 0941-0643

Journal article

Parada-Cabaleiro E, Batliner A, Baird A, Schuller B et al., 2020, The perception of emotional cues by children in artificial background noise, International Journal of Speech Technology, Vol: 23, Pages: 169-182, ISSN: 1381-2416

Journal article

Zhang Z, Han J, Qian K, Janott C, Guo Y, Schuller B et al., 2020, Snore-GANs: improving automatic snore sound classification with synthesized data, IEEE Journal of Biomedical and Health Informatics, Vol: 24, Pages: 300-310, ISSN: 2168-2194

One of the frontier issues that severely hampers the development of automatic snore sound classification (ASSC) is the lack of sufficient supervised training data. To cope with this problem, we propose a novel data augmentation approach based on semi-supervised conditional Generative Adversarial Networks (scGANs), which aims to automatically learn a mapping from a random noise space to the original data distribution. The proposed approach is able to synthesize ‘realistic’ high-dimensional data well, while requiring no additional annotation process. To handle the mode collapse problem of GANs, we further introduce an ensemble strategy to enhance the diversity of the generated data. Systematic experiments conducted on the widely used Munich-Passau snore sound corpus demonstrate that the scGAN-based systems can remarkably outperform other classic data augmentation systems and are also competitive with other recently reported systems for ASSC.
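A minimal sketch of the conditional-GAN augmentation idea described in the abstract, assuming a PyTorch setting; this is not the authors' scGAN implementation, and the feature dimensionality, number of snore classes, and layer sizes are illustrative assumptions:

import torch
import torch.nn as nn

NOISE_DIM, N_CLASSES, FEAT_DIM = 64, 4, 1024  # illustrative sizes, not taken from the paper

class ConditionalGenerator(nn.Module):
    # Maps (noise, class label) to a synthetic high-dimensional feature vector.
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + N_CLASSES, 256), nn.ReLU(),
            nn.Linear(256, FEAT_DIM), nn.Tanh(),
        )

    def forward(self, z, labels):
        return self.net(torch.cat([z, self.label_emb(labels)], dim=1))

class ConditionalDiscriminator(nn.Module):
    # Scores (feature vector, class label) pairs as real or synthetic.
    def __init__(self):
        super().__init__()
        self.label_emb = nn.Embedding(N_CLASSES, N_CLASSES)
        self.net = nn.Sequential(
            nn.Linear(FEAT_DIM + N_CLASSES, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, x, labels):
        return self.net(torch.cat([x, self.label_emb(labels)], dim=1))

# Augmentation step: sample class labels, generate features, and mix them with real data.
G = ConditionalGenerator()
z = torch.randn(32, NOISE_DIM)
labels = torch.randint(0, N_CLASSES, (32,))
synthetic_features = G(z, labels)  # appended to the real training set of each sampled class

The ensemble strategy against mode collapse mentioned in the abstract would correspond to training several such generators and pooling their synthetic outputs.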

Journal article

Haque KN, Rana R, Schuller BW, 2020, High-Fidelity Audio Generation and Representation Learning With Guided Adversarial Autoencoder, IEEE Access, Vol: 8, Pages: 223509-223528, ISSN: 2169-3536

Journal article

Amiriparian S, Schmitt M, Ottl S, Gerczuk M, Schuller B et al., 2020, Deep unsupervised representation learning for audio-based medical applications, Intelligent Systems Reference Library, Pages: 137-164

Feature learning denotes a set of approaches for transforming raw input data into representations that can be effectively utilised in solving machine learning problems. Classifiers or regressors require training data which is computationally suitable to process. However, real-world data, e.g., an audio recording from a group of people talking in a park whilst in the background a dog is barking and a musician is playing the guitar, or health-related data such as coughing and sneezing recorded by consumer smartphones, is of a remarkably variable and complex nature. For understanding such data, developing expert-designed, hand-crafted features often demands an exhaustive amount of time and resources. Another disadvantage of such features is the lack of generalisation, i.e., there is a need to re-engineer new features for new tasks. Therefore, it is essential to develop automatic representation learning methods. In this chapter, we first discuss the preliminaries of contemporary representation learning techniques for computer audition tasks. Hereby, we differentiate between approaches based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We then introduce and evaluate three state-of-the-art deep learning systems for unsupervised representation learning from raw audio: (1) pre-trained image classification CNNs, (2) a deep convolutional generative adversarial network (DCGAN), and (3) a recurrent sequence-to-sequence autoencoder (S2SAE). For each of these algorithms, the representations are obtained from the spectrograms of the input audio data. Finally, for a range of audio-based machine learning tasks, including abnormal heart sound classification, snore sound classification, and bipolar disorder recognition, we evaluate the efficacy of the deep representations, which are: (i) the activations of the fully connected layers of the pre-trained CNNs, (ii) the activations of the discriminator of the DCGAN, …
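As a rough illustration of option (1) above, a sketch of extracting fully connected layer activations of a pre-trained image-classification CNN from a spectrogram image, assuming torchvision's AlexNet (where the first fully connected layer is classifier[1]); the preprocessing values are standard ImageNet defaults rather than settings from the chapter:

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pre-trained AlexNet; in torchvision, classifier = [Dropout, fc6, ReLU, Dropout, fc7, ReLU, fc8].
alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
alexnet.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def first_fc_activations(spectrogram_png: str) -> torch.Tensor:
    # Returns the 4096-dimensional activations of the first fully connected layer
    # for one spectrogram image saved as a PNG file.
    x = preprocess(Image.open(spectrogram_png).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        h = alexnet.features(x)            # convolutional feature maps
        h = alexnet.avgpool(h).flatten(1)  # shape (1, 9216)
        return alexnet.classifier[:2](h)   # Dropout (inactive in eval mode) + first Linear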

Book chapter

Schuller DM, Schuller BW, 2020, The Challenge of Automatic Eating Behaviour Analysis and Tracking, Intelligent Systems Reference Library, Pages: 187-204

Computer-based tracking of eating behaviour has recently attracted great interest across a broad range of modalities, such as audio, video, and movement sensors, in particular in wearable everyday settings. Here, we provide extensive insight into the current state of play in automatic tracking, with a broad view of the sensors and information used up to this point. The chapter is largely guided by, and includes results from, the Interspeech 2015 Computational Paralinguistics Challenge (ComParE) Eating Sub-Challenge and the audio/visual Eating Analysis and Tracking (EAT) 2018 Challenge, both co-organised by the last author. The relevance is given by use cases in health care and wellbeing, including, amongst others, assistive technologies for individuals with eating disorders that potentially lead to either under- or overeating, or with health conditions such as diabetes. The chapter touches upon different feature representations, including feature brute-forcing, bag-of-audio-word representations, and deep end-to-end learning from a raw sensor signal. It further reports on machine learning approaches used in the field, including deep learning and conventional approaches. In the conclusion, the chapter also discusses usability aspects that foster optimal adherence, such as sensor placement, energy consumption, explainability, and privacy aspects.

Book chapter

Yang Z, Qian K, Ren Z, Baird A, Zhang Z, Schuller B et al., 2020, Learning multi-resolution representations for acoustic scene classification via neural networks, Pages: 133-143, ISBN: 9789811527555

This study investigates the performance of wavelet features as well as conventional temporal and spectral features for acoustic scene classification, testing the effectiveness of both feature sets when combined with neural networks. The TUT Acoustic Scenes 2017 Database is used in the evaluation of the system. The model using wavelet energy features achieved 74.8 % and 60.2 % on the development and evaluation sets, respectively, which is better than the model using the temporal and spectral feature set (72.9 % and 59.4 %). Additionally, to optimise the generalisation and robustness of the models, a decision fusion method based on the posterior probability of each audio scene is used. Compared with the baseline system of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE 2017) challenge, the best decision fusion model achieves 79.2 % and 63.8 % on the development and evaluation sets, respectively; both results significantly exceed the baseline results of 74.8 % and 61.0 % (confirmed by one-tailed z-tests, p < 0.01 and p < 0.05, respectively).
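A minimal sketch of posterior-probability decision fusion of the kind mentioned above, here as a simple average of per-model class posteriors (the exact fusion rule used in the chapter may differ):

import numpy as np

def fuse_posteriors(posteriors):
    # posteriors: list of (n_clips, n_classes) arrays, one per model, rows summing to 1.
    # Returns the index of the most probable acoustic scene per clip after averaging.
    fused = np.mean(np.stack(posteriors, axis=0), axis=0)
    return fused.argmax(axis=1)

# Illustrative example: two models, three clips, four acoustic scene classes.
p_wavelet  = np.array([[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.3, 0.3, 0.2, 0.2]])
p_spectral = np.array([[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1], [0.1, 0.2, 0.4, 0.3]])
print(fuse_posteriors([p_wavelet, p_spectral]))  # -> [0 1 2]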

Book chapter

Costin H, Schuller B, Florea AM, 2020, Preface

Book

Keren G, Sabato S, Schuller B, 2020, Analysis of loss functions for fast single-class classification, Knowledge and Information Systems, Vol: 62, Pages: 337-358, ISSN: 0219-1377

Journal article

Latif S, Rana R, Khalifa S, Jurdak R, Epps J, Schuller BW et al., 2020, Multi-Task Semi-Supervised Adversarial Autoencoding for Speech Emotion Recognition, IEEE Transactions on Affective Computing

Despite the emerging importance of Speech Emotion Recognition (SER), the state-of-the-art accuracy is quite low and needs improvement to make commercial applications of SER viable. A key underlying reason for the low accuracy is the scarcity of emotion datasets, which is a challenge for developing any robust machine learning model in general. In this paper, we propose a solution to this problem: a multi-task learning framework that uses auxiliary tasks for which data is abundantly available. We show that utilisation of this additional data can improve the primary task of SER, for which only limited labelled data is available. In particular, we use gender identification and speaker recognition as auxiliary tasks, which allow the use of very large datasets, e.g., speaker classification datasets. To maximise the benefit of multi-task learning, we further use an adversarial autoencoder (AAE) within our framework, which has a strong capability to learn powerful and discriminative features. Furthermore, the unsupervised AAE in combination with the supervised classification networks enables semi-supervised learning, which incorporates a discriminative component into the AAE's unsupervised training pipeline. The proposed model is rigorously evaluated for categorical and dimensional emotion recognition and in cross-corpus scenarios. Experimental results demonstrate that the proposed model achieves state-of-the-art performance on two publicly available datasets.
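A rough sketch of the multi-task idea summarised above, assuming a shared encoder with separate heads for the primary emotion task and the gender and speaker auxiliary tasks; this omits the adversarial autoencoder and semi-supervised training of the paper, and all dimensions are illustrative assumptions:

import torch
import torch.nn as nn

FEAT_DIM, HID, N_EMOTIONS, N_GENDERS, N_SPEAKERS = 512, 128, 4, 2, 100  # illustrative

class MultiTaskSER(nn.Module):
    # Shared encoder with one classification head per task; the auxiliary heads
    # can be trained on much larger gender/speaker corpora than the emotion head.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(FEAT_DIM, HID), nn.ReLU())
        self.emotion_head = nn.Linear(HID, N_EMOTIONS)   # primary task (scarce labels)
        self.gender_head = nn.Linear(HID, N_GENDERS)     # auxiliary task (abundant labels)
        self.speaker_head = nn.Linear(HID, N_SPEAKERS)   # auxiliary task (abundant labels)

    def forward(self, x):
        h = self.encoder(x)
        return self.emotion_head(h), self.gender_head(h), self.speaker_head(h)

# A weighted sum of the per-task cross-entropy losses trains the shared encoder jointly.
model = MultiTaskSER()
emo_logits, gen_logits, spk_logits = model(torch.randn(8, FEAT_DIM))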

Journal article

Littmann M, Selig K, Cohen-Lavi L, Frank Y, Hoenigschmid P, Kataka E, Moesch A, Qian K, Ron A, Schmid S, Sorbie A, Szlak L, Dagan-Wiener A, Ben-Tal N, Niv MY, Razansky D, Schuller BW, Ankerst D, Hertz T, Rost B et al., 2020, Validity of machine learning in biology and medicine increased through collaborations across fields of expertise, Nature Machine Intelligence, Vol: 2, Pages: 18-24

Journal article

Zhang Z, Qian K, Schuller BW, Wollherr D et al., 2020, An Online Robot Collision Detection and Identification Scheme by Supervised Learning and Bayesian Decision Theory, IEEE Transactions on Automation Science and Engineering, Pages: 1-13, ISSN: 1545-5955

Journal article

Rizos G, Schuller BW, 2020, Average Jane, Where Art Thou? – Recent Avenues in Efficient Machine Learning Under Subjectivity Uncertainty, Information Processing and Management of Uncertainty in Knowledge-Based Systems, Publisher: Springer International Publishing, Pages: 42-55, ISBN: 9783030501457

Book chapter

Marchi E, Schuller B, Baird A, Baron-Cohen S, Lassalle A, O'Reilly H, Pigat D, Robinson P, Davies I, Baltrusaitis T, Adams A, Mahmoud M, Golan O, Fridenson-Hayo S, Tal S, Newman S, Meir-Goren N, Camurri A, Piana S, Boelte S, Sezgin M, Alyuz N, Rynkiewicz A, Baranger A et al., 2019, The ASC-Inclusion Perceptual Serious Gaming Platform for Autistic Children, IEEE Transactions on Games, Vol: 11, Pages: 328-339, ISSN: 2475-1502

Journal article

Amiriparian S, Han J, Schmitt M, Baird A, Mallol-Ragolta A, Milling M, Gerczuk M, Schuller B et al., 2019, Synchronization in Interpersonal Speech, Frontiers in Robotics and AI, Vol: 6, ISSN: 2296-9144

Journal article

Amiriparian S, Ottl S, Gerczuk M, Pugachevskiy S, Schuller B et al., 2019, Audio-based eating analysis and tracking utilising deep spectrum features

This paper proposes a deep learning system for audio-based eating analysis on the ICMI 2018 Eating Analysis and Tracking (EAT) challenge corpus. We utilise Deep Spectrum features, which are descriptors obtained from image classification convolutional neural networks (CNNs). We extract the Deep Spectrum features by forwarding Mel-spectrograms from the input audio through deep, task-independent, pre-trained CNNs, including AlexNet and VGG16. We then use the activations of the first (fc6), second (fc7), and third (fc8) fully connected layers of these networks as feature vectors. We obtain the best classification result by using the first fully connected layer (fc6) of AlexNet to extract features from Mel-spectrograms with a window size of 160 ms, a hop size of 80 ms, and a viridis colour map. Finally, we build Bag-of-Deep-Features (BoDF) representations, which are the quantisation of the Deep Spectrum features. In comparison to the best baseline results on the test partitions of the Food Type and the Likability sub-challenges, unweighted average recall is increased from 67.2 percent to 79.9 percent and from 54.2 percent to 56.1 percent, respectively. For the test partition of the Difficulty sub-challenge, the concordance correlation coefficient is increased from .506 to .509.
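A minimal sketch of the Bag-of-Deep-Features quantisation step mentioned above, using k-means to build the codebook; the codebook size and feature dimensionality are illustrative assumptions rather than values from the paper:

import numpy as np
from sklearn.cluster import KMeans

def bag_of_deep_features(train_feats, clip_feats, codebook_size=64):
    # train_feats: (n_frames_total, feat_dim) deep features pooled over the training set.
    # clip_feats:  list of (n_frames_i, feat_dim) arrays, one per audio clip.
    # Returns one normalised codeword histogram per clip.
    codebook = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(train_feats)
    histograms = []
    for feats in clip_feats:
        words = codebook.predict(feats)
        hist = np.bincount(words, minlength=codebook_size).astype(float)
        histograms.append(hist / max(hist.sum(), 1.0))
    return np.vstack(histograms)

# Illustrative usage with random stand-ins for per-frame Deep Spectrum features.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))
clips = [rng.normal(size=(20, 64)) for _ in range(3)]
print(bag_of_deep_features(train, clips).shape)  # -> (3, 64)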

Conference paper

