Imperial College London

Professor Bjoern Schuller

Faculty of Engineering, Department of Computing

Professor of Artificial Intelligence

Contact

+44 (0)20 7594 8357
bjoern.schuller

Location

574 Huxley Building, South Kensington Campus

Publications

Citation

BibTeX format

@inbook{Amiriparian:2020:10.1007/978-3-030-42750-4_5,
author = {Amiriparian, S and Schmitt, M and Ottl, S and Gerczuk, M and Schuller, B},
booktitle = {Intelligent Systems Reference Library},
doi = {10.1007/978-3-030-42750-4_5},
pages = {137--164},
title = {Deep unsupervised representation learning for audio-based medical applications},
url = {http://dx.doi.org/10.1007/978-3-030-42750-4_5},
year = {2020}
}

RIS format (EndNote, RefMan)

TY  - CHAP
AB - Feature learning denotes a set of approaches for transforming raw input data into representations that can be effectively utilised in solving machine learning problems. Classifiers and regressors require training data that is computationally feasible to process. However, real-world data, e.g., an audio recording of a group of people talking in a park whilst in the background a dog is barking and a musician is playing the guitar, or health-related data such as coughing and sneezing recorded by consumer smartphones, is remarkably variable and complex in nature. For understanding such data, developing expert-designed, hand-crafted features often demands an exhaustive amount of time and resources. Another disadvantage of such features is their lack of generalisation, i.e., new features need to be re-engineered for each new task. It is therefore essential to develop automatic representation learning methods. In this chapter, we first discuss the preliminaries of contemporary representation learning techniques for computer audition tasks, differentiating between approaches based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). We then introduce and evaluate three state-of-the-art deep learning systems for unsupervised representation learning from raw audio: (1) pre-trained image classification CNNs, (2) a deep convolutional generative adversarial network (DCGAN), and (3) a recurrent sequence-to-sequence autoencoder (S2SAE). For each of these algorithms, the representations are obtained from the spectrograms of the input audio data. Finally, for a range of audio-based machine learning tasks, including abnormal heart sound classification, snore sound classification, and bipolar disorder recognition, we evaluate the efficacy of the deep representations, which are: (i) the activations of the fully connected layers of the pre-trained CNNs, (ii) the activations of the discriminator in the case of the DCGAN, and (iii) the representations learnt by the S2SAE.
AU - Amiriparian,S
AU - Schmitt,M
AU - Ottl,S
AU - Gerczuk,M
AU - Schuller,B
DO - 10.1007/978-3-030-42750-4_5
EP - 164
PY - 2020///
SP - 137
TI - Deep unsupervised representation learning for audio-based medical applications
T2 - Intelligent Systems Reference Library
UR - http://dx.doi.org/10.1007/978-3-030-42750-4_5
ER -
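
The chapter abstract above outlines a pipeline common to all three systems: convert the raw audio into a (mel-)spectrogram, pass it through a deep network, and use internal activations as a fixed-length feature vector for a downstream classifier. The following Python snippet is a minimal sketch of the first variant, extracting fully connected layer activations from a pre-trained image-classification CNN; the choice of VGG16, the 128 mel bands, and the helper names are illustrative assumptions rather than the authors' exact configuration.

# Minimal sketch: deep representations from audio via a pre-trained image CNN.
# VGG16, 128 mel bands, and the 16 kHz sample rate are assumptions for
# illustration, not the configuration reported in the chapter.
import numpy as np
import librosa
import torch
import torchvision.models as models
import torchvision.transforms as T

def spectrogram_image(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Load audio and convert it to a log-mel spectrogram scaled to [0, 1]."""
    y, sr = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    img = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-9)
    return np.stack([img] * 3, axis=-1)  # replicate to 3 channels for the CNN

# Pre-trained VGG16 in inference mode; ImageNet preprocessing for its input.
cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
preprocess = T.Compose([
    T.ToTensor(),
    T.Resize((224, 224), antialias=True),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_representation(wav_path: str) -> np.ndarray:
    """Return the 4096-d activation vector of VGG16's second FC layer."""
    x = preprocess(spectrogram_image(wav_path).astype(np.float32)).unsqueeze(0)
    with torch.no_grad():
        feats = cnn.features(x)
        flat = torch.flatten(cnn.avgpool(feats), 1)
        # classifier[:5] runs fc6 -> ReLU -> dropout -> fc7 -> ReLU,
        # stopping before the final 1000-way classification layer.
        fc7 = cnn.classifier[:5](flat)
    return fc7.squeeze(0).numpy()

The resulting 4096-dimensional vectors could then be fed to a conventional classifier such as a support vector machine for tasks like the abnormal heart sound or snore sound classification mentioned in the abstract.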