Imperial College London

Patrick A. Naylor

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Speech & Acoustic Signal Processing
 
 
 

Publications

Citation

BibTeX format

@article{D'Olne:2024:10.1109/OJSP.2023.3344379,
author = {D'Olne, E and Moore, AH and Naylor, PA and Donley, J and Tourbabin, V and Lunner, T},
doi = {10.1109/OJSP.2023.3344379},
journal = {IEEE Open Journal of Signal Processing},
pages = {374--382},
title = {Group Conversations in Noisy Environments (GiN) - Multimedia Recordings for Location-Aware Speech Enhancement},
url = {http://dx.doi.org/10.1109/OJSP.2023.3344379},
volume = {5},
year = {2024}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Recent years have seen a growing interest in the use of smart glasses mounted with microphones to solve the cocktail party problem using beamforming techniques or machine learning. Many such approaches could bring substantial advances in hearing aid or Augmented Reality (AR) research. To validate these methods, the EasyCom [Donley et al., 2021] dataset introduced high-quality multi-modal recordings of conversations in noise, including egocentric multi-channel microphone array audio, speech source pose, and headset microphone audio. While providing comprehensive data, EasyCom lacks diversity in the acoustic environments considered and the degree of overlapping speech in conversations. This work therefore presents the Group in Noise (GiN) dataset of over 2 hours of group conversations in noisy environments recorded using binaural microphones and a pair of glasses mounted with 5 microphones. The recordings took place in 3 rooms and contain 6 seated participants as well as a standing facilitator. The data also include close-talking microphone audio and head-pose data for each speaker, an audio channel from a fixed reference microphone, and automatically annotated speaker activity information. A baseline method is used to demonstrate the use of the data for speech enhancement. The dataset is publicly available in d'Olne et al. [2023].
AU  - D'Olne,E
AU  - Moore,AH
AU  - Naylor,PA
AU  - Donley,J
AU  - Tourbabin,V
AU  - Lunner,T
DO  - 10.1109/OJSP.2023.3344379
EP  - 382
PY  - 2024///
SP  - 374
TI  - Group Conversations in Noisy Environments (GiN) - Multimedia Recordings for Location-Aware Speech Enhancement
T2  - IEEE Open Journal of Signal Processing
UR  - http://dx.doi.org/10.1109/OJSP.2023.3344379
VL  - 5
ER  -