Search or filter publications

Filter by type:

Filter by publication type

Filter by year:

to

Results

  • Showing results for:
  • Reset all filters

Search results

  • Journal article
    Evers C, Naylor PA, 2017,

    Optimized Self-Localization for SLAM in Dynamic Scenes using Probability Hypothesis Density Filters

    , IEEE Transactions on Signal Processing, Vol: 66, Pages: 863-878, ISSN: 1053-587X

    In many applications, sensors that map the positions of objects in unknown environments are installed on dynamic platforms. As measurements are relative to the observer's sensors, scene mapping requires accurate knowledge of the observer state. However, in practice, observer reports are subject to positioning errors. Simultaneous Localization and Mapping (SLAM) addresses the joint estimation problem of observer localization and scene mapping. State-of-the-art approaches typically use visual or optical sensors and therefore rely on static beacons in the environment to anchor the observer estimate. However, many applications involving sensors that are not conventionally used for SLAM are affected by highly dynamic scenes, such that the static world assumption is invalid. This paper proposes a novel approach for dynamic scenes, called GEneralized Motion (GEM)-SLAM. Based on Probability Hypothesis Density (PHD) filters, the proposed approach probabilistically anchors the observer state by fusing observer information inferred from the scene with reports of the observer motion. This paper derives the general, theoretical framework for GEM-SLAM and shows that it generalizes existing PHD-based SLAM algorithms. Simulations for a model-specific realization using range-bearing sensors and multiple moving objects highlight that GEM-SLAM achieves significant improvements over three benchmark algorithms.

  • Conference paper
    Papayiannis C, Evers C, Naylor PA, 2017,

    Sparse parametric modeling of the early part of acoustic impulse responses

    , 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 678-682, ISSN: 2076-1465

    Acoustic channels are typically described by their Acoustic Impulse Response (AIR) as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, describing discrete reflections and the diffuse reverberation tail respectively. We propose an approach for constructing a sparse parametric model for the early part. The model aims to reduce the number of parameters needed to represent the early part, while allowing the MA coefficients that describe it to be reconstructed from that representation. It represents the reflections arriving at the receiver as delayed copies of an excitation signal. The Times-Of-Arrival of reflections are not restricted to integer sample instants, and a dynamically estimated model for the excitation sound is used. We also present a corresponding parameter estimation method, which is based on regularized regression and nonlinear optimization. The proposed method also serves as an analysis tool, since estimated parameters can be used for the estimation of room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the AIR coefficient reconstruction-error energy does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reduction figures exceeding 90% when compared to an MA process representation.
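The core reconstruction idea in the abstract above, early reflections as delayed, scaled copies of an excitation signal with fractional-sample times-of-arrival, can be sketched as follows. This is a minimal illustration only, not the authors' implementation; the dynamically estimated excitation model and the regularized-regression fitting are omitted, and all values below are invented.

```python
import numpy as np

def reconstruct_early_air(toas, gains, excitation, n_taps):
    """Rebuild the early part of an AIR from reflection parameters.

    Each reflection is a delayed, scaled copy of the excitation signal;
    fractional-sample TOAs are realised with a bandlimited (sinc) delay.
    toas: reflection times-of-arrival in samples (floats allowed).
    gains: amplitude of each reflection.
    excitation: short excitation waveform (here just a unit pulse).
    n_taps: length of the reconstructed MA coefficient vector.
    """
    air = np.zeros(n_taps)
    n = np.arange(n_taps)
    for toa, g in zip(toas, gains):
        # bandlimited impulse centred at the (fractional) TOA
        delayed = np.sinc(n - toa)
        air += g * np.convolve(delayed, excitation)[:n_taps]
    return air

# toy example: direct path at 10.0 samples, one reflection at 23.4 samples
h = reconstruct_early_air([10.0, 23.4], [1.0, 0.6], np.array([1.0]), 64)
```

The sparse representation here is just the (TOA, gain) pairs plus the excitation, far fewer numbers than the full MA coefficient vector they reconstruct.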

  • Journal article
    Forte AE, Etard O, Reichenbach J, 2017,

    The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention

    , eLife, Vol: 6, ISSN: 2050-084X

    Humans excel at selectively listening to a target speaker in background noise such as competing voices. While the encoding of speech in the auditory cortex is modulated by selective attention, it remains debated whether such modulation occurs already in subcortical auditory structures. Investigating the contribution of the human brainstem to attention has, in particular, been hindered by the tiny amplitude of the brainstem response. Its measurement normally requires a large number of repetitions of the same short sound stimuli, which may lead to a loss of attention and to neural adaptation. Here we develop a mathematical method to measure the auditory brainstem response to running speech, an acoustic stimulus that does not repeat and that has a high ecological validity. We employ this method to assess the brainstem's activity when a subject listens to one of two competing speakers, and show that the brainstem response is consistently modulated by attention.
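The measurement idea above, relating a continuous EEG recording to a running-speech stimulus rather than averaging over repeated short stimuli, can be illustrated with a toy sketch. The paper's actual method is a more elaborate complex-valued regression; here a simple normalised cross-correlation between the EEG and the (assumed already extracted) fundamental waveform of the speech stands in for it, and the sampling rate and lag range are illustrative.

```python
import numpy as np

def brainstem_response(eeg, fundamental, fs, max_lag_ms=20):
    """Relate EEG to the fundamental waveform of running speech.

    Returns the normalised cross-correlation at each latency up to
    max_lag_ms, plus the latency (ms) of the strongest correlation.
    """
    max_lag = int(fs * max_lag_ms / 1000)
    eeg = (eeg - eeg.mean()) / eeg.std()
    fundamental = (fundamental - fundamental.mean()) / fundamental.std()
    lags = np.arange(max_lag + 1)
    corr = np.array([
        np.mean(eeg[lag:] * fundamental[:len(fundamental) - lag])
        for lag in lags
    ])
    best = lags[np.argmax(np.abs(corr))]
    return corr, best / fs * 1000.0  # correlations, latency in ms
```

Because running speech never repeats, every sample contributes to the estimate, which is what lets the response be measured without the attention loss and adaptation that stimulus repetition causes.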

  • Journal article
    Goodman DFM, Winter IM, Léger AC, de Cheveigné A, Lorenzi C et al., 2017,

    Modelling firing regularity in the ventral cochlear nucleus: Mechanisms, and effects of stimulus level and synaptopathy

    , Hearing Research, Vol: 358, Pages: 98-110, ISSN: 0378-5955

    The auditory system processes temporal information at multiple scales, and disruptions to this temporal processing may lead to deficits in auditory tasks such as detecting and discriminating sounds in a noisy environment. Here, a modelling approach is used to study the temporal regularity of firing by chopper cells in the ventral cochlear nucleus, in both the normal and impaired auditory system. Chopper cells, which have a strikingly regular firing response, divide into two classes, sustained and transient, based on the time course of this regularity. Several hypotheses have been proposed to explain the behaviour of chopper cells, and the difference between sustained and transient cells in particular. However, there is no conclusive evidence so far. Here, a reduced mathematical model is developed and used to compare and test a wide range of hypotheses with a limited number of parameters. Simulation results show a continuum of cell types and behaviours: chopper-like behaviour arises for a wide range of parameters, suggesting that multiple mechanisms may underlie this behaviour. The model accounts for systematic trends in regularity as a function of stimulus level that have previously only been reported anecdotally. Finally, the model is used to predict the effects of a reduction in the number of auditory nerve fibres (deafferentation due to, for example, cochlear synaptopathy). An interactive version of this paper in which all the model parameters can be changed is available online.
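To give a flavour of the kind of reduced model the abstract describes, here is a minimal leaky integrate-and-fire neuron driven by summed Poisson auditory-nerve input, with firing regularity summarised by the coefficient of variation (CV) of the inter-spike intervals (low CV means regular, chopper-like firing). All parameter values are invented for illustration; they are not the paper's.

```python
import numpy as np

def chopper_cv(n_inputs=10, rate=200.0, tau=5e-3, threshold=1.0,
               w=0.15, t_max=1.0, dt=1e-4, seed=0):
    """Reduced leaky integrate-and-fire sketch of a chopper cell.

    n_inputs auditory-nerve fibres, each firing at `rate` Hz, drive the
    cell; the summed input per time step is Poisson. Returns the CV of
    the inter-spike intervals over a t_max-second simulation.
    """
    rng = np.random.default_rng(seed)
    steps = int(t_max / dt)
    v, spikes = 0.0, []
    for i in range(steps):
        n_in = rng.poisson(n_inputs * rate * dt)  # summed AN input spikes
        v += -v / tau * dt + w * n_in             # leak + synaptic drive
        if v >= threshold:
            spikes.append(i * dt)
            v = 0.0                               # reset after spiking
    isi = np.diff(spikes)
    return isi.std() / isi.mean()                 # coefficient of variation
```

Reducing `n_inputs` in this sketch mimics deafferentation: fewer, larger inputs per output spike make the interval statistics noisier and push the CV up, which is the qualitative effect the paper's synaptopathy prediction concerns.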

  • Conference paper
    Forte AE, Etard O, Reichenbach J, 2017,

    Selective auditory attention modulates the human brainstem's response to running speech

    , Basic Auditory Science 2017
  • Conference paper
    Kegler M, Etard O, Forte AE, Reichenbach J et al., 2017,

    Complex statistical model for detecting the auditory brainstem response to natural speech and for decoding attention

    , Basic Auditory Science 2017
  • Conference paper
    Isaac Engel J, Picinali L, 2017,

    Long-term user adaptation to an audio augmented reality system

    , 24th International Congress on Sound and Vibration, 2017

    Audio Augmented Reality (AAR) consists of extending a real auditory environment with virtual sound sources. This can be achieved using binaural earphones/microphones. The microphones, placed in the outer part of each earphone, record sounds from the user's environment, which are then mixed with virtual binaural audio, and the resulting signal is played back through the earphones. However, previous studies show that, with a system of this type, audio coming from the microphones (or hear-through audio) does not sound natural to the user. The goal of this study is to explore the capabilities of long-term user adaptation to an AAR system built with off-the-shelf components (a pair of binaural microphones/earphones and a smartphone), aiming to achieve perceived realism for the hear-through audio. To compensate for the acoustic effects of ear canal occlusion, the recorded signal is equalised on the smartphone. In-out latency was minimised to avoid distortion caused by the comb filtering effect. To evaluate the adaptation of users to the headset, two case studies were performed. The subjects wore an AAR headset for several days while performing daily tests to track the progress of the adaptation. Both quantitative and qualitative evaluations (i.e., localising real and virtual sound sources and analysing the perception of pre-recorded auditory scenes) were carried out, finding slight signs of adaptation, especially in the subjective tests. A demo will be available for conference visitors, also including the integration of visual Augmented Reality functionalities.
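The comb filtering mentioned in this abstract arises because the hear-through signal, delayed by the system's in-out latency, mixes with direct sound leaking past the earphone. A short sketch of the resulting magnitude response (the unit-gain leakage and hear-through levels are illustrative assumptions):

```python
import numpy as np

def comb_magnitude(latency_ms, freqs, leak=1.0, ht=1.0):
    """Magnitude response of direct leakage plus a hear-through copy
    delayed by the in-out latency: |leak + ht * exp(-j*2*pi*f*d)|.
    Notches repeat every 1/d Hz, so lower latency pushes them higher
    in frequency and makes the colouration less audible."""
    d = latency_ms / 1000.0
    return np.abs(leak + ht * np.exp(-2j * np.pi * freqs * d))

freqs = np.linspace(0, 2000, 2001)   # 1 Hz resolution up to 2 kHz
mag = comb_magnitude(5.0, freqs)     # 5 ms latency: notches every 200 Hz
```

With equal levels, the response swings between 2 (in phase) and 0 (a notch), which is why the study treats latency minimisation as essential for natural-sounding hear-through audio.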

  • Conference paper
    Parada PP, Sharma D, van Waterschoot T, Naylor PA et al., 2017,

    Robust Statistical Processing of TDOA Estimates for Distant Speaker Diarization

    , 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 86-90, ISSN: 2076-1465
  • Conference paper
    Etard O, Reichenbach J, 2017,

    EEG-measured correlates of comprehension in speech-in-noise listening

    , Basic Auditory Science 2017
  • Journal article
    Sidiras C, Iliadou V, Nimatoudis I, Reichenbach T, Bamiou D-E et al., 2017,

    Spoken word recognition enhancement due to preceding synchronized beats compared to unsynchronized or unrhythmic beats

    , Frontiers in Neuroscience, Vol: 11, ISSN: 1662-4548

    The relation between rhythm and language has been investigated over the last decades, with evidence that these share overlapping perceptual mechanisms emerging from several different strands of research. The dynamic Attention Theory posits that neural entrainment to musical rhythm results in synchronized oscillations in attention, enhancing perception of other events occurring at the same rate. In this study, this prediction was tested in 10-year-old children by means of a psychoacoustic speech-recognition-in-babble paradigm. It was hypothesized that rhythm effects evoked via a short isochronous sequence of beats would provide optimal word recognition in babble when beats and word are in sync. We compared speech-recognition-in-babble performance in the presence of an isochronous and in-sync vs. a non-isochronous or out-of-sync sequence of beats. Results showed that (a) word recognition was best when rhythm and word were in sync, and (b) the effect was not uniform across syllables and gender of subjects. Our results suggest that pure tone beats affect speech recognition at early levels of sensory or phonemic processing.

  • Conference paper
    Evers C, Dorfan Y, Gannot S, Naylor PA et al., 2017,

    Source tracking using moving microphone arrays for robot audition

    , IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

    Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.

  • Conference paper
    Lightburn L, De Sena E, Moore AH, Naylor PA, Brookes D et al., 2017,

    Improving the perceptual quality of ideal binary masked speech

    , 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149

    It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
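The mask-as-prior idea in this abstract can be sketched schematically: instead of zeroing masked-out cells, the binary mask sets a per-cell prior speech-presence probability that softens a classical gain. This is a sketch of the general approach with invented constants (`p_hi`, `p_lo`, the gain floor), not the paper's actual estimator, which is built on a full statistical speech enhancer.

```python
import numpy as np

def enhance_with_mask_prior(noisy_stft, mask, snr_est, p_hi=0.9, p_lo=0.1):
    """Use a binary mask as a prior on speech presence rather than as a
    hard time-frequency gain.

    mask == 1 cells get a high prior speech-presence probability p_hi,
    mask == 0 cells a low one p_lo; the final gain is a Wiener gain
    softened by the prior, with a small floor where speech is deemed
    absent, avoiding the musical-noise artefacts of hard masking.
    """
    prior = np.where(mask > 0, p_hi, p_lo)
    wiener = snr_est / (1.0 + snr_est)          # classical Wiener gain
    gain = prior * wiener + (1 - prior) * 0.05  # floored soft gain
    return gain * noisy_stft
```

The key contrast with direct masking is that no time-frequency cell is ever set exactly to zero, so the intelligibility benefit of the mask is kept while the processing artefacts that hurt perceptual quality are reduced.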

  • Journal article
    Ciganovic N, Wolde-Kidan A, Reichenbach JDT, 2017,

    Hair bundles of cochlear outer hair cells are shaped to minimize their fluid-dynamic resistance

    , Scientific Reports, Vol: 7, ISSN: 2045-2322

    The mammalian sense of hearing relies on two types of sensory cells: inner hair cells transmit the auditory stimulus to the brain, while outer hair cells mechanically modulate the stimulus through active feedback. Stimulation of a hair cell is mediated by displacements of its mechanosensitive hair bundle which protrudes from the apical surface of the cell into a narrow fluid-filled space between reticular lamina and tectorial membrane. While hair bundles of inner hair cells are of linear shape, those of outer hair cells exhibit a distinctive V-shape. The biophysical rationale behind this morphology, however, remains unknown. Here we use analytical and computational methods to study the fluid flow across rows of differently shaped hair bundles. We find that rows of V-shaped hair bundles have a considerably reduced resistance to crossflow, and that the biologically observed shapes of hair bundles of outer hair cells are near-optimal in this regard. This observation accords with the function of outer hair cells and lends support to the recent hypothesis that inner hair cells are stimulated by a net flow, in addition to the well-established shear flow that arises from shearing between the reticular lamina and the tectorial membrane.

  • Conference paper
    Picinali L, Wallin A, Levtov Y, Poirier-Quinot D et al., 2017,

    Comparative perceptual evaluation between different methods for implementing Reverberation in a binaural context

    , AES 2017, Publisher: Audio Engineering Society

    Reverberation has always been considered of primary importance in order to improve the realism, externalisation and immersiveness of binaurally spatialised sounds. Different techniques exist for implementing reverberation in a binaural context, each with a different level of computational complexity and spatial accuracy. A perceptual study has been performed to compare the realism and localisation accuracy achieved using five different binaural reverberation techniques. These included multichannel Ambisonic-based, stereo and mono reverberation methods. A custom web-based application has been developed implementing the testing procedures and allowing participants to take the test remotely. Initial results with 54 participants show that no major difference in terms of perceived level of realism and spatialisation accuracy could be found between four of the five proposed reverberation methods, suggesting that a high level of complexity in the reverberation process does not always correspond to improved perceptual attributes.

  • Journal article
    Doire CSJ, Brookes DM, Naylor PA, 2017,

    Robust and efficient Bayesian adaptive psychometric function estimation

    , Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

    The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
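A generic grid-based version of the look-one-ahead loop with an entropy criterion reads roughly as follows. Note that this sketch does pre-specify a parameter grid, which is precisely what the proposed procedure avoids, and the function shapes, grids and the guess/lapse rates are illustrative assumptions, not the paper's.

```python
import numpy as np

def psychometric(snr, thresh, slope, guess=0.5, lapse=0.02):
    """Logistic psychometric function: P(correct | snr)."""
    p = 1.0 / (1.0 + np.exp(-slope * (snr - thresh)))
    return guess + (1 - guess - lapse) * p

def update_posterior(posterior, thresh_grid, slope_grid, snr, correct):
    """Bayesian update of a grid posterior over (threshold, slope)."""
    T, S = np.meshgrid(thresh_grid, slope_grid, indexing="ij")
    p = psychometric(snr, T, S)
    like = p if correct else 1 - p
    posterior = posterior * like
    return posterior / posterior.sum()

def next_probe_entropy(posterior, thresh_grid, slope_grid, candidates):
    """Look one ahead: pick the probe SNR whose expected posterior
    entropy (averaged over the two possible responses) is lowest."""
    T, S = np.meshgrid(thresh_grid, slope_grid, indexing="ij")
    best, best_h = None, np.inf
    for snr in candidates:
        p = psychometric(snr, T, S)
        pc = (posterior * p).sum()  # predictive P(correct)
        h = 0.0
        for outcome, p_outcome in ((True, pc), (False, 1 - pc)):
            post = update_posterior(posterior, thresh_grid,
                                    slope_grid, snr, outcome)
            h += p_outcome * -(post * np.log(post + 1e-12)).sum()
        if h < best_h:
            best, best_h = snr, h
    return best
```

After each trial the experimenter updates the posterior with the observed response and asks `next_probe_entropy` for the following stimulus; a look-two-ahead variant would recurse one level deeper over the two outcomes.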

  • Conference paper
    Pinero G, Naylor PA, 2017,

    CHANNEL ESTIMATION FOR CROSSTALK CANCELLATION IN WIRELESS ACOUSTIC NETWORKS

    , IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149
  • Conference paper
    Javed HA, Cauchi B, Doclo S, Naylor PA, Goetze S et al., 2017,

    MEASURING, MODELLING AND PREDICTING PERCEIVED REVERBERATION

    , IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149
  • Conference paper
    Forte AE, Etard O, Reichenbach J, 2017,

    Complex Auditory-brainstem Response to the Fundamental Frequency of Continuous Natural Speech

    , ARO 2017
  • Conference paper
    Evers C, Moore A, Naylor P, 2016,

    Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition

    , European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

    Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.

  • Journal article
    Moore AH, Evers C, Naylor PA, 2016,

    Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors

    , IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290

    Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
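The PIV computation at the heart of the first method is compact. A sketch, assuming the zeroth-order (omnidirectional) and three first-order (dipole) eigenbeams have already been extracted from the spherical harmonic decomposition; sign conventions for whether the resulting vector points toward or away from the source vary across formulations, and the subspace (SSPIV) refinement is not shown.

```python
import numpy as np

def pseudo_intensity_doa(p0, px, py, pz):
    """DOA estimate from a pseudo-intensity vector.

    p0 is the zeroth-order eigenbeam and px, py, pz the three
    first-order eigenbeams, all as complex STFT coefficients.
    The PIV is the time-frequency average of Re{conj(p0)*[px,py,pz]};
    normalising it yields a unit vector along the source direction.
    """
    piv = np.array([
        np.mean(np.real(np.conj(p0) * px)),
        np.mean(np.real(np.conj(p0) * py)),
        np.mean(np.real(np.conj(p0) * pz)),
    ])
    return piv / np.linalg.norm(piv)
```

For a single plane wave the dipole beams are the omnidirectional beam scaled by the direction cosines, so the averaged product recovers the propagation direction, which is why the PIV is so cheap compared with subspace search methods.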

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
