Publications

Journal article

Reichenbach JDT, Ciganovic N, Warren R, Keceli B, Jacon S, Fridberger A et al., 2018,

Static length changes of cochlear outer hair cells can tune low-frequency hearing

, PLoS Computational Biology, Vol: 14, ISSN: 1553-734X

The cochlea not only transduces sound-induced vibration into neural spikes, it also amplifies weak sound to boost its detection. Actuators of this active process are sensory outer hair cells in the organ of Corti, whereas the inner hair cells transduce the resulting motion into electric signals that propagate via the auditory nerve to the brain. However, how the outer hair cells modulate the stimulus to the inner hair cells remains unclear. Here, we combine theoretical modeling and experimental measurements near the cochlear apex to study the way in which length changes of the outer hair cells deform the organ of Corti. We develop a geometry-based kinematic model of the apical organ of Corti that reproduces salient, yet counter-intuitive features of the organ’s motion. Our analysis further uncovers a mechanism by which a static length change of the outer hair cells can sensitively tune the signal transmitted to the sensory inner hair cells. When the outer hair cells are in an elongated state, stimulation of inner hair cells is largely inhibited, whereas outer hair cell contraction leads to a substantial enhancement of sound-evoked motion near the hair bundles. This novel mechanism for regulating the sensitivity of the hearing organ applies to the low frequencies that are most important for the perception of speech and music. We suggest that the proposed mechanism might underlie frequency discrimination at low auditory frequencies, as well as our ability to selectively attend auditory signals in noisy surroundings.

Journal article

Dietz M, Lestang J-H, Majdak P, Stern RM, Marquardt T, Ewert SD, Hartmann WM, Goodman DFM et al., 2017,

A framework for testing and comparing binaural models

, Hearing Research, Vol: 360, Pages: 92-106, ISSN: 0378-5955

Auditory research has a rich history of combining experimental evidence with computational simulations of auditory processing in order to deepen our theoretical understanding of how sound is processed in the ears and in the brain. Despite significant progress in the amount of detail and breadth covered by auditory models, for many components of the auditory pathway there are still different model approaches that are often not equivalent but rather in conflict with each other. Similarly, some experimental studies yield conflicting results which has led to controversies. This can be best resolved by a systematic comparison of multiple experimental data sets and model approaches. Binaural processing is a prominent example of how the development of quantitative theories can advance our understanding of the phenomena, but there remain several unresolved questions for which competing model approaches exist. This article discusses a number of current unresolved or disputed issues in binaural modelling, as well as some of the significant challenges in comparing binaural models with each other and with the experimental data. We introduce an auditory model framework, which we believe can become a useful infrastructure for resolving some of the current controversies. It operates models over the same paradigms that are used experimentally. The core of the proposed framework is an interface that connects three components irrespective of their underlying programming language: The experiment software, an auditory pathway model, and task-dependent decision stages called artificial observers that provide the same output format as the test subject.

Conference paper

Papayiannis C, Evers C, Naylor PA, 2017,

Sparse parametric modeling of the early part of acoustic impulse responses

, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 678-682, ISSN: 2076-1465

Acoustic channels are typically described by their Acoustic Impulse Response (AIR) as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, describing discrete reflections and the diffuse reverberation tail respectively. We propose an approach for constructing a sparse parametric model for the early part. The model aims at reducing the number of parameters needed to represent it and subsequently reconstruct from the representation the MA coefficients that describe it. It consists of a representation of the reflections arriving at the receiver as delayed copies of an excitation signal. The Time-Of-Arrivals of reflections are not restricted to integer sample instances and a dynamically estimated model for the excitation sound is used. We also present a corresponding parameter estimation method, which is based on regularized-regression and nonlinear optimization. The proposed method also serves as an analysis tool, since estimated parameters can be used for the estimation of room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the AIR coefficient reconstruction-error energy does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reduction figures exceeding 90% when compared to a MA process representation.
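The idea of representing early reflections as delayed copies of an excitation signal, with arrival times that need not fall on integer samples, can be illustrated with a minimal sketch. It is not the authors' estimation method: the function names are hypothetical, and a band-limited (sinc) pulse stands in for the dynamically estimated excitation model that the abstract describes.

```python
import math


def sinc(x):
    """Normalised sinc, used here as an assumed band-limited excitation pulse."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def synthesize_early_air(amplitudes, delays, length):
    """Rebuild MA coefficients from (amplitude, delay) pairs.

    Each reflection contributes one scaled copy of the pulse, centred at a
    delay given in samples; the delay may be fractional.
    """
    return [
        sum(a * sinc(n - d) for a, d in zip(amplitudes, delays))
        for n in range(length)
    ]


# Two parameters per reflection replace `length` MA coefficients.
early_air = synthesize_early_air([0.8, 0.3], [3.0, 7.5], 16)
```

With two reflections, four parameters stand in for sixteen coefficients, which is the kind of saving the dimensionality-reduction figures in the abstract refer to.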

Journal article

Forte AE, Etard O, Reichenbach J, 2017,

The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention

, eLife, Vol: 6, ISSN: 2050-084X

Humans excel at selectively listening to a target speaker in background noise such as competing voices. While the encoding of speech in the auditory cortex is modulated by selective attention, it remains debated whether such modulation occurs already in subcortical auditory structures. Investigating the contribution of the human brainstem to attention has, in particular, been hindered by the tiny amplitude of the brainstem response. Its measurement normally requires a large number of repetitions of the same short sound stimuli, which may lead to a loss of attention and to neural adaptation. Here we develop a mathematical method to measure the auditory brainstem response to running speech, an acoustic stimulus that does not repeat and that has a high ecological validity. We employ this method to assess the brainstem's activity when a subject listens to one of two competing speakers, and show that the brainstem response is consistently modulated by attention.

Journal article

Goodman DFM, Winter IM, Léger AC, de Cheveigné A, Lorenzi C et al., 2017,

Modelling firing regularity in the ventral cochlear nucleus: Mechanisms, and effects of stimulus level and synaptopathy

, Hearing Research, Vol: 358, Pages: 98-110, ISSN: 0378-5955

The auditory system processes temporal information at multiple scales, and disruptions to this temporal processing may lead to deficits in auditory tasks such as detecting and discriminating sounds in a noisy environment. Here, a modelling approach is used to study the temporal regularity of firing by chopper cells in the ventral cochlear nucleus, in both the normal and impaired auditory system. Chopper cells, which have a strikingly regular firing response, divide into two classes, sustained and transient, based on the time course of this regularity. Several hypotheses have been proposed to explain the behaviour of chopper cells, and the difference between sustained and transient cells in particular. However, there is no conclusive evidence so far. Here, a reduced mathematical model is developed and used to compare and test a wide range of hypotheses with a limited number of parameters. Simulation results show a continuum of cell types and behaviours: chopper-like behaviour arises for a wide range of parameters, suggesting that multiple mechanisms may underlie this behaviour. The model accounts for systematic trends in regularity as a function of stimulus level that have previously only been reported anecdotally. Finally, the model is used to predict the effects of a reduction in the number of auditory nerve fibres (deafferentation due to, for example, cochlear synaptopathy). An interactive version of this paper in which all the model parameters can be changed is available online.

Conference paper

Forte AE, Etard O, Reichenbach J, 2017,

Selective auditory attention modulates the human brainstem's response to running speech

, Basic Auditory Science 2017


Conference paper

Kegler M, Etard O, Forte AE, Reichenbach J et al., 2017,

Complex statistical model for detecting the auditory brainstem response to natural speech and for decoding attention

, Basic Auditory Science 2017


Conference paper

Isaac Engel J, Picinali L, 2017,

Long-term user adaptation to an audio augmented reality system

, 24th International Congress on Sound and Vibration, 2017

Audio Augmented Reality (AAR) consists in extending a real auditory environment with virtual sound sources. This can be achieved using binaural earphones/microphones. The microphones, placed in the outer part of each earphone, record sounds from the user's environment, which are then mixed with virtual binaural audio, and the resulting signal is finally played back through the earphones. However, previous studies show that, with a system of this type, audio coming from the microphones (or hear-through audio) does not sound natural to the user. The goal of this study is to explore the capabilities of long-term user adaptation to an AAR system built with off-the-shelf components (a pair of binaural microphones/earphones and a smartphone), aiming to achieve perceived realism for the hear-through audio. To compensate for the acoustical effects of ear canal occlusion, the recorded signal is equalised in the smartphone. In-out latency was minimised to avoid distortion caused by the comb filtering effect. To evaluate the adaptation process of the users to the headset, two case studies were performed. The subjects wore an AAR headset for several days while performing daily tests to check the progress of the adaptation. Both quantitative and qualitative evaluations (i.e., localising real and virtual sound sources and analysing the perception of pre-recorded auditory scenes) were carried out, finding slight signs of adaptation, especially in the subjective tests. A demo will be available for the conference visitors, including also the integration of visual Augmented Reality functionalities.


Conference paper

Parada PP, Sharma D, van Waterschoot T, Naylor PA et al., 2017,

Robust Statistical Processing of TDOA Estimates for Distant Speaker Diarization

, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 86-90, ISSN: 2076-1465

Conference paper

Etard O, Reichenbach J, 2017,

EEG-measured correlates of comprehension in speech-in-noise listening

, Basic Auditory Science 2017


Journal article

Sidiras C, Iliadou V, Nimatoudis I, Reichenbach T, Bamiou D-E et al., 2017,

Spoken word recognition enhancement due to preceding synchronized beats compared to unsynchronized or unrhythmic beats

, Frontiers in Neuroscience, Vol: 11, ISSN: 1662-4548

The relation between rhythm and language has been investigated over the last decades, with evidence that these share overlapping perceptual mechanisms emerging from several different strands of research. The dynamic Attention Theory posits that neural entrainment to musical rhythm results in synchronized oscillations in attention, enhancing perception of other events occurring at the same rate. In this study, this prediction was tested in 10-year-old children by means of a psychoacoustic speech recognition in babble paradigm. It was hypothesized that rhythm effects evoked via a short isochronous sequence of beats would provide optimal word recognition in babble when beats and word are in sync. We compared speech recognition in babble performance in the presence of isochronous and in sync vs. non-isochronous or out of sync sequences of beats. Results showed that (a) word recognition was the best when rhythm and word were in sync, and (b) the effect was not uniform across syllables and gender of subjects. Our results suggest that pure tone beats affect speech recognition at early levels of sensory or phonemic processing.

Conference paper

Evers C, Dorfan Y, Gannot S, Naylor PA et al., 2017,

Source tracking using moving microphone arrays for robot audition

, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.

Conference paper

Lightburn L, De Sena E, Moore AH, Naylor PA, Brookes D et al., 2017,

Improving the perceptual quality of ideal binary masked speech

, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
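For readers unfamiliar with the term, an oracle ideal binary mask is conventionally defined by comparing the local signal-to-noise ratio in each time-frequency cell with a threshold. The sketch below shows only that conventional definition, under the assumption that clean-speech and noise powers are available per cell; the function name and the `lc_db` threshold parameter are illustrative. It does not implement the paper's contribution, which passes the mask to a speech enhancer as prior information instead of applying it as a gain.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return a 0/1 mask per time-frequency cell.

    A cell is 1 when its local SNR in dB exceeds the local criterion lc_db.
    Inputs are equally shaped lists of rows (frames) of per-bin powers.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                row.append(0)  # no speech energy in this cell
            elif n <= 0.0:
                row.append(1)  # speech present and no noise
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
```

In this toy example the cells where speech power exceeds noise power are kept, and the other two are discarded.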

Journal article

Ciganovic N, Wolde-Kidan A, Reichenbach JDT, 2017,

Hair bundles of cochlear outer hair cells are shaped to minimize their fluid-dynamic resistance

, Scientific Reports, Vol: 7, ISSN: 2045-2322

The mammalian sense of hearing relies on two types of sensory cells: inner hair cells transmit the auditory stimulus to the brain, while outer hair cells mechanically modulate the stimulus through active feedback. Stimulation of a hair cell is mediated by displacements of its mechanosensitive hair bundle which protrudes from the apical surface of the cell into a narrow fluid-filled space between reticular lamina and tectorial membrane. While hair bundles of inner hair cells are of linear shape, those of outer hair cells exhibit a distinctive V-shape. The biophysical rationale behind this morphology, however, remains unknown. Here we use analytical and computational methods to study the fluid flow across rows of differently shaped hair bundles. We find that rows of V-shaped hair bundles have a considerably reduced resistance to crossflow, and that the biologically observed shapes of hair bundles of outer hair cells are near-optimal in this regard. This observation accords with the function of outer hair cells and lends support to the recent hypothesis that inner hair cells are stimulated by a net flow, in addition to the well-established shear flow that arises from shearing between the reticular lamina and the tectorial membrane.

Conference paper

Picinali L, Wallin A, Levtov Y, Poirier-Quinot D et al., 2017,

Comparative perceptual evaluation between different methods for implementing Reverberation in a binaural context

, AES 2017, Publisher: Audio Engineering Society

Reverberation has always been considered of primary importance in order to improve the realism, externalisation and immersiveness of binaurally spatialised sounds. Different techniques exist for implementing reverberation in a binaural context, each with a different level of computational complexity and spatial accuracy. A perceptual study has been performed in order to compare between the realism and localization accuracy achieved using 5 different binaural reverberation techniques. These included multichannel Ambisonic-based, stereo and mono reverberation methods. A custom web-based application has been developed implementing the testing procedures, and allowing participants to take the test remotely. Initial results with 54 participants show that no major difference in terms of perceived level of realism and spatialisation accuracy could be found between four of the five proposed reverberation methods, suggesting that a high level of complexity in the reverberation process does not always correspond to improved perceptual attributes.

Journal article

Doire CSJ, Brookes DM, Naylor PA, 2017,

Robust and efficient Bayesian adaptive psychometric function estimation

, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
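The Bayesian estimate of a psychometric function can be sketched in a simplified form. The example assumes a logistic PF with a known slope and a fixed grid of candidate thresholds. Both are simplifications introduced here for illustration: the procedure in the abstract estimates the slope as well and does not require the parameter range to be specified in advance. All names are hypothetical.

```python
import math


def pf(snr_db, threshold_db, slope, guess=0.0, lapse=0.0):
    """Logistic psychometric function: probability of a correct response."""
    core = 1.0 / (1.0 + math.exp(-slope * (snr_db - threshold_db)))
    return guess + (1.0 - guess - lapse) * core


def update_posterior(prior, thresholds, slope, snr_db, correct):
    """One Bayesian update of a discrete posterior over candidate thresholds."""
    post = []
    for p, t in zip(prior, thresholds):
        likelihood = pf(snr_db, t, slope)
        post.append(p * (likelihood if correct else 1.0 - likelihood))
    total = sum(post)
    return [p / total for p in post]


thresholds = [-10.0, -5.0, 0.0, 5.0, 10.0]
prior = [0.2] * 5  # uniform prior over the assumed grid
posterior = update_posterior(prior, thresholds, slope=1.0, snr_db=0.0, correct=True)
```

A correct response at 0 dB shifts probability mass towards the lower candidate thresholds; an adaptive method would then choose the next probe SNR from this posterior, for instance with an entropy-based criterion.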

Conference paper

Pinero G, Naylor PA, 2017,

CHANNEL ESTIMATION FOR CROSSTALK CANCELLATION IN WIRELESS ACOUSTIC NETWORKS

, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149

Conference paper

Javed HA, Cauchi B, Doclo S, Naylor PA, Goetze S et al., 2017,

MEASURING, MODELLING AND PREDICTING PERCEIVED REVERBERATION

, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149

Conference paper

Forte AE, Etard O, Reichenbach J, 2017,

Complex Auditory-brainstem Response to the Fundamental Frequency of Continuous Natural Speech

, ARO 2017


Book

Jarrett DP, Habets EAP, Naylor PA, 2017,

Theory and Applications of Spherical Microphone Array Processing Introduction

, Publisher: SPRINGER-VERLAG BERLIN, ISBN: 978-3-319-42209-1

Imperial College London


Natural and Machine Hearing
