Publications

Conference paper

Evers C, Moore A, Naylor P, 2016,

Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition

European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.
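The abstract does not state which source-dynamics model is used. As a minimal sketch, assuming a noise-free constant-velocity model for each talker in 2-D, the prediction step applied to every source in a multi-source map could look like this (the `SourceState` layout and the function names are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SourceState:
    """Hypothetical 2-D source state: position (m) and velocity (m/s)."""
    x: float
    y: float
    vx: float
    vy: float


def predict_constant_velocity(s: SourceState, dt: float) -> SourceState:
    """Move one source forward by dt seconds, keeping its velocity unchanged."""
    return SourceState(s.x + s.vx * dt, s.y + s.vy * dt, s.vx, s.vy)


def predict_map(sources: list[SourceState], dt: float) -> list[SourceState]:
    """Apply the same motion model to every source in the acoustic map."""
    return [predict_constant_velocity(s, dt) for s in sources]


# Example: one talker moving at (1.0, 0.5) m/s, one standing still.
talkers = [SourceState(0.0, 0.0, 1.0, 0.5), SourceState(3.0, 1.0, 0.0, 0.0)]
print(predict_map(talkers, 2.0))
```

A filter-based implementation would also add process noise to this prediction, and a Random Finite Set model additionally lets the number of sources change over time; both are omitted from this sketch.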

Journal article

Moore AH, Evers C, Naylor PA, 2016,

Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290

Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
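Intensity-vector DOA estimation rests on the fact that, for a single plane wave, the product of the pressure (zeroth-order) signal with the three first-order, velocity-like signals points along the propagation axis. A minimal real-valued sketch, assuming idealised first-order B-format-style channels `w, x, y, z` in place of the paper's spherical harmonic coefficients, and a sign convention in which the vector points toward the source (all helper names are hypothetical):

```python
import math


def encode_plane_wave(signal, azimuth, elevation):
    """Ideal first-order channels for one plane wave (no noise, no reverberation).

    Assumed convention: x, y, z are the omni signal scaled by the Cartesian
    components of the unit vector pointing toward the source.
    """
    ux = math.cos(elevation) * math.cos(azimuth)
    uy = math.cos(elevation) * math.sin(azimuth)
    uz = math.sin(elevation)
    w = list(signal)
    return w, [ux * s for s in w], [uy * s for s in w], [uz * s for s in w]


def pseudo_intensity_doa(w, x, y, z):
    """Time-average w*x, w*y, w*z and normalise to a unit DOA vector."""
    n = len(w)
    i = [sum(a * b for a, b in zip(w, ch)) / n for ch in (x, y, z)]
    norm = math.sqrt(sum(c * c for c in i))
    if norm == 0.0:
        raise ValueError("zero intensity: direction undefined")
    return tuple(c / norm for c in i)


def to_az_el(u):
    """Convert a unit vector to (azimuth, elevation) in degrees."""
    el = math.asin(max(-1.0, min(1.0, u[2])))
    return math.degrees(math.atan2(u[1], u[0])), math.degrees(el)


sig = [math.sin(0.3 * k) for k in range(200)]
channels = encode_plane_wave(sig, math.radians(60.0), math.radians(20.0))
print(to_az_el(pseudo_intensity_doa(*channels)))
```

The paper forms PIVs per time-frequency bin from the spherical harmonic decomposition, and the SSPIV variant, as its name indicates, works on a signal subspace to cope with multiple sources and reverberation; neither refinement is reproduced here.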

Conference paper

Xue W, Brookes M, Naylor PA, 2016,

Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization

24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Journal article

Warren RL, Ramamoorthy S, Ciganovic N, Zhang Y, Wilson T, Petrie T, Wang RK, Jacques SL, Reichenbach JDT, Nuttall AL, Fridberger A et al., 2016,

Minimal basilar membrane motion in low-frequency hearing

Proceedings of the National Academy of Sciences of the United States of America, Vol: 113, Pages: E4304-E4310, ISSN: 1091-6490

Low-frequency hearing is critically important for speech and music perception, but no mechanical measurements have previously been available from inner ears with intact low-frequency parts. These regions of the cochlea may function in ways different from the extensively studied high-frequency regions, where the sensory outer hair cells produce force that greatly increases the sound-evoked vibrations of the basilar membrane. We used laser interferometry in vitro and optical coherence tomography in vivo to study the low-frequency part of the guinea pig cochlea, and found that sound stimulation caused motion of a minimal portion of the basilar membrane. Outside the region of peak movement, an exponential decline in motion amplitude occurred across the basilar membrane. The moving region had different dependence on stimulus frequency than the vibrations measured near the mechanosensitive stereocilia. This behavior differs substantially from the behavior found in the extensively studied high-frequency regions of the cochlea.

Journal article

Reichenbach CS, Braiman C, Schiff ND, Hudspeth AJ, Reichenbach JDT et al., 2016,

The auditory-brainstem response to continuous, non-repetitive speech is modulated by the speech envelope and reflects speech processing

Frontiers in Computational Neuroscience, Vol: 10, ISSN: 1662-5188

The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the auditory brainstem response is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function.
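One generic way to evaluate a response at a characteristic latency, without repetitions to average over, is to correlate the recording with a stimulus-derived reference at a range of delays and pick the best one. A minimal sketch of that step, assuming a hypothetical reference waveform (for example the fundamental-frequency component of voiced speech weighted by its envelope) and plain Pearson correlation; this is not the paper's exact measure, and the function names are hypothetical:

```python
import math


def corr_at_lag(ref, resp, lag):
    """Pearson correlation between ref[n] and resp[n + lag] over their overlap."""
    a = ref[: len(ref) - lag]
    b = resp[lag:]
    m = min(len(a), len(b))
    a, b = a[:m], b[:m]
    ma, mb = sum(a) / m, sum(b) / m
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den if den else 0.0


def peak_latency(ref, resp, max_lag):
    """Lag in samples (0..max_lag) with the highest correlation, and that value."""
    best = max(range(max_lag + 1), key=lambda lag: corr_at_lag(ref, resp, lag))
    return best, corr_at_lag(ref, resp, best)


# Synthetic check: an envelope-modulated "fundamental" and a response that is
# a scaled copy of it delayed by 7 samples.
ref = [(1 + 0.5 * math.sin(2 * math.pi * n / 400)) * math.sin(2 * math.pi * n / 20)
       for n in range(2000)]
resp = [0.0] * 7 + [0.1 * r for r in ref[:-7]]
print(peak_latency(ref, resp, max_lag=15))
```

Dividing the returned lag by the sampling rate gives the latency in seconds. The search range is kept shorter than the period of the reference's carrier so that the peak is unambiguous.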

Conference paper

Evers C, Moore A, Naylor P, 2016,

Towards Informative Path Planning for Acoustic SLAM

DAGA 2016

Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot’s positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.
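The idea of recovering range from direction-only measurements on a moving platform can be illustrated with two bearings to a static source taken from two platform positions: the source lies where the two bearing rays intersect. This minimal 2-D sketch assumes the platform positions are known exactly, which is precisely what a-SLAM cannot assume; the function names are hypothetical:

```python
import math


def cross(a, b):
    """z-component of the 2-D cross product."""
    return a[0] * b[1] - a[1] * b[0]


def triangulate(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect the bearing rays p1 + t1*d1 and p2 + t2*d2 in 2-D.

    p1, p2: sensor positions (x, y) in metres; theta1, theta2: bearings to the
    source in radians, measured from the x-axis.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = cross(d1, d2)
    if abs(denom) < eps:
        raise ValueError("bearings are (nearly) parallel: range is unobservable")
    diff = (p2[0] - p1[0], p2[1] - p1[1])
    t1 = cross(diff, d2) / denom
    return p1[0] + t1 * d1[0], p1[1] + t1 * d1[1]


# A source at (2, 3) observed from (0, 0) and then from (1, 0).
src = (2.0, 3.0)
b1 = math.atan2(src[1] - 0.0, src[0] - 0.0)
b2 = math.atan2(src[1] - 0.0, src[0] - 1.0)
print(triangulate((0.0, 0.0), b1, (1.0, 0.0), b2))
```

When the two bearings are nearly parallel, for example when the robot drives straight toward the source, the intersection becomes ill-conditioned; this gives one intuition for why the choice of robot path affects mapping accuracy.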

Conference paper

Picinali L, Gerino A, Bernareggi C, Alabastro N, Mascetti S et al., 2015,

Towards Large Scale Evaluation of Novel Sonification Techniques for Non Visual Shape Exploration

ACM SIGACCESS Conference on Computers & Accessibility, Publisher: ACM, Pages: 13-21

Conference paper

Hu M, Sharma D, Doclo S, Brookes D, Naylor P et al., 2015,

Speaker change detection and speaker diarization using spatial information

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Conference paper

Moore AH, Naylor PA, Skoglund J, 2014,

An Analysis of the Effect of Larynx-Synchronous Averaging on Dereverberation of Voiced Speech

European Signal Processing Conference, ISSN: 2219-5491

Journal article

Goodman DF, Benichoux V, Brette R, 2013,

Decoding neural responses to temporal cues for sound localization

eLife, Vol: 2, ISSN: 2050-084X

The activity of sensory neural populations carries information about the environment. This may be extracted from neural activity using different strategies. In the auditory brainstem, a recent theory proposes that sound location in the horizontal plane is decoded from the relative summed activity of two populations in each hemisphere, whereas earlier theories hypothesized that the location was decoded from the identity of the most active cells. We tested the performance of various decoders of neural responses in increasingly complex acoustical situations, including spectrum variations, noise, and sound diffraction. We demonstrate that there is insufficient information in the pooled activity of each hemisphere to estimate sound direction in a reliable way consistent with behavior, whereas robust estimates can be obtained from neural activity by taking into account the heterogeneous tuning of cells. These estimates can still be obtained when only contralateral neural responses are used, consistent with unilateral lesion studies. DOI: http://dx.doi.org/10.7554/eLife.01312.001.
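The decoding strategies contrasted above can be compared on a toy population. The sketch below assumes hypothetical Gaussian ITD tuning curves with cell-specific widths (to mimic heterogeneous tuning) and implements (a) a readout comparing the summed activity of cells preferring each side, (b) the identity of the most active cell, and (c) a pattern decoder that matches the whole population response against stored templates. It is a noise-free illustration of how the decoders are structured, not the model or data used in the paper:

```python
import math

# Hypothetical population: (best ITD in microseconds, tuning width in microseconds).
CELLS = [(b, 120 + 40 * (i % 3)) for i, b in enumerate(range(-400, 401, 100))]


def population_response(itd, cells=CELLS):
    """Noise-free firing rates (arbitrary units) for a given ITD."""
    return [math.exp(-((itd - best) ** 2) / (2 * width ** 2)) for best, width in cells]


def hemispheric_difference(rates, cells=CELLS):
    """Normalised difference of the summed activity of cells preferring each side."""
    right = sum(r for r, (best, _) in zip(rates, cells) if best > 0)
    left = sum(r for r, (best, _) in zip(rates, cells) if best < 0)
    return (right - left) / (right + left)


def peak_decoder(rates, cells=CELLS):
    """Best ITD of the most active cell."""
    idx = max(range(len(rates)), key=lambda i: rates[i])
    return cells[idx][0]


def template_decoder(rates, templates):
    """ITD whose stored population response is closest in squared error."""
    def err(itd):
        return sum((r - t) ** 2 for r, t in zip(rates, templates[itd]))
    return min(templates, key=err)


templates = {itd: population_response(itd) for itd in range(-400, 401, 50)}
rates = population_response(200)
print(hemispheric_difference(rates), peak_decoder(rates), template_decoder(rates, templates))
```

With noise-free responses all three readouts agree on the side, but only (b) and (c) return an ITD directly; the hemispheric readout is a scalar that would need a calibration curve, and the abstract reports that such pooled activity is insufficient for reliable estimates in more complex acoustical situations.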

Conference paper

Goodman DFM, Brette R, 2010,

Learning to localise sounds with spiking neural networks

To localise the source of a sound, we use location-specific properties of the signals received at the two ears caused by the asymmetric filtering of the original sound by our head and pinnae, the head-related transfer functions (HRTFs). These HRTFs change throughout an organism's lifetime, during development for example, and so the required neural circuitry cannot be entirely hardwired. Since HRTFs are not directly accessible from perceptual experience, they can only be inferred from filtered sounds. We present a spiking neural network model of sound localisation based on extracting location-specific synchrony patterns, and a simple supervised algorithm to learn the mapping between synchrony patterns and locations from a set of example sounds, with no previous knowledge of HRTFs. After learning, our model was able to accurately localise new sounds in both azimuth and elevation, including the difficult task of distinguishing sounds coming from the front and back.
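The supervised step described above, learning a mapping from example patterns to known locations and then labelling new sounds, can be illustrated with a nearest-centroid classifier. This is a deliberately simple stand-in, not the paper's spiking-network learning rule: patterns are assumed to be fixed-length feature vectors (for instance, hypothetical counts of synchronous firing per neuron group), and the helper names are hypothetical:

```python
import math


def train_centroids(examples):
    """Average the example patterns for each location label.

    examples: iterable of (pattern, location) with equal-length float lists.
    """
    sums, counts = {}, {}
    for pattern, loc in examples:
        acc = sums.setdefault(loc, [0.0] * len(pattern))
        for i, p in enumerate(pattern):
            acc[i] += p
        counts[loc] = counts.get(loc, 0) + 1
    return {loc: [s / counts[loc] for s in acc] for loc, acc in sums.items()}


def cosine(a, b):
    """Cosine similarity of two vectors (0 if either is all zeros)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def localise(pattern, centroids):
    """Return the location whose centroid is most similar to the pattern."""
    return max(centroids, key=lambda loc: cosine(pattern, centroids[loc]))


training = [([1.0, 0.0, 0.2], "front"), ([0.8, 0.1, 0.0], "front"),
            ([0.0, 1.0, 0.1], "back"), ([0.2, 0.9, 0.0], "back")]
centroids = train_centroids(training)
print(localise([0.85, 0.05, 0.1], centroids))
```

Cosine similarity is used so that overall activity level does not dominate the match; any other similarity measure could be substituted in this sketch.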

Journal article

Goodman DF, Brette R, 2010,

Spike-timing-based computation in sound localization

PLOS Computational Biology, Vol: 6, ISSN: 1553-734X

Spike timing is precise in the auditory system and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model which exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing to extract spatial information about sources independently of the source signal.
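For contrast with the synchrony-pattern model, a baseline that uses only simple time differences can be sketched as a Jeffress-style bank of coincidence detectors: each detector shifts the left-ear spike train by a candidate internal delay and counts near-coincident spikes from the right ear. This is not the paper's model, which adds spectro-temporal filtering and location-specific assemblies; spike times are assumed to be integers in microseconds, and all names are hypothetical:

```python
def coincidence_counts(left, right, delays, window):
    """Count spike pairs with |t_right - (t_left + d)| <= window for each delay d.

    left, right: spike times in integer microseconds.
    """
    return {
        d: sum(1 for tl in left for tr in right if abs(tr - (tl + d)) <= window)
        for d in delays
    }


def estimate_itd(left, right, delays, window=20):
    """Internal delay whose coincidence detector fires most often."""
    counts = coincidence_counts(left, right, delays, window)
    return max(delays, key=lambda d: counts[d])


left_spikes = [1000, 2500, 4100, 6000, 7300]
right_spikes = [t + 300 for t in left_spikes]  # right ear lags by 300 microseconds
candidate_delays = list(range(-600, 601, 50))
print(estimate_itd(left_spikes, right_spikes, candidate_delays))
```

Such a detector bank recovers a pure delay well, but as the abstract notes, real binaural signals depend strongly on the unknown source signal and on head filtering, which is what motivates the richer synchrony-pattern approach.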

Imperial College London

Natural and Machine Hearing