Publications

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Spherical Array Acoustic Impulse Response Simulation

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 39-64, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Parametric Array Processing

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 141-150, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Spatial Sampling and Signal Transformation

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 23-37, ISBN: 978-3-319-42209-1

Book

Jarrett DP, Habets EAP, Naylor PA, 2017,

Theory and Applications of Spherical Microphone Array Processing Introduction

, Publisher: SPRINGER-VERLAG BERLIN, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Acoustic Parameter Estimation

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 65-92, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Informed Array Processing

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 151-184, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Signal-Independent Array Processing

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 93-112, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Theoretical Preliminaries of Acoustics

, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 11-22, ISBN: 978-3-319-42209-1

Journal article

Moore AH, Peso P, Naylor PA, 2016,

Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures

, Computer Speech and Language, Vol: 46, Pages: 574-584, ISSN: 1095-8363

Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database which includes measured multichannel acoustic impulse responses from 7 different rooms with reverberation times ranging from 0.33 s to 1.34 s. The reverberant speech is mixed with ambient, fan and babble noise recordings made with the same microphone setups in each of the rooms. In the first experiment, performance of the ASR without speech processing is evaluated. Results clearly indicate the deleterious effect of both noise and reverberation. In the second experiment, different speech enhancement pipelines are evaluated with relative word error rate reductions of up to 82%. Finally, the ability of selected instrumental metrics to predict ASR performance improvement is assessed. The best performing metric, Short-Time Objective Intelligibility Measure, is shown to have a Pearson correlation coefficient of 0.79, suggesting that it is a useful predictor of algorithm performance in these tests.

Conference paper

Dorfan Y, Evers C, Gannot S, Naylor P et al., 2016,

Speaker Localization with Moving Microphone Arrays

, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Speaker localization algorithms often assume static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are linear time invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first approach is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second approach is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared for simulated reverberant audio data from a microphone array with two sensors.

Conference paper

Hafezi S, Moore AH, Naylor PA, 2016,

Multiple source localization in the spherical harmonic domain using augmented intensity vectors based on grid search

, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491

Multiple source localization is an important task in acoustic signal processing with applications including dereverberation, source separation, source tracking and environment mapping. When using spherical microphone arrays, it has been previously shown that Pseudo-intensity Vectors (PIV) and Augmented Intensity Vectors (AIV) are an effective approach for direction of arrival estimation of a sound source. In this paper, we evaluate AIV-based localization in acoustic scenarios involving multiple sound sources. Simulations are conducted where the number of sources, their angular separation and the reverberation time of the room are varied. The results indicate that AIV outperforms PIV and Steered Response Power (SRP) with an average accuracy between 5 and 10 degrees for sources with angular separation of 30 degrees or more. AIV also shows better robustness to reverberation time than PIV and SRP.

Conference paper

Moore AH, Evers C, Naylor PA, 2016,

2D direction of arrival estimation of multiple moving sources using a spherical microphone array

, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491

Direction of arrival estimation using a spherical microphone array is an important and growing research area. One promising algorithm is the recently proposed Subspace Pseudo-Intensity Vector method. In this contribution the Subspace Pseudo-Intensity Vector method is combined with a state-of-the-art method for robustly estimating the centres of mass in a 2D histogram based on matching pursuits. The performance of the improved Subspace Pseudo-Intensity Vector method is evaluated in the context of localising multiple moving sources where it is shown to outperform competing methods in terms of clutter rate and the number of missed detections whilst remaining comparable in terms of localisation accuracy.

Conference paper

Evers C, Moore A, Naylor P, 2016,

Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition

, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.

Conference paper

Xue W, Brookes DM, Naylor PA, 2016,

Under-modelled blind system identification for time delay estimation in reverberant environments

, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.
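For orientation, the conventional cross-correlation-based TDE that the abstract above describes as the special case being generalized can be sketched as follows. This is a minimal illustration of the classical baseline only, not the under-modelled BSI method of the paper; the function name `estimate_delay` and the two-channel setup are assumptions made for the example.

```python
import numpy as np

def estimate_delay(x1, x2):
    """Estimate the delay (in samples) of x2 relative to x1.

    Classical cross-correlation TDE: pick the lag that maximizes the
    cross-correlation between the two microphone signals. A positive
    result means x2 lags behind x1.
    """
    # np.correlate(a, v, "full")[k] corresponds to sum_n a[n + lag] * v[n]
    # for lag running from -(len(v) - 1) to len(a) - 1.
    c = np.correlate(x2, x1, mode="full")
    lags = np.arange(-(len(x1) - 1), len(x2))
    return int(lags[np.argmax(c)])

# Hypothetical usage: a noise signal and a copy delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
delayed = np.concatenate([np.zeros(5), s[:-5]])
delay = estimate_delay(s, delayed)
```

In reverberant rooms the peak of this function is smeared by reflections, which is the limitation that motivates modelling the early part of the RIR instead.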

Journal article

Moore AH, Evers C, Naylor PA, 2016,

Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors

, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290

Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.

Conference paper

Moore AH, Naylor P, 2016,

Linear prediction based dereverberation for spherical microphone arrays

, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Dereverberation is an important preprocessing step in many speech systems, both for human and machine listening. In many situations, including robot audition, the sound sources of interest can be incident from any direction. In such circumstances, a spherical microphone array allows direction of arrival estimation which is free of spatial aliasing and direction-independent beam patterns can be formed. This contribution formulates the Weighted Prediction Error algorithm in the spherical harmonic domain and compares the performance to a space domain implementation. Simulation results demonstrate that performing dereverberation in the spherical harmonic domain allows many more microphones to be used without increasing the computational cost. The benefit of using many microphones is particularly apparent at low signal to noise ratios, where for the conditions tested up to 71% improvement in speech-to-reverberation modulation ratio was achieved.

Journal article

Naylor PA, Zahedi A, Jensen S, Bech S et al., 2016,

Source Coding in Networks with Covariance Distortion Constraints

, IEEE Transactions on Signal Processing, Vol: 64, Pages: 5943-5958, ISSN: 1053-587X

We consider a source coding problem with a network scenario in mind, and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortions. We define a notion of minimum for two positive-definite matrices based on which we derive an explicit formula for the rate-distortion function (RDF). We then study the special cases and applications of this result. We show that two well-studied source coding problems, i.e. remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints, are in fact special cases of our results. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes in order to maximize the output SNR at the fusion center. We thereby bridge between denoising and source coding within this setup.

Conference paper

Xue W, Brookes M, Naylor PA, 2016,

Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization

, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Journal article

Eaton DJ, Gaubitch ND, Moore AH, Naylor PA et al., 2016,

Estimation of room acoustic parameters: the ACE challenge

, IEEE Transactions on Audio Speech and Language Processing, Vol: 24, Pages: 1681-1693, ISSN: 2329-9290

Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge, and the corpus used in the challenge is presented together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, the comparative results for which are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is a less mature field where machine learning approaches are currently more successful.
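As a rough illustration of the non-blind case mentioned in the abstract above, where the AIR is available, the sketch below computes DRR and T60 directly from a sampled impulse response. It is not the ACE Challenge reference procedure: the function names, the 8 ms direct-path window and the -5 dB to -25 dB fitting range on the Schroeder decay curve are all assumptions chosen for the example.

```python
import numpy as np

def drr_db(air, fs, direct_window_ms=8.0):
    """DRR in dB: energy near the direct-path peak over the remaining energy.

    The window half-width around the peak (8 ms here) is an assumed value.
    """
    air = np.asarray(air, dtype=float)
    peak = int(np.argmax(np.abs(air)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, stop = max(peak - half, 0), min(peak + half + 1, len(air))
    direct = np.sum(air[start:stop] ** 2)
    reverberant = np.sum(air ** 2) - direct
    return 10.0 * np.log10(direct / reverberant)

def t60_seconds(air, fs, fit_range_db=(-5.0, -25.0)):
    """T60 from the Schroeder energy decay curve (backward integration).

    A straight line is fitted to the decay curve between the two levels in
    fit_range_db (assumed values) and extrapolated to a 60 dB decay.
    """
    energy = np.asarray(air, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # energy remaining after each sample
    edc_db = 10.0 * np.log10(np.maximum(edc / edc[0], 1e-30))
    upper, lower = fit_range_db
    idx = np.where((edc_db <= upper) & (edc_db >= lower))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # decay rate in dB per second
    return -60.0 / slope

# Hypothetical usage on a synthetic AIR: exponentially decaying noise whose
# amplitude envelope falls by 60 dB in 0.5 s.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
synthetic_air = rng.standard_normal(fs) * np.exp(-3.0 * np.log(10.0) * t / 0.5)
estimated_t60 = t60_seconds(synthetic_air, fs)
```

The blind estimation problem addressed by the challenge is harder precisely because neither function can be applied when only the reverberant speech is observed.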

Patent

Eaton DJ, Moore AH, Naylor PA, Skoglund J et al., 2016,

Reverberation estimator

, US20160118038 A1

Provided are methods and systems for generating Direct-to-Reverberant Ratio (DRR) estimates. The methods and systems use a null-steered beamformer to produce accurate DRR estimates across a variety of room sizes, reverberation times, and source-receiver distances. The DRR estimation algorithm uses spatial selectivity to separate direct and reverberant energy and account for noise separately. The formulation considers the response of the beamformer to reverberant sound and the effect of noise. The DRR estimation algorithm is more robust to background noise than existing approaches, and is applicable where a signal is recorded with two or more microphones, such as with mobile communications devices, laptop computers, and the like.
