Imperial College London

Dr Patrick A. Naylor

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Speech & Acoustic Signal Processing



+44 (0)20 7594 6235, p.naylor




803 Electrical Engineering, South Kensington Campus






331 results found

Dorfan Y, Evers C, Gannot S, Naylor P et al., 2016, Speaker Localization with Moving Microphone Arrays, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Speaker localization algorithms often assume a static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are then linear time-invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first approach is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second approach is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared for simulated reverberant audio data from a microphone array with two sensors.
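The second approach above, sequential Bayesian estimation with a particle filter, can be illustrated with a minimal bootstrap particle filter. This is a sketch under assumptions that are not taken from the paper: a 2D geometry, a single static source, array positions known at every step, and noisy direction-of-arrival (bearing) measurements in place of the two-sensor signal model used in the paper. The function names and the parameters `sigma` (assumed bearing noise) and `jitter` (roughening noise) are hypothetical.

```python
import math
import random


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def bearing(sensor, point):
    """Bearing in radians from a 2D sensor position to a 2D point."""
    return math.atan2(point[1] - sensor[1], point[0] - sensor[0])


def particle_filter_localize(sensor_path, bearings, n_particles=2000,
                             box=(-5.0, 5.0, 0.0, 10.0), sigma=0.1,
                             jitter=0.05, seed=0):
    """Bootstrap particle filter for a static 2D source observed through
    noisy bearings from a moving array whose positions are known."""
    rng = random.Random(seed)
    x_min, x_max, y_min, y_max = box
    # Uniform prior over the search box.
    particles = [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max))
                 for _ in range(n_particles)]
    for sensor, z in zip(sensor_path, bearings):
        # Weight each particle by a Gaussian likelihood of its bearing error.
        weights = [math.exp(-0.5 * (wrap_angle(z - bearing(sensor, p)) / sigma) ** 2)
                   for p in particles]
        if sum(weights) == 0.0:
            weights = [1.0] * n_particles  # fall back to uniform weights
        # Multinomial resampling (random.choices does not need normalised weights).
        particles = rng.choices(particles, weights=weights, k=n_particles)
        # Small jitter: the source is static, so without it the particle
        # set would collapse onto a few duplicated points.
        particles = [(px + rng.gauss(0.0, jitter), py + rng.gauss(0.0, jitter))
                     for px, py in particles]
    # After resampling all particles are equally weighted, so the plain mean
    # is the posterior-mean estimate.
    n = len(particles)
    return (sum(p[0] for p in particles) / n,
            sum(p[1] for p in particles) / n)


# Example: the array moves along the x-axis past a source at (1, 6).
rng = random.Random(1)
source = (1.0, 6.0)
path = [(-5.0 + 0.5 * k, 0.0) for k in range(21)]
observed = [bearing(s, source) + rng.gauss(0.0, 0.05) for s in path]
estimate = particle_filter_localize(path, observed)
print(estimate)
```

Because the array moves, bearings taken from different positions intersect, which is what lets the filter resolve range as well as direction.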

Conference paper

Xue W, Brookes DM, Naylor PA, 2016, Under-modelled blind system identification for time delay estimation in reverberant environments, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.
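For reference, the conventional cross-correlation TDE that the proposed criterion generalizes picks the lag at which the cross-correlation of two microphone signals peaks. The sketch below is that baseline only, with a hypothetical function name, integer-sample resolution and no prewhitening (such as GCC-PHAT); it does not model the early RIR.

```python
def cross_correlation_tde(x1, x2, max_lag):
    """Return the integer lag k (|k| <= max_lag) maximising
    r[k] = sum_n x1[n] * x2[n + k]. A positive k means x2 lags x1."""
    best_lag, best_val = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        r = sum(x1[n] * x2[n + k] for n in range(len(x1)) if 0 <= n + k < len(x2))
        if r > best_val:
            best_lag, best_val = k, r
    return best_lag


# Example: x2 is x1 delayed by 7 samples.
x1 = [0.0] * 50
x1[10], x1[11], x1[12] = 1.0, 0.5, -0.3
x2 = [0.0] * 7 + x1[:-7]
print(cross_correlation_tde(x1, x2, max_lag=20))  # 7
```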

Conference paper

Moore AH, Evers C, Naylor PA, 2016, Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290

Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.
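The basic PIV idea can be sketched with first-order signals: the product of the omnidirectional (pressure-like) channel with each dipole (velocity-like) channel gives an intensity-like vector that points toward a single active source. This sketch assumes real, time-domain B-format-style channels `w, x, y, z`, with dipoles oriented so that a positive product indicates the source direction; the paper instead forms PIVs per time-frequency bin from spherical harmonic coefficients, and the SSPIV variant is not shown. The function name is hypothetical.

```python
import math


def piv_doa(w, x, y, z):
    """DOA (azimuth, elevation) in radians from the accumulated
    pseudo-intensity vector of first-order signals (w omni; x, y, z dipoles)."""
    # Each component is the summed product of the omni (pressure-like)
    # channel with one dipole (velocity-like) channel; the overall scale
    # does not affect the direction.
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation


# Example: encode a single plane wave from azimuth 0.7 rad, elevation 0.2 rad.
az, el = 0.7, 0.2
s = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(256)]
w = s
x = [v * math.cos(az) * math.cos(el) for v in s]
y = [v * math.sin(az) * math.cos(el) for v in s]
z = [v * math.sin(el) for v in s]
print(piv_doa(w, x, y, z))  # approximately (0.7, 0.2)
```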

Journal article

Moore AH, Naylor P, 2016, Linear prediction based dereverberation for spherical microphone arrays, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Dereverberation is an important preprocessing step in many speech systems, both for human and machine listening. In many situations, including robot audition, the sound sources of interest can be incident from any direction. In such circumstances, a spherical microphone array allows direction of arrival estimation which is free of spatial aliasing, and direction-independent beam patterns can be formed. This contribution formulates the Weighted Prediction Error algorithm in the spherical harmonic domain and compares the performance to a space domain implementation. Simulation results demonstrate that performing dereverberation in the spherical harmonic domain allows many more microphones to be used without increasing the computational cost. The benefit of using many microphones is particularly apparent at low signal-to-noise ratios, where, for the conditions tested, up to 71% improvement in speech-to-reverberation modulation ratio was achieved.

Conference paper

Naylor PA, Zahedi A, Jensen S, Bech S et al., 2016, Source Coding in Networks with Covariance Distortion Constraints, IEEE Transactions on Signal Processing, Vol: 64, Pages: 5943-5958, ISSN: 1053-587X

We consider a source coding problem with a network scenario in mind, and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortions. We define a notion of minimum for two positive-definite matrices, based on which we derive an explicit formula for the rate-distortion function (RDF). We then study the special cases and applications of this result. We show that two well-studied source coding problems, i.e. remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints, are in fact special cases of our results. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes in order to maximize the output SNR at the fusion center. We thereby bridge denoising and source coding within this setup.

Journal article

Xue W, Brookes M, Naylor PA, 2016, Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Conference paper

Jarrett DP, Habets EAP, Naylor PA, 2016, Theory and Applications of Spherical Microphone Array Processing, Publisher: Springer, ISBN: 9783319422114

This book presents the signal processing algorithms that have been developed to process the signals acquired by a spherical microphone array.


Eaton DJ, Gaubitch ND, Moore AH, Naylor PA et al., 2016, Estimation of room acoustic parameters: the ACE challenge, IEEE Transactions on Audio Speech and Language Processing, Vol: 24, Pages: 1681-1693, ISSN: 2329-9290

Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge and the corpus used in the challenge is presented, together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, and their comparative results are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate, whilst DRR estimation is a less mature field where machine learning approaches are currently more successful.
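As the abstract notes, T60 and DRR can be computed directly when the AIR is available. A minimal sketch of one common (not challenge-specific) way to do so: Schroeder backward integration with a line fit to the energy decay curve for T60, and an energy ratio around the direct-path peak for DRR. The -5 to -25 dB fit range, the ±2.5 ms direct-path window, and treating all samples after that window as reverberant are assumptions of this sketch, as are the function names.

```python
import math


def schroeder_t60(h, fs, start_db=-5.0, end_db=-25.0):
    """T60 from an impulse response h sampled at fs Hz: Schroeder backward
    integration, a least-squares line fit to the energy decay curve (EDC)
    between start_db and end_db, then extrapolation to -60 dB."""
    edc, acc = [], 0.0
    for v in reversed(h):
        acc += v * v
        edc.append(acc)
    edc.reverse()  # edc[n] = energy remaining from sample n onwards
    pts = [(n / fs, 10.0 * math.log10(e / edc[0]))
           for n, e in enumerate(edc) if e > 0.0]
    pts = [(t, d) for t, d in pts if end_db <= d <= start_db]
    t_mean = sum(t for t, _ in pts) / len(pts)
    d_mean = sum(d for _, d in pts) / len(pts)
    slope = (sum((t - t_mean) * (d - d_mean) for t, d in pts)
             / sum((t - t_mean) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def drr_db(h, fs, window_ms=2.5):
    """DRR in dB: energy within +/- window_ms of the strongest peak versus
    the energy of all later samples."""
    peak = max(range(len(h)), key=lambda n: abs(h[n]))
    half = int(round(window_ms * 1e-3 * fs))
    direct = sum(v * v for v in h[max(0, peak - half):peak + half + 1])
    reverb = sum(v * v for v in h[peak + half + 1:])
    return 10.0 * math.log10(direct / reverb)


# Example: a synthetic exponential decay with T60 = 0.5 s at fs = 1 kHz.
fs, t60 = 1000, 0.5
h = [10.0 ** (-3.0 * n / (fs * t60)) for n in range(2000)]
print(round(schroeder_t60(h, fs), 3))  # 0.5
```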

Journal article

Eaton DJ, Moore AH, Naylor PA, Skoglund J et al., 2016, Reverberation estimator, US20160118038 A1

Provided are methods and systems for generating Direct-to-Reverberant Ratio (DRR) estimates. The methods and systems use a null-steered beamformer to produce accurate DRR estimates across a variety of room sizes, reverberation times, and source-receiver distances. The DRR estimation algorithm uses spatial selectivity to separate direct and reverberant energy and account for noise separately. The formulation considers the response of the beamformer to reverberant sound and the effect of noise. The DRR estimation algorithm is more robust to background noise than existing approaches, and is applicable where a signal is recorded with two or more microphones, such as with mobile communications devices, laptop computers, and the like.


Sharma D, Naylor PA, Wang Y, Brookes DM et al., 2016, A Data-Driven Non-intrusive Measure of Speech Quality and Intelligibility, Speech Communication, Vol: 80, Pages: 84-94, ISSN: 0167-6393

Speech signals are often affected by additive noise and distortion which can degrade the perceived quality and intelligibility of the signal. We present a new measure, NISA, for estimating the quality and intelligibility of speech degraded by additive noise and distortions associated with telecommunications networks, based on a data-driven framework of feature extraction and tree-based regression. The new measure is non-intrusive, operating on the degraded signal alone without the need for a reference signal. This makes the measure applicable to practical speech processing applications operating in the single-ended mode. The new measure has been evaluated against the intrusive measures PESQ and STOI. The results indicate that the accuracy of the new non-intrusive method is around 90% of the accuracy of the intrusive measures, depending on the test scenario. The NISA measure therefore provides non-intrusive (single-ended) PESQ and STOI estimates with high accuracy.

Journal article

Javed HA, Moore AH, Naylor PA, 2016, Spherical microphone array acoustic rake receivers, ICASSP, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 111-115, ISSN: 0736-7791

Several signal-independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipath propagation by separately capturing early reflections and combining them with the direct path. We investigate several approaches to combining reflections with the direct-path source signal, including the development of beam patterns that place nulls on all preceding reflections. The proposed designs are tested in simulation experiments, and their dereverberation performance is evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation than conventional signal-independent beamforming systems, achieving up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer.

Conference paper

Evers C, Moore AH, Naylor PA, 2016, Acoustic simultaneous localization and mapping (A-SLAM) of a moving microphone array and its surrounding speakers, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6-10, ISSN: 1520-6149

Acoustic scene mapping creates a representation of positions of audio sources such as talkers within the surrounding environment of a microphone array. By allowing the array to move, the acoustic scene can be explored in order to improve the map. Furthermore, the spatial diversity of the kinematic array allows for estimation of the source-sensor distance in scenarios where source directions of arrival are measured. As sound source localization is performed relative to the array position, mapping of acoustic sources requires knowledge of the absolute position of the microphone array in the room. If the array is moving, its absolute position is unknown in practice. Hence, Simultaneous Localization and Mapping (SLAM) is required in order to localize the microphone array position and map the surrounding sound sources. In realistic environments, microphone arrays receive a convolutive mixture of direct-path speech signals, noise and reflections due to reverberation. A key challenge of Acoustic SLAM (a-SLAM) is robustness against reverberant clutter measurements and missing source detections. This paper proposes a novel bearing-only a-SLAM approach using a Single-Cluster Probability Hypothesis Density filter. Results demonstrate convergence to accurate estimates of the array trajectory and source positions.

Conference paper

Neeld T, Eaton J, Naylor PA, Shipworth D et al., 2016, A novel method of determining events in combination gas boilers: Assessing the feasibility of a passive acoustic sensor, Building and Environment, Vol: 100, Pages: 1-9, ISSN: 0360-1323

To assess the impact of interventions designed to reduce residential space heating demand, investigators must be armed with techniques applicable to field trials that accurately measure space heating energy use. This study assesses the feasibility of using a passive acoustic sensor to detect gas consumption events in domestic combination gas-fired boilers (C-GFBs). The investigation has shown that, for the C-GFB investigated, the following events are discernible using a passive acoustic sensor: demand type (hot water or central heating); boiler ignition time; and pre-mix fan motor speed. A detection algorithm was developed to automatically identify demand type and burner ignition time with accuracies of 100% and 97% respectively. Demand type was determined by training a naive Bayes classifier on 20 features of the acoustic profile at the start of a demand event. Burner ignition was determined by detecting low frequency (5–10 Hz) pressure pulsations produced during ignition. The acoustic signatures of the pre-mix fan and circulation pump were identified manually. Additional work is required to detect burner duration, deal with detection in the presence of increased noise and expand the range of boilers investigated. The widespread use of such techniques has considerable implications for improving understanding of space heating demand.
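The abstract states that demand type was classified with a naive Bayes classifier on 20 acoustic features, but does not list the features or the likelihood model. The sketch below assumes Gaussian class-conditional likelihoods and uses two made-up features and hypothetical class labels purely to show the mechanics.

```python
import math


def train_gaussian_nb(samples, labels):
    """Fit class priors and per-feature Gaussian means/variances."""
    model = {}
    for c in sorted(set(labels)):
        rows = [s for s, lab in zip(samples, labels) if lab == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # A small floor keeps a constant feature from giving zero variance.
        variances = [max(sum((v - m) ** 2 for v in col) / len(rows), 1e-9)
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(samples), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the largest log posterior for feature vector x,
    treating the features as conditionally independent given the class."""
    def log_posterior(c):
        prior, means, variances = model[c]
        lp = math.log(prior)
        for v, m, var in zip(x, means, variances):
            lp -= 0.5 * math.log(2.0 * math.pi * var) + (v - m) ** 2 / (2.0 * var)
        return lp
    return max(model, key=log_posterior)


# Example with two hypothetical features per demand event.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.1, 0.9),
     (5.0, 5.0), (5.2, 4.8), (4.8, 5.2), (4.9, 5.1)]
y = ["hot_water"] * 4 + ["central_heating"] * 4
model = train_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (1.05, 0.95)))  # hot_water
```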

Journal article

Doire CSJ, Brookes DM, Naylor PA, De Sena E, van Waterschoot T, Jensen SHJ et al., Acoustic Environment Control: Implementation of a Reverberation Enhancement System, AES 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech)

Reverberation enhancement systems allow the active control of the acoustic environment. They are subject to instability issues due to acoustic feedback, and are often installed permanently in large halls, sometimes at great cost. In this paper, we explore the possibility of implementing a cost-effective reverberation enhancement system to control the acoustics of typical rooms using a combination of spatial filtering, automatic calibration, adaptive notch filters, howling detection and manual adjustments. The effectiveness of the system is then tested inside a small soundproof booth.
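Howling detection is named but not specified here. One widely used criterion is the peak-to-average power ratio (PAPR) of a short-time spectrum, since a feedback oscillation concentrates energy in a single frequency bin. This is a minimal sketch under assumptions: a naive DFT instead of an FFT, a single frame, and a hypothetical 10 dB threshold; it is not the detector used in the paper, and practical detectors usually combine several features.

```python
import cmath
import math


def spectral_papr_db(frame):
    """Peak-to-average power ratio (dB) over the non-negative-frequency
    bins of a naive DFT of a real-valued frame."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        X = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(X) ** 2)
    return 10.0 * math.log10(max(power) / (sum(power) / len(power)))


def is_howling(frame, threshold_db=10.0):
    """Flag a frame whose spectrum is dominated by a single peak."""
    return spectral_papr_db(frame) > threshold_db


# A pure tone (as in feedback howling) versus a flat-spectrum impulse.
tone = [math.sin(2.0 * math.pi * 8 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
print(is_howling(tone), is_howling(impulse))  # True False
```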

Conference paper

Parada PP, Sharma D, Lainez J, Barreda D, van Waterschoot T, Naylor PA et al., 2016, A single-channel non-intrusive C50 estimator correlated with speech recognition performance, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 24, Pages: 719-732, ISSN: 2329-9304

Journal article

Evers C, Moore A, Naylor P, Towards Informative Path Planning for Acoustic SLAM, DAGA 2016

Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot’s positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.

Conference paper

Cauchi B, Santos JF, Siedenburg K, Falk TH, Naylor PA, Doclo S, Goetze S et al., 2016, Predicting the quality of processed speech by combining modulation-based features and model trees, Pages: 180-184

© 2016 VDE VERLAG GMBH. Many signal processing methods have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. The evaluation of these methods either requires the use of perceptual measures, i.e. listening tests, or instrumental measures. Perceptual measures are typically more reliable but are quite costly and time-consuming. On the other hand, instrumental measures may correlate poorly with the perceived speech quality. In this paper we propose to train an instrumental measure, combining modulation-based features and model trees, on the basis of perceptual scores obtained on a small corpus of speech data that has been processed by a combination of beamforming and spectral postfiltering. For evaluation purposes the resulting measure is then applied to a larger corpus. Results show that the use of model trees to train the predicting function of an instrumental measure increases its correlation with perceptual scores.

Conference paper

Cauchi B, Javed H, Gerkmann T, Doclo S, Goetze S, Naylor P et al., 2016, Perceptual and Instrumental Evaluation of the Perceived Level of Reverberation, IEEE International Conference on Acoustics, Speech, and Signal Processing, Publisher: IEEE, Pages: 629-633, ISSN: 1520-6149

Conference paper

Javed HA, Moore AH, Naylor PA, 2016, Spherical Harmonic Rake Receivers for Dereverberation, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Conference paper

Hafezi S, Moore AH, Naylor PA, 2016, 3D Acoustic Source Localization in the Spherical Harmonic Domain Based on Optimized Grid Search, IEEE International Conference on Acoustics, Speech, and Signal Processing, Publisher: IEEE, Pages: 415-419, ISSN: 1520-6149

Conference paper

Zhang W, Naylor PA, He Z, Zhang Y et al., 2016, On the Evaluation of Multichannel Blind System Identification from the Viewpoint of System Equalization, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Conference paper



Zahedi A, Ostergaard J, Jensen SH, Naylor P, Bech S et al., 2016, On Perceptual Audio Compression with Side Information at the Decoder, Data Compression Conference (DCC), Publisher: IEEE, Pages: 456-465, ISSN: 1068-0314

Conference paper

Eaton J, Gaubitch ND, Moore AH, Naylor PA et al., 2015, Proceedings of the ACE Challenge Workshop, a satellite event of IEEE-WASPAA

Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-to-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room. Recently, several important methods in speech enhancement and speech recognition have been developed that show an increase in performance compared to their predecessors but require knowledge of one or more fundamental acoustical parameters such as the T60. Traditionally, these parameters have been estimated using carefully measured Acoustic Impulse Responses (AIRs). However, in most applications it is not practical or even possible to measure the acoustic impulse response. Consequently, there is increasing research activity in the estimation of such parameters directly from speech and audio signals. The aim of this challenge was to evaluate state-of-the-art algorithms for blind acoustic parameter estimation from speech and to promote the emerging area of research in this field. Participants evaluated their algorithms for T60 and DRR estimation against the ’ground truth’ values provided with the data sets and presented the results in a paper describing the method used.


Eaton J, Gaubitch ND, Moore AH, Naylor PA et al., 2015, The ACE Challenge - corpus description and performance evaluation, WASPAA, Publisher: IEEE

Knowledge of the Direct-to-Reverberant Ratio (DRR) and Reverberation Time (T60) can be used to better perform speech and audio processing such as dereverberation. Established methods compute these parameters from measured Acoustic Impulse Responses (AIRs). However, in many practical situations the AIR is not available and the parameters must be estimated non-intrusively directly from noisy speech or audio signals. The Acoustic Characterization of Environments (ACE) Challenge is a competition to identify the most promising non-intrusive DRR and T60 estimation methods using real noisy reverberant speech. We describe the ACE corpus comprising multi-channel AIRs, and multi-channel noise including ambient, fan and babble noise recorded in the same environment as the measured AIRs, along with the corresponding DRR and T60 measurements. The evaluation methodology is discussed and comparative results are shown.

Conference paper

Eaton J, Naylor PA, 2015, Direct-to-Reverberant ratio estimation on the ACE corpus using a Two-channel beamformer, arXiv, ACE Challenge Workshop, a satellite event of IEEE-WASPAA, Publisher: arXiv

Direct-to-Reverberant Ratio (DRR) is an important measure for characterizing the properties of a room. The recently proposed DRR Estimation using a Null-Steered Beamformer (DENBE) algorithm was originally tested on simulated data where noise was artificially added to the speech after convolution with impulse responses simulated using the image-source method. This paper evaluates the performance of this algorithm on speech convolved with measured impulse responses and noise using the Acoustic Characterization of Environments (ACE) Evaluation corpus. The fullband DRR estimation performance of the DENBE algorithm exceeds that of the baselines at all Signal-to-Noise Ratios (SNRs) and for all noise types. In addition, estimation of the DRR in one-third-octave ISO frequency bands is demonstrated.

Conference paper

Eaton J, Naylor PA, 2015, Reverberation time estimation on the ACE corpus using the SDD method, arXiv, ACE Challenge Workshop, a satellite event of IEEE-WASPAA, Publisher: arXiv

Reverberation Time (T60) is an important measure for characterizing the properties of a room. The author’s T60 estimation algorithm was previously tested on simulated data where the noise was artificially added to the speech after convolution with impulse responses simulated using the image method. We test the algorithm on speech convolved with real recorded impulse responses and noise from the same rooms from the Acoustic Characterization of Environments (ACE) corpus, and achieve results comparable to those obtained using simulated data.

Conference paper

Moore AH, Evers C, Naylor PA, Alon DL, Rafaely B et al., 2015, Direction of arrival estimation using pseudo-intensity vectors with direct-path dominance test, European Signal Processing Conference, Publisher: IEEE, Pages: 2296-2300, ISSN: 2219-5491

The accuracy of direction of arrival estimation tends to degrade under reverberant conditions due to the presence of reflected signal components which are correlated with the direct path. The recently proposed direct-path dominance test provides a means of identifying time-frequency regions in which a single signal path is dominant. By analysing only these regions it was shown that the accuracy of the FS-MUSIC algorithm could be significantly improved. However, for real-time implementation a less computationally demanding localisation algorithm would be preferable. In the present contribution we investigate the direct-path dominance test as a preprocessing step to pseudo-intensity vector-based localisation. A novel formulation of the pseudo-intensity vector is proposed which further exploits the direct path dominance test and leads to improved localisation performance.
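The principle of the direct-path dominance test can be sketched for the simplest two-channel case: a time-frequency region is accepted when its spatial correlation matrix is close to rank one, i.e. when the ratio of its largest to second-largest eigenvalue (equal to the singular values for this Hermitian matrix) exceeds a threshold. The two-channel reduction, the closed-form eigenvalues and the threshold of 10 are illustrative assumptions; the test referred to in the abstract operates on spherical harmonic signals with more channels. The function names are hypothetical.

```python
import cmath
import math


def spatial_covariance_2x2(snapshots):
    """Sample spatial covariance R = mean(a a^H) of two-channel complex
    snapshots (e.g. the STFT coefficients of one time-frequency region)."""
    n = len(snapshots)
    r11 = sum(abs(a[0]) ** 2 for a in snapshots) / n
    r22 = sum(abs(a[1]) ** 2 for a in snapshots) / n
    r12 = sum(a[0] * a[1].conjugate() for a in snapshots) / n
    return r11, r22, r12


def direct_path_dominant(snapshots, threshold=10.0):
    """Accept the region if R is close to rank one, i.e. the ratio of its
    largest to second-largest eigenvalue exceeds the threshold."""
    r11, r22, r12 = spatial_covariance_2x2(snapshots)
    # Closed-form eigenvalues of the 2x2 Hermitian matrix [[r11, r12], [r12*, r22]].
    mid = 0.5 * (r11 + r22)
    rad = math.sqrt((0.5 * (r11 - r22)) ** 2 + abs(r12) ** 2)
    lam1, lam2 = mid + rad, mid - rad
    if lam2 <= 1e-12 * lam1:
        return True  # numerically rank one
    return lam1 / lam2 > threshold


# One path: every snapshot is a scaled copy of the same steering vector.
v = cmath.exp(0.3j)
single = [(s, s * v) for s in (1.0 + 0j, 2j, -1.5 + 0j)]
# Two uncorrelated paths with orthogonal steering vectors.
double = [(1.0 + 0j, 1.0 + 0j), (1.0 + 0j, -1.0 + 0j)]
print(direct_path_dominant(single), direct_path_dominant(double))  # True False
```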

Conference paper

Doire C, Brookes D, Naylor P, Jensen SH et al., 2015, Data-Driven Statistical Modelling of Room Impulse Responses in the Power Domain, European Signal Processing Conference (EUSIPCO), Publisher: IEEE

Having an accurate statistical model of room impulse responses with a minimum number of parameters is of crucial importance in applications such as dereverberation. In this paper, by taking into account the behaviour of the early reflections, we extend the widely-used statistical model proposed by Polack. The squared room impulse response is modelled in each frequency band as the realisation of a stochastic process weighted by the sum of two exponential decays. Room-independent values for the new parameters involved are obtained through analysis of several room impulse response databases, and validation of the model in the likelihood sense is performed.
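A minimal sketch of the model structure described above: in one frequency band, the squared RIR is squared white Gaussian noise weighted by a sum of two exponential decays, whereas Polack's model uses a single decay (set `a2 = 0`). Parameterising each decay by its own weight and T60 is an assumption made for readability; the paper's parameterisation and its room-independent parameter values are not reproduced, and the function names are hypothetical.

```python
import math
import random


def two_decay_envelope(t, a1, t60_1, a2, t60_2):
    """Expected power envelope at time t (s): a weighted sum of two
    exponential power decays, each falling 60 dB over its own T60."""
    d1 = 3.0 * math.log(10.0) / t60_1  # amplitude decay rate (1/s)
    d2 = 3.0 * math.log(10.0) / t60_2
    return a1 * math.exp(-2.0 * d1 * t) + a2 * math.exp(-2.0 * d2 * t)


def synth_squared_rir(fs, duration, a1, t60_1, a2, t60_2, seed=0):
    """One realisation of a squared RIR in a single band: squared
    unit-variance Gaussian noise weighted by the two-decay envelope."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) ** 2
            * two_decay_envelope(n / fs, a1, t60_1, a2, t60_2)
            for n in range(int(duration * fs))]


# A fast early decay (T60 = 0.1 s) on top of a slower late decay (T60 = 0.6 s).
p = synth_squared_rir(fs=16000, duration=0.5, a1=1.0, t60_1=0.1, a2=0.1, t60_2=0.6)
print(len(p))  # 8000
```

Because the noise has unit variance, the expected value of each squared sample equals the envelope, which is what makes the envelope parameters estimable from measured responses.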

Conference paper

Hu M, Doclo S, Sharma D, Brookes D, Naylor P et al., 2015, Noise Robust Blind System Identification Algorithms Based On A Rayleigh Quotient Cost Function, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2476-2480

An important prerequisite for acoustic multi-channel equalization for speech dereverberation is the identification of the acoustic channels between the source and the microphones. Blind System Identification (BSI) algorithms based on cross-relation error minimization are known to misconverge in the presence of noise. Although algorithms have been proposed in the literature to improve robustness to noise, the estimated room impulse responses are usually constrained to have a flat magnitude spectrum. In this paper, noise-robust algorithms based on a Rayleigh quotient cost function are proposed. Unlike the traditional algorithms, the estimated impulse responses are not always forced to have unit norm. Experimental results using simulated room impulse responses and several SNRs show that one of the proposed algorithms outperforms competing algorithms in terms of normalized projection misalignment.
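The cross-relation principle behind these BSI algorithms can be made concrete: with noise-free observations x1 = s * h1 and x2 = s * h2, where * denotes convolution, the true channels satisfy x1 * h2 = x2 * h1. Dividing the squared cross-relation error by the squared norm of the stacked estimate gives a Rayleigh quotient, so the cost does not depend on the scale of the estimate. This is the generic noise-free cost, not the noise-robust algorithms proposed in the paper, and the helper names are hypothetical.

```python
def convolve(a, b):
    """Full linear convolution of two real sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cross_relation_rayleigh(x1, x2, h1, h2):
    """Cross-relation cost ||x1*h2 - x2*h1||^2 / (||h1||^2 + ||h2||^2).
    The numerator is a quadratic form in the stacked estimate [h1; h2], so
    dividing by its squared norm gives a Rayleigh quotient that does not
    depend on the scale of the estimate."""
    e = [p - q for p, q in zip(convolve(x1, h2), convolve(x2, h1))]
    return sum(v * v for v in e) / (sum(v * v for v in h1) + sum(v * v for v in h2))


# Noise-free two-channel observations of a short source signal.
s = [1.0, -2.0, 0.5, 3.0]
h1, h2 = [1.0, 0.5, 0.25], [0.3, -0.7, 0.2]
x1, x2 = convolve(s, h1), convolve(s, h2)
print(cross_relation_rayleigh(x1, x2, h1, h2))  # close to 0 for the true channels
print(cross_relation_rayleigh(x1, x2, h2, h1))  # clearly positive for swapped channels
```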

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
