Imperial College London

Dr Patrick A. Naylor

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Speech & Acoustic Signal Processing



+44 (0)20 7594 6235, p.naylor




803, Electrical Engineering, South Kensington Campus







Naylor PA, Zahedi A, Jensen S, Bech S et al., 2016, Source Coding in Networks with Covariance Distortion Constraints, IEEE Transactions on Signal Processing, Vol: 64, Pages: 5943-5958, ISSN: 1053-587X

We consider a source coding problem with a network scenario in mind, and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortions. We define a notion of minimum for two positive-definite matrices based on which we derive an explicit formula for the rate-distortion function (RDF). We then study the special cases and applications of this result. We show that two well-studied source coding problems, i.e. remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints are in fact special cases of our results. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes in order to maximize the output SNR at the fusion center. We thereby bridge between denoising and source coding within this setup.

Journal article

Xue W, Brookes M, Naylor PA, 2016, Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Conference paper

Jarrett DP, Habets EAP, Naylor PA, 2016, Theory and Applications of Spherical Microphone Array Processing, Publisher: Springer, ISBN: 9783319422114

This book presents the signal processing algorithms that have been developed to process the signals acquired by a spherical microphone array.


Moore AH, Naylor P, Linear prediction based dereverberation for spherical microphone arrays, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Dereverberation is an important preprocessing step in many speech systems, both for human and machine listening. In many situations, including robot audition, the sound sources of interest can be incident from any direction. In such circumstances, a spherical microphone array allows direction of arrival estimation which is free of spatial aliasing and direction-independent beam patterns can be formed. This contribution formulates the Weighted Prediction Error algorithm in the spherical harmonic domain and compares the performance to a space domain implementation. Simulation results demonstrate that performing dereverberation in the spherical harmonic domain allows many more microphones to be used without increasing the computational cost. The benefit of using many microphones is particularly apparent at low signal to noise ratios, where for the conditions tested up to 71% improvement in speech-to-reverberation modulation ratio was achieved.

Conference paper

Eaton DJ, Gaubitch ND, Moore AH, Naylor PA et al., 2016, Estimation of room acoustic parameters: the ACE challenge, IEEE Transactions on Audio Speech and Language Processing, Vol: 24, Pages: 1681-1693, ISSN: 2329-9290

Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge, and the corpus used in the challenge is presented together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, the comparative results for which are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is a less mature field where machine learning approaches are currently more successful.
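
As an illustration of the non-blind case mentioned in the abstract above, the following is a minimal Python sketch of how DRR and T60 might be computed from a known AIR. It is not taken from the paper. The function names, the 2.5 ms direct-path window and the -5 dB to -25 dB fitting range for the Schroeder decay curve are assumptions chosen for the example.

```python
import numpy as np


def drr_db(h, fs, window_ms=2.5):
    """DRR in dB from an impulse response h sampled at fs Hz.

    Assumption: the direct path is the largest-magnitude sample, and
    everything within +/- window_ms of it counts as direct energy.
    """
    h = np.asarray(h, dtype=float)
    peak = int(np.argmax(np.abs(h)))
    half = int(round(window_ms * 1e-3 * fs))
    start = max(peak - half, 0)
    stop = peak + half + 1
    direct = np.sum(h[start:stop] ** 2)
    reverberant = np.sum(h[stop:] ** 2)
    return 10.0 * np.log10(direct / reverberant)


def t60_schroeder(h, fs, fit_db=(-5.0, -25.0)):
    """T60 in seconds from the slope of the Schroeder energy decay curve.

    Assumption: a straight line is fitted between fit_db[0] and fit_db[1]
    (in dB) and extrapolated to a 60 dB decay.
    """
    h = np.asarray(h, dtype=float)
    # Backward integration of the squared response (energy decay curve).
    edc = np.cumsum(h[::-1] ** 2)[::-1]
    edc = np.maximum(edc, np.finfo(float).tiny)  # avoid log10(0)
    edc_db = 10.0 * np.log10(edc / edc[0])
    mask = (edc_db <= fit_db[0]) & (edc_db >= fit_db[1])
    t = np.arange(len(h)) / fs
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope
```

Blind estimators, which are the subject of the challenge, have no access to `h` and must work from the reverberant speech alone; the sketch only shows the reference quantities they try to predict.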

Journal article

Eaton DJ, Moore AH, Naylor PA, Skoglund J et al., 2016, Reverberation estimator, US20160118038 A1

Provided are methods and systems for generating Direct-to-Reverberant Ratio (DRR) estimates. The methods and systems use a null-steered beamformer to produce accurate DRR estimates across a variety of room sizes, reverberation times, and source-receiver distances. The DRR estimation algorithm uses spatial selectivity to separate direct and reverberant energy and account for noise separately. The formulation considers the response of the beamformer to reverberant sound and the effect of noise. The DRR estimation algorithm is more robust to background noise than existing approaches, and is applicable where a signal is recorded with two or more microphones, such as with mobile communications devices, laptop computers, and the like.


Sharma D, Naylor PA, Wang Y, Brookes DM et al., 2016, A Data-Driven Non-intrusive Measure of Speech Quality and Intelligibility, Speech Communication, Vol: 80, Pages: 84-94, ISSN: 0167-6393

Speech signals are often affected by additive noise and distortion which can degrade the perceived quality and intelligibility of the signal. We present a new measure, NISA, for estimating the quality and intelligibility of speech degraded by additive noise and distortions associated with telecommunications networks, based on a data driven framework of feature extraction and tree based regression. The new measure is non-intrusive, operating on the degraded signal alone without the need for a reference signal. This makes the measure applicable to practical speech processing applications operating in the single-ended mode. The new measure has been evaluated against the intrusive measures PESQ and STOI. The results indicate that the accuracy of the new non-intrusive method is around 90% of the accuracy of the intrusive measures, depending on the test scenario. The NISA measure therefore provides non-intrusive (single-ended) PESQ and STOI estimates with high accuracy.

Journal article

Javed HA, Moore AH, Naylor PA, 2016, Spherical microphone array acoustic rake receivers, ICASSP, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 111-115, ISSN: 0736-7791

Several signal independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipaths, by separately capturing and combining early reflections with the direct path. We investigate several approaches in combining reflections with the direct path source signal, including the development of beam patterns that point nulls at all preceding reflections. The proposed designs are tested in experimental simulations and their dereverberation performances evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation compared to conventional signal independent beamforming systems; achieving up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer.

Conference paper

Evers C, Moore AH, Naylor PA, 2016, Acoustic simultaneous localization and mapping (A-SLAM) of a moving microphone array and its surrounding speakers, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6-10, ISSN: 1520-6149

Acoustic scene mapping creates a representation of positions of audio sources such as talkers within the surrounding environment of a microphone array. By allowing the array to move, the acoustic scene can be explored in order to improve the map. Furthermore, the spatial diversity of the kinematic array allows for estimation of the source-sensor distance in scenarios where source directions of arrival are measured. As sound source localization is performed relative to the array position, mapping of acoustic sources requires knowledge of the absolute position of the microphone array in the room. If the array is moving, its absolute position is unknown in practice. Hence, Simultaneous Localization and Mapping (SLAM) is required in order to localize the microphone array position and map the surrounding sound sources. In realistic environments, microphone arrays receive a convolutive mixture of direct-path speech signals, noise and reflections due to reverberation. A key challenge of Acoustic SLAM (a-SLAM) is robustness against reverberant clutter measurements and missing source detections. This paper proposes a novel bearing-only a-SLAM approach using a Single-Cluster Probability Hypothesis Density filter. Results demonstrate convergence to accurate estimates of the array trajectory and source positions.

Conference paper

Neeld T, Eaton J, Naylor PA, Shipworth D et al., 2016, A novel method of determining events in combination gas boilers: Assessing the feasibility of a passive acoustic sensor, Building and Environment, Vol: 100, Pages: 1-9, ISSN: 0360-1323

To assess the impact of interventions designed to reduce residential space heating demand, investigators must be armed with field-trial applicable techniques that accurately measure space heating energy use. This study assesses the feasibility of using a passive acoustic sensor to detect gas consumption events in domestic combination gas-fired boilers (C-GFBs). The investigation has shown, for the C-GFB investigated, the following events are discernible using a passive acoustic sensor: demand type (hot water or central heating); boiler ignition time; and pre-mix fan motor speed. A detection algorithm was developed to automatically identify demand type and burner ignition time with accuracies of 100% and 97% respectively. Demand type was determined by training a naive Bayes classifier on 20 features of the acoustic profile at the start of a demand event. Burner ignition was determined by detecting low frequency (5–10 Hz) pressure pulsations produced during ignition. The acoustic signatures of the pre-mix fan and circulation-pump were identified manually. Additional work is required to detect burner duration, deal with detection in the presence of increased noise and expand the range of boilers investigated. There are considerable implications resulting from the widespread use of such techniques on improving understanding of space heating demand.

Journal article

Doire CSJ, Brookes DM, Naylor PA, De Sena E, van Waterschoot T, Jensen SHJ et al., Acoustic Environment Control: Implementation of a Reverberation Enhancement System, AES 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech)

Reverberation enhancement systems allow the active control of the acoustic environment. They are subject to instability issues due to acoustic feedback, and are often installed permanently in large halls, sometimes at great cost. In this paper, we explore the possibility of implementing a cost-effective reverberation enhancement system to control the acoustics of typical rooms using a combination of spatial filtering, automatic calibration, adaptive notch filters, howling detection and manual adjustments. The effectiveness of the system is then tested inside a small soundproof booth.

Conference paper

Parada PP, Sharma D, Lainez J, Barreda D, van Waterschoot T, Naylor PA et al., 2016, A single-channel non-intrusive C50 estimator correlated with speech recognition performance, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 24, Pages: 719-732, ISSN: 2329-9304

Journal article

Evers C, Moore A, Naylor P, Towards Informative Path Planning for Acoustic SLAM, DAGA 2016

Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot’s positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.

Conference paper

Cauchi B, Javed H, Gerkmann T, Doclo S, Goetze S, Naylor P et al., 2016, PERCEPTUAL AND INSTRUMENTAL EVALUATION OF THE PERCEIVED LEVEL OF REVERBERATION, IEEE International Conference on Acoustics, Speech, and Signal Processing, Publisher: IEEE, Pages: 629-633, ISSN: 1520-6149

Conference paper

Zhang W, Naylor PA, He Z, Zhang Y et al., 2016, ON THE EVALUATION OF MULTICHANNEL BLIND SYSTEM IDENTIFICATION FROM THE VIEWPOINT OF SYSTEM EQUALIZATION, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Conference paper

Zahedi A, Ostergaard J, Jensen SH, Naylor P, Bech S et al., 2016, On Perceptual Audio Compression with Side Information at the Decoder, Data Compression Conference (DCC), Publisher: IEEE, Pages: 456-465, ISSN: 1068-0314

Conference paper


Hafezi S, Moore AH, Naylor PA, 2016, 3D ACOUSTIC SOURCE LOCALIZATION IN THE SPHERICAL HARMONIC DOMAIN BASED ON OPTIMIZED GRID SEARCH, IEEE International Conference on Acoustics, Speech, and Signal Processing, Publisher: IEEE, Pages: 415-419, ISSN: 1520-6149

Conference paper

Javed HA, Moore AH, Naylor PA, 2016, SPHERICAL HARMONIC RAKE RECEIVERS FOR DEREVERBERATION, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Conference paper

Eaton J, Gaubitch ND, Moore AH, Naylor PA et al., 2015, Proceedings of the ACE Challenge Workshop, a satellite event of IEEE-WASPAA

Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-To-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room. Recently, several important methods in speech enhancement and speech recognition have been developed that show an increase in performance compared to the predecessors but do require knowledge of one or more fundamental acoustical parameters such as the T60. Traditionally, these parameters have been estimated using carefully measured Acoustic Impulse Responses (AIRs). However, in most applications it is not practical or even possible to measure the acoustic impulse response. Consequently, there is increasing research activity in the estimation of such parameters directly from speech and audio signals. The aim of this challenge was to evaluate state-of-the-art algorithms for blind acoustic parameter estimation from speech and to promote the emerging area of research in this field. Participants evaluated their algorithms for T60 and DRR estimation against the ’ground truth’ values provided with the data-sets and presented the results in a paper describing the method used.


Eaton J, Gaubitch ND, Moore AH, Naylor PA et al., 2015, The ACE Challenge - corpus description and performance evaluation, WASPAA, Publisher: IEEE

Knowledge of the Direct-to-Reverberant Ratio (DRR) and Reverberation Time (T60) can be used to better perform speech and audio processing such as dereverberation. Established methods compute these parameters from measured Acoustic Impulse Responses (AIRs). However, in many practical situations the AIR is not available and the parameters must be estimated non-intrusively directly from noisy speech or audio signals. The Acoustic Characterization of Environments (ACE) Challenge is a competition to identify the most promising non-intrusive DRR and T60 estimation methods using real noisy reverberant speech. We describe the ACE corpus comprising multi-channel AIRs, and multi-channel noise including ambient, fan and babble noise recorded in the same environment as the measured AIRs, along with the corresponding DRR and T60 measurements. The evaluation methodology is discussed and comparative results are shown.

Conference paper

Eaton J, Naylor PA, 2015, Direct-to-Reverberant ratio estimation on the ACE corpus using a Two-channel beamformer, arXiv, ACE Challenge Workshop, a satellite event of IEEE-WASPAA, Publisher: arXiv

Direct-to-Reverberant Ratio (DRR) is an important measure for characterizing the properties of a room. The recently proposed DRR Estimation using a Null-Steered Beamformer (DENBE) algorithm was originally tested on simulated data where noise was artificially added to the speech after convolution with impulse responses simulated using the image-source method. This paper evaluates the performance of this algorithm on speech convolved with measured impulse responses and noise using the Acoustic Characterization of Environments (ACE) Evaluation corpus. The fullband DRR estimation performance of the DENBE algorithm exceeds that of the baselines in all Signal-to-Noise Ratios (SNRs) and noise types. In addition, estimation of the DRR in one third-octave ISO frequency bands is demonstrated.

Conference paper

Eaton J, Naylor PA, 2015, Reverberation time estimation on the ACE corpus using the SDD method, arXiv, ACE Challenge Workshop, a satellite event of IEEE-WASPAA, Publisher: arXiv

Reverberation Time ($T_{60}$) is an important measure for characterizing the properties of a room. The author’s $T_{60}$ estimation algorithm was previously tested on simulated data where the noise is artificially added to the speech after convolution with impulse responses simulated using the image method. We test the algorithm on speech convolved with real recorded impulse responses and noise from the same rooms from the Acoustic Characterization of Environments (ACE) corpus and achieve results comparable to those using simulated data.

Conference paper

Moore AH, Evers C, Naylor PA, Alon DL, Rafaely B et al., 2015, Direction of arrival estimation using pseudo-intensity vectors with direct-path dominance test, European Signal Processing Conference, Publisher: IEEE, Pages: 2296-2300, ISSN: 2219-5491

The accuracy of direction of arrival estimation tends to degrade under reverberant conditions due to the presence of reflected signal components which are correlated with the direct path. The recently proposed direct-path dominance test provides a means of identifying time-frequency regions in which a single signal path is dominant. By analysing only these regions it was shown that the accuracy of the FS-MUSIC algorithm could be significantly improved. However, for real-time implementation a less computationally demanding localisation algorithm would be preferable. In the present contribution we investigate the direct-path dominance test as a preprocessing step to pseudo-intensity vector-based localisation. A novel formulation of the pseudo-intensity vector is proposed which further exploits the direct path dominance test and leads to improved localisation performance.

Conference paper

Doire C, Brookes D, Naylor P, Jensen SH et al., 2015, Data-Driven Statistical Modelling of Room Impulse Responses in the Power Domain, European Signal Processing Conference (EUSIPCO), Publisher: IEEE

Having an accurate statistical model of room impulse responses with a minimum number of parameters is of crucial importance in applications such as dereverberation. In this paper, by taking into account the behaviour of the early reflections, we extend the widely-used statistical model proposed by Polack. The squared room impulse response is modelled in each frequency band as the realisation of a stochastic process weighted by the sum of two exponential decays. Room-independent values for the new parameters involved are obtained through analysis of several room impulse response databases, and validation of the model in the likelihood sense is performed.
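
To make the idea in the abstract above concrete, here is a rough Python sketch of a synthetic room impulse response whose expected power follows a sum of two exponential decays. It is an illustration only, not the parametrisation used in the paper. Expressing each decay rate as a 60 dB decay time, the weights `a_early` and `a_late`, and the use of white Gaussian noise as the underlying stochastic process are all assumptions made for the example. Setting `a_early=0` gives a single decay, as in Polack's original model.

```python
import numpy as np


def two_decay_power_envelope(t, t60_early, t60_late, a_early=1.0, a_late=0.1):
    """Expected power at times t (seconds) as a sum of two exponential decays.

    Each term drops by 60 dB over its own decay time (hypothetical
    parametrisation chosen for this sketch).
    """
    t = np.asarray(t, dtype=float)
    early = a_early * 10.0 ** (-6.0 * t / t60_early)
    late = a_late * 10.0 ** (-6.0 * t / t60_late)
    return early + late


def synth_rir(duration_s, fs, t60_early, t60_late,
              a_early=1.0, a_late=0.1, seed=0):
    """Draw one realisation: white Gaussian noise shaped by the envelope."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(round(duration_s * fs))) / fs
    env = two_decay_power_envelope(t, t60_early, t60_late, a_early, a_late)
    # The amplitude envelope is the square root of the power envelope.
    return rng.standard_normal(t.size) * np.sqrt(env)
```

For example, `synth_rir(0.5, 16000, t60_early=0.1, t60_late=0.6)` returns 8000 samples in which the fast early term dominates at first and the slower late term takes over afterwards.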

Conference paper

Hu M, Doclo S, Sharma D, Brookes D, Naylor P et al., 2015, Noise Robust Blind System Identification Algorithms Based On A Rayleigh Quotient Cost Function, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2476-2480

An important prerequisite for acoustic multi-channel equalization for speech dereverberation involves the identification of the acoustic channels between the source and the microphones. Blind System Identification (BSI) algorithms based on cross-relation error minimization are known to mis-converge in the presence of noise. Although algorithms have been proposed in the literature to improve robustness to noise, the estimated room impulse responses are usually constrained to have a flat magnitude spectrum. In this paper, noise robust algorithms based on a Rayleigh quotient cost function are proposed. Unlike the traditional algorithms, the estimated impulse responses are not always forced to have unit norm. Experimental results using simulated room impulse responses and several SNRs show that one of the proposed algorithms outperforms competing algorithms in terms of normalized projection misalignment.
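
For readers unfamiliar with cross-relation BSI, the sketch below shows the noise-free two-channel baseline that the abstract above refers to. It is not the noise-robust algorithm proposed in the paper. With microphone signals x1 = s * h1 and x2 = s * h2, the cross-relation x2 * h1 - x1 * h2 = 0 holds, so the stacked vector [h1; h2] minimises the Rayleigh quotient of A^T A, where A = [X2, -X1] is built from convolution matrices. The function names and the assumption that the channel length is known are choices made for the example.

```python
import numpy as np


def conv_matrix(x, num_taps):
    """Matrix C such that C @ h equals np.convolve(x, h) for len(h) == num_taps."""
    x = np.asarray(x, dtype=float)
    C = np.zeros((len(x) + num_taps - 1, num_taps))
    for k in range(num_taps):
        C[k:k + len(x), k] = x
    return C


def cross_relation_bsi(x1, x2, num_taps):
    """Estimate two channels, up to a common scale factor, from their outputs.

    The minimiser of the Rayleigh quotient of A^T A is the right singular
    vector of A with the smallest singular value.
    """
    A = np.hstack([conv_matrix(x2, num_taps), -conv_matrix(x1, num_taps)])
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    h = vt[-1]
    return h[:num_taps], h[num_taps:]


def npm(h_true, h_est):
    """Normalized projection misalignment (scale-invariant error measure)."""
    h_true = np.asarray(h_true, dtype=float)
    h_est = np.asarray(h_est, dtype=float)
    proj = (h_true @ h_est) / (h_est @ h_est) * h_est
    return np.linalg.norm(h_true - proj) / np.linalg.norm(h_true)
```

Without noise, and provided the two channels share no common zeros, the estimate matches the true channels up to scale. Adding noise to `x1` and `x2` perturbs the smallest singular vector, which is the mis-convergence problem the paper addresses.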

Conference paper

Moore AH, Evers C, Naylor PA, 2015, Multichannel equalisation for high-order spherical microphone arrays using beamformed channels, 2015 IEEE International Conference on Digital Signal Processing (DSP), Publisher: IEEE, Pages: 1211-1215, ISSN: 1546-1874

High-order spherical microphone arrays offer many practical benefits including relatively fine spatial resolution in all directions and rotation invariant processing using eigenbeams. Spatial filtering can reduce interference from noise and reverberation but in even moderately reverberant environments the beam pattern fails to suppress reverberation to a level adequate for typical applications. In this paper we investigate the feasibility of applying dereverberation by considering multiple beamformer outputs as channels to be dereverberated. In one realisation we process directly in the spherical harmonic domain where the beampatterns are mutually orthogonal. In a second realisation, which is not limited to spherical microphone arrays, beams are pointed in the direction of dominant reflections. Simulations demonstrate that in both cases reverberation is significantly reduced and, in the best case, clarity index is improved by 15 dB.

Conference paper

Evers C, Moore AH, Naylor PA, Sheaffer J, Rafaely B et al., 2015, Bearing-only acoustic tracking of moving speakers for robot audition, 2015 IEEE International Conference on Digital Signal Processing (DSP), Publisher: IEEE, Pages: 1206-1210, ISSN: 1546-1874

This paper focuses on speaker tracking in robot audition for human-robot interaction. Using only acoustic signals, speaker tracking in enclosed spaces is subject to missing detections and spurious clutter measurements due to speech inactivity, reverberation and interference. Furthermore, many acoustic localization approaches estimate speaker direction, hence providing bearing-only measurements without range information. This paper presents a probability hypothesis density (PHD) tracker that augments the bearing-only speaker directions of arrival with a cloud of range hypotheses at speaker initiation and propagates the random variates through time. Furthermore, due to their formulation PHD filters explicitly model, and hence provide robustness against, clutter and missing detections. The approach is verified using experimental results.

Conference paper

Parada PP, Sharma D, Naylor PA, van Waterschoot T et al., 2015, Reverberant speech recognition exploiting clarity index estimation, Eurasip Journal on Advances in Signal Processing, Vol: 2016, Pages: 1-12, ISSN: 1687-6180

We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4 % relative word error rate reduction in comparison to the best baseline of the challenge.

Journal article

Eaton J, Moore AH, Naylor PA, Skoglund J et al., 2015, Direct-to-reverberant ratio estimation using a null-steered beamformer, ICASSP, Publisher: IEEE, Pages: 46-50

Reverberation affects the quality and intelligibility of distant speech recorded in a room. Direct-to-Reverberant Ratio (DRR) is a useful measure for assessing the acoustic configuration and can be used to inform dereverberation algorithms. We describe a novel DRR estimation algorithm applicable where the signal was recorded with two or more microphones, such as mobile communications devices and laptops. The method uses a null-steered beamformer. In simulations the proposed method yields accurate DRR estimates to within +/- 4 dB across a wide variety of room sizes, reverberation times and source-receiver distances. It is also shown that the proposed method is more robust to background noise than a baseline approach. The best estimation accuracy is obtained in the region from -5 to 5 dB which is a relevant range for portable devices.

Conference paper

