Publications

Jarrett DP, Habets EAP, Naylor PA, 2017, Theory and Applications of Spherical Microphone Array Processing Preface, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: V-VI, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Parametric Array Processing, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 141-150, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Theory and Applications of Spherical Microphone Array Processing Introduction, Publisher: SPRINGER-VERLAG BERLIN, ISBN: 978-3-319-42209-1

Book

Jarrett DP, Habets EAP, Naylor PA, 2017, Acoustic Parameter Estimation, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 65-92, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Informed Array Processing, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 151-184, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Spatial Sampling and Signal Transformation, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 23-37, ISBN: 978-3-319-42209-1

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Signal-Independent Array Processing, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 93-112, ISBN: 978-3-319-42209-1

Citations: 2

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017, Theoretical Preliminaries of Acoustics, THEORY AND APPLICATIONS OF SPHERICAL MICROPHONE ARRAY PROCESSING, Publisher: SPRINGER-VERLAG BERLIN, Pages: 11-22, ISBN: 978-3-319-42209-1

Citations: 1

Book chapter

Moore AH, Peso P, Naylor PA, 2016, Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures, Computer Speech and Language, Vol: 46, Pages: 574-584, ISSN: 1095-8363

Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database which includes measured multichannel acoustic impulse responses from 7 different rooms with reverberation times ranging from 0.33 s to 1.34 s. The reverberant speech is mixed with ambient, fan and babble noise recordings made with the same microphone setups in each of the rooms. In the first experiment performance of the ASR without speech processing is evaluated. Results clearly indicate the deleterious effect of both noise and reverberation. In the second experiment, different speech enhancement pipelines are evaluated with relative word error rate reductions of up to 82%. Finally, the ability of selected instrumental metrics to predict ASR performance improvement is assessed. The best performing metric, Short-Time Objective Intelligibility Measure, is shown to have a Pearson correlation coefficient of 0.79, suggesting that it is a useful predictor of algorithm performance in these tests.

Journal article

Dorfan Y, Evers C, Gannot S, Naylor P et al., 2016, Speaker Localization with Moving Microphone Arrays, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Speaker localization algorithms often assume static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are linear time invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first approach is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second approach is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared for simulated reverberant audio data from a microphone array with two sensors.

Conference paper

Hafezi S, Moore AH, Naylor PA, 2016, Multiple source localization in the spherical harmonic domain using augmented intensity vectors based on grid search, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491

Multiple source localization is an important task in acoustic signal processing with applications including dereverberation, source separation, source tracking and environment mapping. When using spherical microphone arrays, it has been previously shown that Pseudo-intensity Vectors (PIV), and Augmented Intensity Vectors (AIV), are an effective approach for direction of arrival estimation of a sound source. In this paper, we evaluate AIV-based localization in acoustic scenarios involving multiple sound sources. Simulations are conducted where the number of sources, their angular separation and the reverberation time of the room are varied. The results indicate that AIV outperforms PIV and Steered Response Power (SRP) with an average accuracy between 5 and 10 degrees for sources with angular separation of 30 degrees or more. AIV also shows better robustness to reverberation time than PIV and SRP.

Conference paper

Moore AH, Evers C, Naylor PA, 2016, 2D direction of arrival estimation of multiple moving sources using a spherical microphone array, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491

Direction of arrival estimation using a spherical microphone array is an important and growing research area. One promising algorithm is the recently proposed Subspace Pseudo-Intensity Vector method. In this contribution the Subspace Pseudo-Intensity Vector method is combined with a state-of-the-art method for robustly estimating the centres of mass in a 2D histogram based on matching pursuits. The performance of the improved Subspace Pseudo-Intensity Vector method is evaluated in the context of localising multiple moving sources where it is shown to outperform competing methods in terms of clutter rate and the number of missed detections whilst remaining comparable in terms of localisation accuracy.

Conference paper

Evers C, Moore A, Naylor P, 2016, Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.

Conference paper

Xue W, Brookes DM, Naylor PA, 2016, Under-modelled blind system identification for time delay estimation in reverberant environments, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.

Conference paper

Moore AH, Evers C, Naylor PA, 2016, Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290

Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.

Journal article

Moore AH, Naylor P, 2016, Linear prediction based dereverberation for spherical microphone arrays, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

Dereverberation is an important preprocessing step in many speech systems, both for human and machine listening. In many situations, including robot audition, the sound sources of interest can be incident from any direction. In such circumstances, a spherical microphone array allows direction of arrival estimation which is free of spatial aliasing and direction-independent beam patterns can be formed. This contribution formulates the Weighted Prediction Error algorithm in the spherical harmonic domain and compares the performance to a space domain implementation. Simulation results demonstrate that performing dereverberation in the spherical harmonic domain allows many more microphones to be used without increasing the computational cost. The benefit of using many microphones is particularly apparent at low signal to noise ratios, where for the conditions tested up to 71% improvement in speech-to-reverberation modulation ratio was achieved.

Conference paper

Naylor PA, Zahedi A, Jensen S, Bech S et al., 2016, Source Coding in Networks with Covariance Distortion Constraints, IEEE Transactions on Signal Processing, Vol: 64, Pages: 5943-5958, ISSN: 1053-587X

We consider a source coding problem with a network scenario in mind, and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortions. We define a notion of minimum for two positive-definite matrices based on which we derive an explicit formula for the rate-distortion function (RDF). We then study the special cases and applications of this result. We show that two well-studied source coding problems, i.e. remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints are in fact special cases of our results. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes in order to maximize the output SNR at the fusion center. We thereby bridge between denoising and source coding within this setup.

Journal article

Xue W, Brookes M, Naylor PA, 2016, Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Conference paper

Eaton DJ, Gaubitch ND, Moore AH, Naylor PA et al., 2016, Estimation of room acoustic parameters: the ACE challenge, IEEE Transactions on Audio Speech and Language Processing, Vol: 24, Pages: 1681-1693, ISSN: 2329-9290

Reverberation Time (T60) and Direct-to-Reverberant Ratio (DRR) are important parameters which together can characterize sound captured by microphones in non-anechoic rooms. These parameters are important in speech processing applications such as speech recognition and dereverberation. The values of T60 and DRR can be estimated directly from the Acoustic Impulse Response (AIR) of the room. In practice, the AIR is not normally available, in which case these parameters must be estimated blindly from the observed speech in the microphone signal. The Acoustic Characterization of Environments (ACE) Challenge aimed to determine the state-of-the-art in blind acoustic parameter estimation and also to stimulate research in this area. A summary of the ACE Challenge, and the corpus used in the challenge is presented together with an analysis of the results. Existing algorithms were submitted alongside novel contributions, the comparative results for which are presented in this paper. The challenge showed that T60 estimation is a mature field where analytical approaches dominate whilst DRR estimation is a less mature field where machine learning approaches are currently more successful.

Journal article

Eaton DJ, Moore AH, Naylor PA, Skoglund J et al., 2016, Reverberation estimator, US20160118038 A1

Provided are methods and systems for generating Direct-to-Reverberant Ratio (DRR) estimates. The methods and systems use a null-steered beamformer to produce accurate DRR estimates across a variety of room sizes, reverberation times, and source-receiver distances. The DRR estimation algorithm uses spatial selectivity to separate direct and reverberant energy and account for noise separately. The formulation considers the response of the beamformer to reverberant sound and the effect of noise. The DRR estimation algorithm is more robust to background noise than existing approaches, and is applicable where a signal is recorded with two or more microphones, such as with mobile communications devices, laptop computers, and the like.

Patent

Sharma D, Naylor PA, Wang Y, Brookes DM et al., 2016, A Data-Driven Non-intrusive Measure of Speech Quality and Intelligibility, Speech Communication, Vol: 80, Pages: 84-94, ISSN: 0167-6393

Speech signals are often affected by additive noise and distortion which can degrade the perceived quality and intelligibility of the signal. We present a new measure, NISA, for estimating the quality and intelligibility of speech degraded by additive noise and distortions associated with telecommunications networks, based on a data driven framework of feature extraction and tree based regression. The new measure is non-intrusive, operating on the degraded signal alone without the need for a reference signal. This makes the measure applicable to practical speech processing applications operating in the single-ended mode. The new measure has been evaluated against the intrusive measures PESQ and STOI. The results indicate that the accuracy of the new non-intrusive method is around 90% of the accuracy of the intrusive measures, depending on the test scenario. The NISA measure therefore provides non-intrusive (single-ended) PESQ and STOI estimates with high accuracy.

Journal article

Javed HA, Moore AH, Naylor PA, 2016, Spherical microphone array acoustic rake receivers, ICASSP, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 111-115, ISSN: 0736-7791

Several signal independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipaths, by separately capturing and combining early reflections with the direct path. We investigate several approaches in combining reflections with the direct path source signal, including the development of beam patterns that point nulls at all preceding reflections. The proposed designs are tested in experimental simulations and their dereverberation performances evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation compared to conventional signal independent beamforming systems; achieving up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer.

Conference paper

Evers C, Moore AH, Naylor PA, 2016, Acoustic simultaneous localization and mapping (A-SLAM) of a moving microphone array and its surrounding speakers, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6-10, ISSN: 1520-6149

Acoustic scene mapping creates a representation of positions of audio sources such as talkers within the surrounding environment of a microphone array. By allowing the array to move, the acoustic scene can be explored in order to improve the map. Furthermore, the spatial diversity of the kinematic array allows for estimation of the source-sensor distance in scenarios where source directions of arrival are measured. As sound source localization is performed relative to the array position, mapping of acoustic sources requires knowledge of the absolute position of the microphone array in the room. If the array is moving, its absolute position is unknown in practice. Hence, Simultaneous Localization and Mapping (SLAM) is required in order to localize the microphone array position and map the surrounding sound sources. In realistic environments, microphone arrays receive a convolutive mixture of direct-path speech signals, noise and reflections due to reverberation. A key challenge of Acoustic SLAM (a-SLAM) is robustness against reverberant clutter measurements and missing source detections. This paper proposes a novel bearing-only a-SLAM approach using a Single-Cluster Probability Hypothesis Density filter. Results demonstrate convergence to accurate estimates of the array trajectory and source positions.

Conference paper

Neeld T, Eaton J, Naylor PA, Shipworth D et al., 2016, A novel method of determining events in combination gas boilers: Assessing the feasibility of a passive acoustic sensor, Building and Environment, Vol: 100, Pages: 1-9, ISSN: 0360-1323

To assess the impact of interventions designed to reduce residential space heating demand, investigators must be armed with field-trial applicable techniques that accurately measure space heating energy use. This study assesses the feasibility of using a passive acoustic sensor to detect gas consumption events in domestic combination gas-fired boilers (C-GFBs). The investigation has shown, for the C-GFB investigated, the following events are discernible using a passive acoustic sensor: demand type (hot water or central heating); boiler ignition time; and pre-mix fan motor speed. A detection algorithm was developed to automatically identify demand type and burner ignition time with accuracies of 100% and 97% respectively. Demand type was determined by training a naive Bayes classifier on 20 features of the acoustic profile at the start of a demand event. Burner ignition was determined by detecting low frequency (5–10 Hz) pressure pulsations produced during ignition. The acoustic signatures of the pre-mix fan and circulation-pump were identified manually. Additional work is required to detect burner duration, deal with detection in the presence of increased noise and expand the range of boilers investigated. There are considerable implications resulting from the widespread use of such techniques on improving understanding of space heating demand.

Journal article

Doire CSJ, Brookes DM, Naylor PA, De Sena E, van Waterschoot T, Jensen SHJ et al., 2016, Acoustic Environment Control: Implementation of a Reverberation Enhancement System, AES 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech)

Reverberation enhancement systems allow the active control of the acoustic environment. They are subject to instability issues due to acoustic feedback, and are often installed permanently in large halls, sometimes at great cost. In this paper, we explore the possibility of implementing a cost-effective reverberation enhancement system to control the acoustics of typical rooms using a combination of spatial filtering, automatic calibration, adaptive notch filters, howling detection and manual adjustments. The effectiveness of the system is then tested inside a small soundproof booth.

Conference paper

Parada PP, Sharma D, Lainez J, Barreda D, van Waterschoot T, Naylor PA et al., 2016, A single-channel non-intrusive C50 estimator correlated with speech recognition performance, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 24, Pages: 719-732, ISSN: 2329-9304

Journal article

Evers C, Moore A, Naylor P, 2016, Towards Informative Path Planning for Acoustic SLAM, DAGA 2016

Acoustic scene mapping is a challenging task as microphone arrays can often localize sound sources only in terms of their directions. Spatial diversity can be exploited constructively to infer source-sensor range when using microphone arrays installed on moving platforms, such as robots. As the absolute location of a moving robot is often unknown in practice, Acoustic Simultaneous Localization And Mapping (a-SLAM) is required in order to localize the moving robot’s positions and jointly map the sound sources. Using a novel a-SLAM approach, this paper investigates the impact of the choice of robot paths on source mapping accuracy. Simulation results demonstrate that a-SLAM performance can be improved by informatively planning robot paths.

Conference paper

Cauchi B, Santos JF, Siedenburg K, Falk TH, Naylor PA, Doclo S, Goetze S et al., 2016, Predicting the quality of processed speech by combining modulation-based features and model trees, Pages: 180-184

Many signal processing methods have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. The evaluation of these methods either requires the use of perceptual measures, i.e. listening tests, or instrumental measures. Perceptual measures are typically more reliable but are quite costly and time-consuming. On the other hand, instrumental measures may correlate poorly with the perceived speech quality. In this paper we propose to train an instrumental measure, combining modulation-based features and model trees, on the basis of perceptual scores obtained on a small corpus of speech data that has been processed by a combination of beamforming and spectral postfiltering. For evaluation purposes the resulting measure is then applied to a larger corpus. Results show that the use of model trees to train the predicting function of an instrumental measure increases its correlation with perceptual scores.

Citations: 1

Conference paper

Hafezi S, Moore AH, Naylor PA, 2016, 3D acoustic source localization in the spherical harmonic domain based on optimized grid search, 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 415-419, ISSN: 1520-6149

Citations: 11

Conference paper

Zahedi A, Ostergaard J, Jensen SH, Naylor P, Bech S et al., 2016, On Perceptual Audio Compression with Side Information at the Decoder, Data Compression Conference (DCC), Publisher: IEEE, Pages: 456-465, ISSN: 1068-0314

Conference paper

Patrick A. Naylor