Search results

  • Conference paper
    Hafezi S, Moore AH, Naylor P, 2017,

    Multiple source localization using estimation consistency in the time-frequency domain

    , 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 516-520, ISSN: 1520-6149

    The extraction of multiple Direction-of-Arrival (DoA) information from estimated spatial spectra can be challenging when such spectra are noisy or the sources are adjacent. Smoothing or clustering techniques are typically used to remove the effect of noise or irregular peaks in the spatial spectra. As we will explain and show in this paper, the smoothing-based techniques require prior knowledge of minimum angular separation of the sources and the clustering-based techniques fail on noisy spatial spectra. A broad class of localization techniques gives direction estimates in each Time Frequency (TF) bin. Using this information as input, a novel technique for obtaining robust localization of multiple simultaneous sources is proposed using Estimation Consistency (EC) in the TF domain. The method is evaluated in the context of spherical microphone arrays. This technique does not require prior knowledge of the sources and, by removing the noise in the estimated spatial spectrum, makes clustering a reliable and robust technique for multiple DoA extraction from estimated spatial spectra. The results indicate that the proposed technique has the strongest robustness to separation with up to 10° median error for 5° to 180° separation for 2 and 3 sources, compared to the baseline and the state-of-the-art techniques.

  • Conference paper
    Moore AH, Brookes D, Naylor PA, 2017,

    Robust spherical harmonic domain interpolation of spatially sampled array manifolds

    , IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 521-525, ISSN: 1520-6149

    Accurate interpolation of the array manifold is an important first step for the acoustic simulation of rapidly moving microphone arrays. Spherical harmonic domain interpolation has been proposed and well studied in the context of head-related transfer functions but has focussed on perceptual, rather than numerical, accuracy. In this paper we analyze the effect of measurement noise on spatial aliasing. Based on this analysis we propose a method for selecting the truncation orders for the forward and reverse spherical Fourier transforms given only the noisy samples in such a way that the interpolation error is minimized. The proposed method achieves up to 1.7 dB improvement over the baseline approach.

  • Conference paper
    Papayiannis C, Evers C, Naylor PA, 2017,

    Discriminative feature domains for reverberant acoustic environments

    , 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, ISSN: 2379-190X

    Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of acoustic parameters and Machine Learning algorithms for feature selection are used and an analysis is performed on the resulting feature domains for each task. In our experiments, a classification accuracy of 100% is achieved for the majority of tasks and the Word Error Rate is reduced by 20.73 percentage points for Automatic Speech Recognition when using the resulting domains. Experimental results indicate a significant dissimilarity in the parameter choices for the composition of the domains, which highlights the importance of the feature selection process for individual applications.

  • Conference paper
    Evers C, Dorfan Y, Gannot S, Naylor PA et al., 2017,

    Source tracking using moving microphone arrays for robot audition

    , IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

    Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.

  • Conference paper
    Xue W, Brookes M, Naylor PA, 2017,

    Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization

    , IEEE International Conference on Acoustics, Speech and Signal Processing, Pages: 591-595, ISSN: 1520-6149

    In room acoustics, under-modelled multichannel blind system identification (BSI) aims to estimate the early part of the room impulse responses (RIRs), and it can be widely used in applications such as speaker localization, room geometry identification and beamforming based speech dereverberation. In this paper we extend our recent study on under-modelled BSI from the time domain to the frequency domain, such that the RIRs can be updated frame-wise and the efficiency of Fast Fourier Transform (FFT) is exploited to reduce the computational complexity. Analogous to the cross-correlation based criterion in the time domain, a frequency-domain cross power spectrum based criterion is proposed. As the early RIRs are usually sparse, the RIRs are estimated by jointly maximizing the cross power spectrum based criterion in the frequency domain and minimizing the l1-norm sparsity measure in the time domain. A two-stage LMS updating algorithm is derived to achieve joint optimization of these two targets. The experimental results in different under-modelled scenarios demonstrate the effectiveness of the proposed method.

  • Conference paper
    Eaton DJ, Javed HA, Naylor PA, 2017,

    Estimation of the perceived level of reverberation using non-intrusive single-channel variance of decay rates

    , Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE

    The increasing processing power of hearing aids and mobile devices has led to the potential for incorporation of dereverberation algorithms to improve speech quality for the listener. Assessing the effectiveness of dereverberation algorithms using subjective listening tests is extremely time consuming and depends on averaging out listener variations over a large number of subjects. Also, most existing instrumental measures are intrusive and require knowledge of the original signal, which precludes many practical applications. In this paper we show that the proposed non-intrusive single-channel algorithm is a predictor of the perceived level of reverberation that correlates well with subjective listening test results, outperforming many existing intrusive and non-intrusive measures. The algorithm requires only a single training step and has a very low computational complexity, making it suitable for hearing aids and mobile telephone applications. The source code has been made freely available.

  • Conference paper
    Hafezi S, Moore AH, Naylor PA, 2017,

    Multi-source estimation consistency for improved multiple direction-of-arrival estimation

    , Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE, Pages: 81-85

    In Direction-of-Arrival (DOA) estimation for multiple sources, removal of noisy data points from a set of local DOA estimates increases the resulting estimation accuracy, especially when there are many sources and they have small angular separation. In this work, we propose a post-processing technique for the enhancement of DOA extraction from a set of local estimates using the consistency of these estimates within the time frame based on an adaptive multi-source assumption. Simulations in a realistic reverberant environment with sensor noise and up to 5 sources demonstrate that the proposed technique outperforms the baseline and state-of-the-art approaches. In these tests the proposed technique had a worst average error of 9°, robustness of 5° to widely varying source separation and 3° to number of sources.

  • Conference paper
    Löllmann HW, Moore AH, Naylor PA, Rafaely B, Horaud R, Mazel A, Kellermann W et al., 2017,

    Microphone array signal processing for robot audition

    , 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA), Publisher: IEEE, Pages: 51-55

    Robot audition for humanoid robots interacting naturally with humans in an unconstrained real-world environment is a hitherto unsolved challenge. The recorded microphone signals are usually distorted by background and interfering noise sources (speakers) as well as room reverberation. In addition, the movements of a robot and its actuators cause ego-noise which degrades the recorded signals significantly. The movement of the robot body and its head also complicates the detection and tracking of the desired, possibly moving, sound sources of interest. This paper presents an overview of the concepts in microphone array processing for robot audition and some recent achievements.

  • Conference paper
    Gebru ID, Evers C, Naylor PA, Horaud R et al., 2017,

    Audio-visual tracking by density approximation in a sequential Bayesian filtering framework

    , HSCMA 2017, Publisher: IEEE, Pages: 71-75

    This paper proposes a novel audio-visual tracking approach that constructively exploits audio and visual modalities in order to estimate trajectories of multiple people in a joint state space. The tracking problem is modeled using a sequential Bayesian filtering framework. Within this framework, we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform. A density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixtures from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings using a camcorder and microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance of the proposed audio-visual approach compared to two benchmark visual trackers.

  • Journal article
    Doire CSJ, Brookes DM, Naylor PA, 2017,

    Robust and efficient Bayesian adaptive psychometric function estimation

    , Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

    The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.

  • Conference paper
    Javed HA, Cauchi B, Doclo S, Naylor PA, Goetze S et al., 2017,


    , IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149
  • Conference paper
    Pinero G, Naylor PA, 2017,


    , IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149
  • Journal article
    Doire CSJ, Brookes DM, Naylor PA, Hicks CM, Betts D, Dmour MA, Jensen SH et al., 2017,

    Single-channel online enhancement of speech corrupted by reverberation and noise

    , IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 572-587, ISSN: 2329-9290

    This paper proposes an online single-channel speech enhancement method designed to improve the quality of speech degraded by reverberation and noise. Based on an autoregressive model for the reverberation power and on a hidden Markov model for clean speech production, a Bayesian filtering formulation of the problem is derived and online joint estimation of the acoustic parameters and mean speech, reverberation, and noise powers is obtained in mel-frequency bands. From these estimates, a real-valued spectral gain is derived and spectral enhancement is applied in the short-time Fourier transform (STFT) domain. The method yields state-of-the-art performance and greatly reduces the effects of reverberation and noise while improving speech quality and preserving speech intelligibility in challenging acoustic environments.

  • Journal article
    Parada PP, Sharma D, van Waterschoot T, Naylor PA et al., 2017,

    Confidence Measures for Nonintrusive Estimation of Speech Clarity Index

    , Journal of the Audio Engineering Society, Vol: 65, Pages: 90-99, ISSN: 1549-4950
  • Conference paper
    Evers C, Rafaely B, Naylor PA, 2017,

    Speaker tracking in reverberant environments using multiple detections of arrival

    , HSCMA 2017, Publisher: IEEE

    Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path bins in the short-term Fourier transform of a speech signal of several seconds duration from a static talker. Nevertheless, for moving talkers, short blocks of at most several hundred milliseconds are required to capture the spatio-temporal variation of the source direction. Processing of short blocks of data in reverberant environment can lead to clusters whose centroids correspond to spurious DOAs away from the source direction. We therefore propose in this paper a novel multi-detection source tracking approach that estimates the smoothed trajectory of the source DOAs. Results for realistic room simulations validate the proposed approach and demonstrate significant improvements in estimation accuracy compared to single-detection tracking.

  • Book chapter
    Jarrett DP, Habets EAP, Naylor PA, 2017,

    Spatial sampling and signal transformation

    , Springer Topics in Signal Processing, Pages: 23-37

    This chapter examines issues relating to spatial signal acquisition and transformation. Many publications that propose algorithms for parameter estimation or signal enhancement purposes begin from the outset using signals in either or both of the time-frequency and spherical harmonic domains. One aim of the current chapter is to provide some of the algorithmic details necessary to process signals directly from the microphones, which will then enable subsequent spherical harmonic domain processing to be applied. In addition, we present a number of spatial sampling schemes, which determine the placement of microphones on the sphere such that spatial aliasing is minimized, and we discuss the advantages and disadvantages of two common array types: the open and rigid arrays with omnidirectional microphones.

  • Book chapter
    Jarrett DP, Habets EAP, Naylor PA, 2017,

    Informed array processing

    , Springer Topics in Signal Processing, Pages: 151-184

    The concept of informed array processing is introduced in this chapter. The conceptual aim of informed array processing is to incorporate relevant spatial information about the problem to be solved into the design of spatial filters and into the estimation of the second-order statistics that are required to implement the beamformers of Chap. 7. Informed array processing techniques are developed for two important signal enhancement problems: noise reduction and dereverberation.

  • Book chapter
    Jarrett DP, Habets EAP, Naylor PA, 2017,

    Acoustic parameter estimation

    , Springer Topics in Signal Processing, Pages: 65-92

    This chapter introduces methods for the estimation of two important acoustic parameters using spherical microphone arrays: the direction of arrival of sound from a localized sound source, and the signal-to-diffuse energy ratio at a particular position in a sound field. Later in the book, it will be seen that these quantities can be used for signal enhancement purposes.

  • Book chapter
    Jarrett DP, Habets EAP, Naylor PA, 2017,

    Theoretical preliminaries of acoustics

    , Springer Topics in Signal Processing, Pages: 11-22

    In this chapter, we review some of the fundamentals of acoustics and introduce the spherical harmonic expansion of a sound field, which is the basis for the spherical harmonic processing framework used with spherical microphone arrays. This material provides an introduction to the key theory and equations that are required in the rest of the book.

  • Book
    Jarrett DP, Habets EAP, Naylor PA, 2017,


This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
