Publications

Conference paper

Yiallourides C, Manning V, Moore AH, Naylor P et al., 2017,

A dynamic programming approach for automatic stride detection and segmentation in acoustic emission from the knee

, 2017 IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 401-405, ISSN: 1520-6149

We study the acquisition and analysis of sounds generated by the knee during walking with particular focus on the effects due to osteoarthritis. Reliable contact instant estimation is essential for stride synchronous analysis. We present a dynamic programming based algorithm for automatic estimation of both the initial contact instants (ICIs) and last contact instants (LCIs) of the foot to the floor. The technique is designed for acoustic signals sensed at the patella of the knee. It uses the phase-slope function to generate a set of candidates and then finds the most likely ones by minimizing a cost function that we define. ICIs are identified with an RMS error of 13.0% for healthy and 14.6% for osteoarthritic knees and LCIs with an RMS error of 16.0% and 17.0% respectively.
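The candidate-selection step described in this abstract can be illustrated with a small dynamic programming sketch. The abstract does not give the paper's cost function, so everything below is an assumption made for illustration: the function name `select_contacts`, the per-candidate `costs`, the expected stride `period`, the `weight` parameter, and the rule that the path starts at the first candidate and ends at the last. It is not the authors' algorithm.

```python
def select_contacts(times, costs, period, weight=1.0):
    """Pick a subsequence of candidate instants with minimum total cost.

    Hypothetical cost (not the paper's): the sum of the per-candidate costs
    of the selected instants, plus a penalty for every gap between
    consecutive selected instants that deviates from the expected period.
    The path is assumed to start at the first and end at the last candidate.
    """
    n = len(times)
    if n == 0:
        return []
    best = [float("inf")] * n   # best[j]: minimum cost of a path ending at j
    prev = [-1] * n             # back-pointers for recovering the path
    best[0] = costs[0]
    for j in range(1, n):
        for i in range(j):
            gap = times[j] - times[i]
            gap_cost = weight * abs(gap - period) / period
            total = best[i] + gap_cost + costs[j]
            if total < best[j]:
                best[j] = total
                prev[j] = i
    # Trace back from the last candidate to the first.
    path = []
    k = n - 1
    while k != -1:
        path.append(k)
        k = prev[k]
    return [times[k] for k in reversed(path)]


# Toy data: true contacts roughly one second apart, with spurious candidates.
times = [0.0, 0.4, 1.0, 1.3, 2.0, 2.1, 3.0]
costs = [0.1, 0.9, 0.1, 0.8, 0.1, 0.9, 0.1]
selected = select_contacts(times, costs, period=1.0)
```

In this toy run the spurious candidates carry high local costs and break the expected spacing, so the minimum-cost path keeps only the evenly spaced instants.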

Conference paper

Lightburn L, De Sena E, Moore AH, Naylor PA, Brookes D et al., 2017,

Improving the perceptual quality of ideal binary masked speech

, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
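As a rough illustration of the idea in this abstract, the sketch below computes an oracle binary mask from per-cell speech and noise powers and then turns it into a prior on speech presence instead of a gain. The 0 dB local criterion `lc_db`, the prior values `p_high` and `p_low`, and both function names are assumptions chosen for illustration; the abstract does not specify how the paper maps the mask to prior probabilities.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1 for each time-frequency cell whose local SNR exceeds lc_db."""
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                snr_db = -math.inf          # no speech energy in this cell
            elif n <= 0.0:
                snr_db = math.inf           # speech with no noise
            else:
                snr_db = 10.0 * math.log10(s / n)
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def mask_to_prior(mask, p_high=0.9, p_low=0.1):
    """Map mask values to a prior speech presence probability (hypothetical values)."""
    return [[p_high if m else p_low for m in row] for row in mask]


# Rows are frames, columns are frequency bins; the powers are toy numbers.
speech = [[4.0, 0.1], [0.5, 2.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)
prior = mask_to_prior(mask)
```

The resulting `prior` would be handed to a speech enhancer that accepts a speech presence prior; that enhancer is outside the scope of this sketch.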

Conference paper

Hafezi S, Moore AH, Naylor P, 2017,

Multiple source localization using estimation consistency in the time-frequency domain

, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 516-520, ISSN: 1520-6149

The extraction of multiple Direction-of-Arrival (DoA) information from estimated spatial spectra can be challenging when such spectra are noisy or the sources are adjacent. Smoothing or clustering techniques are typically used to remove the effect of noise or irregular peaks in the spatial spectra. As we will explain and show in this paper, the smoothing-based techniques require prior knowledge of minimum angular separation of the sources and the clustering-based techniques fail on noisy spatial spectrum. A broad class of localization techniques give direction estimates in each Time Frequency (TF) bin. Using this information as input, a novel technique for obtaining robust localization of multiple simultaneous sources is proposed using Estimation Consistency (EC) in the TF domain. The method is evaluated in the context of spherical microphone arrays. This technique does not require prior knowledge of the sources and by removing the noise in the estimated spatial spectrum makes clustering a reliable and robust technique for multiple DoA extraction from estimated spatial spectra. The results indicate that the proposed technique has the strongest robustness to separation with up to 10° median error for 5° to 180° separation for 2 and 3 sources, compared to the baseline and the state-of-the-art techniques.

Conference paper

Moore AH, Brookes D, Naylor PA, 2017,

Robust spherical harmonic domain interpolation of spatially sampled array manifolds

, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 521-525, ISSN: 1520-6149

Accurate interpolation of the array manifold is an important first step for the acoustic simulation of rapidly moving microphone arrays. Spherical harmonic domain interpolation has been proposed and well studied in the context of head-related transfer functions but has focussed on perceptual, rather than numerical, accuracy. In this paper we analyze the effect of measurement noise on spatial aliasing. Based on this analysis we propose a method for selecting the truncation orders for the forward and reverse spherical Fourier transforms given only the noisy samples in such a way that the interpolation error is minimized. The proposed method achieves up to 1.7 dB improvement over the baseline approach.

Conference paper

Papayiannis C, Evers C, Naylor PA, 2017,

Discriminative feature domains for reverberant acoustic environments

, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, ISSN: 2379-190X

Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of acoustic parameters and Machine Learning algorithms for feature selection are used and an analysis is performed on the resulting feature domains for each task. In our experiments, a classification accuracy of 100% is achieved for the majority of tasks and the Word Error Rate is reduced by 20.73 percentage points for Automatic Speech Recognition when using the resulting domains. Experimental results indicate a significant dissimilarity in the parameter choices for the composition of the domains, which highlights the importance of the feature selection process for individual applications.

Conference paper

Evers C, Dorfan Y, Gannot S, Naylor PA et al., 2017,

Source tracking using moving microphone arrays for robot audition

, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.

Conference paper

Xue W, Brookes M, Naylor PA, 2017,

Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization

, IEEE International Conference on Acoustics, Speech and Signal Processing, Pages: 591-595, ISSN: 1520-6149

In room acoustics, under-modelled multichannel blind system identification (BSI) aims to estimate the early part of the room impulse responses (RIRs), and it can be widely used in applications such as speaker localization, room geometry identification and beamforming based speech dereverberation. In this paper we extend our recent study on under-modelled BSI from the time domain to the frequency domain, such that the RIRs can be updated frame-wise and the efficiency of Fast Fourier Transform (FFT) is exploited to reduce the computational complexity. Analogous to the cross-correlation based criterion in the time domain, a frequency-domain cross power spectrum based criterion is proposed. As the early RIRs are usually sparse, the RIRs are estimated by jointly maximizing the cross power spectrum based criterion in the frequency domain and minimizing the ℓ1-norm sparsity measure in the time domain. A two-stage LMS updating algorithm is derived to achieve joint optimization of these two targets. The experimental results in different under-modelled scenarios demonstrate the effectiveness of the proposed method.

Conference paper

Eaton DJ, Javed HA, Naylor PA, 2017,

Estimation of the perceived level of reverberation using non-intrusive single-channel variance of decay rates

, Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE

The increasing processing power of hearing aids and mobile devices has led to the potential for incorporation of dereverberation algorithms to improve speech quality for the listener. Assessing the effectiveness of dereverberation algorithms using subjective listening tests is extremely time consuming and depends on averaging out listener variations over a large number of subjects. Also, most existing instrumental measures are intrusive and require knowledge of the original signal, which precludes many practical applications. In this paper we show that the proposed non-intrusive single-channel algorithm is a predictor of the perceived level of reverberation that correlates well with subjective listening test results, outperforming many existing intrusive and non-intrusive measures. The algorithm requires only a single training step and has a very low computational complexity, making it suitable for hearing aids and mobile telephone applications. The source code has been made freely available.

Conference paper

Löllmann HW, Moore AH, Naylor PA, Rafaely B, Horaud R, Mazel A, Kellermann W et al., 2017,

Microphone array signal processing for robot audition

, 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA), Publisher: IEEE, Pages: 51-55

Robot audition for humanoid robots interacting naturally with humans in an unconstrained real-world environment is a hitherto unsolved challenge. The recorded microphone signals are usually distorted by background and interfering noise sources (speakers) as well as room reverberation. In addition, the movements of a robot and its actuators cause ego-noise which degrades the recorded signals significantly. The movement of the robot body and its head also complicates the detection and tracking of the desired, possibly moving, sound sources of interest. This paper presents an overview of the concepts in microphone array processing for robot audition and some recent achievements.

Conference paper

Hafezi S, Moore AH, Naylor PA, 2017,

Multi-source estimation consistency for improved multiple direction-of-arrival estimation

, Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE, Pages: 81-85

In Direction-of-Arrival (DOA) estimation for multiple sources, removal of noisy data points from a set of local DOA estimates increases the resulting estimation accuracy, especially when there are many sources and they have small angular separation. In this work, we propose a post-processing technique for the enhancement of DOA extraction from a set of local estimates using the consistency of these estimates within the time frame based on adaptive multi-source assumption. Simulations in a realistic reverberant environment with sensor noise and up to 5 sources demonstrate that the proposed technique outperforms the baseline and state-of-the-art approaches. In these tests the proposed technique had the worst average error of 9°, robustness of 5° to widely varying source separation and 3° to number of sources.

Conference paper

Gebru ID, Evers C, Naylor PA, Horaud R et al., 2017,

Audio-visual tracking by density approximation in a sequential Bayesian filtering framework

, HSCMA 2017, Publisher: IEEE, Pages: 71-75

This paper proposes a novel audio-visual tracking approach that constructively exploits audio and visual modalities in order to estimate trajectories of multiple people in a joint state space. The tracking problem is modeled using a sequential Bayesian filtering framework. Within this framework, we propose to represent the posterior density with a Gaussian Mixture Model (GMM). To ensure that a GMM representation can be retained sequentially over time, the predictive density is approximated by a GMM using the Unscented Transform. A density interpolation technique is introduced to obtain a continuous representation of the observation likelihood, which is also a GMM. Furthermore, to prevent the number of mixtures from growing exponentially over time, a density approximation based on the Expectation Maximization (EM) algorithm is applied, resulting in a compact GMM representation of the posterior density. Recordings using a camcorder and microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance of the proposed audio-visual approach compared to two benchmark visual trackers.

Journal article

Doire CSJ, Brookes DM, Naylor PA, 2017,

Robust and efficient Bayesian adaptive psychometric function estimation

, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
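To make the notion of a Bayesian estimate of the psychometric function concrete, here is a minimal sketch that updates a posterior over the threshold after each trial. It makes several simplifying assumptions that are not taken from the paper: a logistic PF evaluated at a performance level of 0.5, a known slope, a fixed threshold grid (the paper's procedure explicitly avoids specifying the parameter range in advance), and hand-picked probe levels in place of a look-ahead stimulus selection rule. The function names and trial data are hypothetical.

```python
import math


def logistic_pf(snr_db, threshold_db, slope_per_db):
    """Probability of a correct response for a logistic psychometric function.

    Equals 0.5 at the threshold, where its derivative is slope_per_db.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - threshold_db)))


def update_posterior(grid, posterior, snr_db, correct, slope_per_db):
    """Apply Bayes' rule for one trial over a grid of candidate thresholds."""
    updated = []
    for threshold_db, p in zip(grid, posterior):
        p_correct = logistic_pf(snr_db, threshold_db, slope_per_db)
        likelihood = p_correct if correct else 1.0 - p_correct
        updated.append(p * likelihood)
    total = sum(updated)
    return [u / total for u in updated]


# Uniform prior over thresholds from -20 dB to +20 dB in 1 dB steps (assumed).
grid = [float(t) for t in range(-20, 21)]
posterior = [1.0 / len(grid)] * len(grid)

# Toy trials: (probe SNR in dB, whether the listener responded correctly).
trials = [(-10.0, False), (0.0, True), (-5.0, False), (-2.0, True)]
for snr_db, correct in trials:
    posterior = update_posterior(grid, posterior, snr_db, correct, slope_per_db=0.1)

# Posterior mean as a point estimate of the threshold.
estimate_db = sum(t * p for t, p in zip(grid, posterior))
```

A full adaptive procedure would also place a posterior over the slope and choose each probe level to minimize an expected criterion such as posterior entropy, which the abstract reports was preferred over a variance-based criterion.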

Conference paper

Pinero G, Naylor PA, 2017,

Channel estimation for crosstalk cancellation in wireless acoustic networks

, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149

Conference paper

Javed HA, Cauchi B, Doclo S, Naylor PA, Goetze S et al., 2017,

Measuring, modelling and predicting perceived reverberation

, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149

Journal article

Doire CSJ, Brookes DM, Naylor PA, Hicks CM, Betts D, Dmour MA, Jensen SH et al., 2017,

Single-channel online enhancement of speech corrupted by reverberation and noise

, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 572-587, ISSN: 2329-9290

This paper proposes an online single-channel speech enhancement method designed to improve the quality of speech degraded by reverberation and noise. Based on an autoregressive model for the reverberation power and on a hidden Markov model for clean speech production, a Bayesian filtering formulation of the problem is derived and online joint estimation of the acoustic parameters and mean speech, reverberation, and noise powers is obtained in mel-frequency bands. From these estimates, a real-valued spectral gain is derived and spectral enhancement is applied in the short-time Fourier transform (STFT) domain. The method yields state-of-the-art performance and greatly reduces the effects of reverberation and noise while improving speech quality and preserving speech intelligibility in challenging acoustic environments.

Journal article

Parada PP, Sharma D, van Waterschoot T, Naylor PA et al., 2017,

Confidence Measures for Nonintrusive Estimation of Speech Clarity Index

, Journal of the Audio Engineering Society, Vol: 65, Pages: 90-99, ISSN: 1549-4950

Conference paper

Evers C, Rafaely B, Naylor PA, 2017,

Speaker tracking in reverberant environments using multiple detections of arrival

, HSCMA 2017, Publisher: IEEE

Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path bins in the short-term Fourier transform of a speech signal of several seconds duration from a static talker. Nevertheless, for moving talkers, short blocks of at most several hundred milliseconds are required to capture the spatio-temporal variation of the source direction. Processing of short blocks of data in reverberant environments can lead to clusters whose centroids correspond to spurious DOAs away from the source direction. We therefore propose in this paper a novel multi-detection source tracking approach that estimates the smoothed trajectory of the source DOAs. Results for realistic room simulations validate the proposed approach and demonstrate significant improvements in estimation accuracy compared to single-detection tracking.

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Introduction

, Springer Topics in Signal Processing, Pages: 1-10

The motivation behind this book lies in the rapidly growing interest in spherical microphone arrays over the last decade. Important applications for these arrays include human-human and human-machine speech communication systems and spatial sound recording. While human-human speech communication systems have a long history, speech also plays an ever-growing part in human-machine communication. This trend has been fuelled by advances in speech recognition technology, as well as the explosion in available computing power, particularly on mobile devices. With the widespread availability of 3D sound cinema systems and virtual reality gear with 3D binaural sound reproduction, the need to capture spatial sound is rapidly growing. Spherical microphone arrays are particularly suitable for capturing all three dimensions of the sound field, including both ambient sounds and sounds from particular directions. In this chapter, we introduce the topic of acoustic signal processing using microphone arrays, and then explore spherical microphone arrays in more detail. We provide an outline of the structure of the book, and discuss the relationships between each of the subsequent chapters.

Book

Jarrett DP, Habets EAP, Naylor PA, 2017,

Preface

Book chapter

Jarrett DP, Habets EAP, Naylor PA, 2017,

Signal-Dependent Array Processing

, Theory and Applications of Spherical Microphone Array Processing, Publisher: Springer-Verlag Berlin, Pages: 113-139, ISBN: 978-3-319-42209-1
