Publications

Grinstein E, Hicks CM, Van Waterschoot T, Brookes M, Naylor PA et al., 2024, The Neural-SRP Method for Universal Robust Multi-Source Tracking, IEEE Open Journal of Signal Processing, Vol: 5, Pages: 19-28

Neural networks have achieved state-of-the-art performance on the task of acoustic Direction-of-Arrival (DOA) estimation using microphone arrays. Neural models can be classified as end-to-end or hybrid, each class showing advantages and disadvantages. This work introduces Neural-SRP, an end-to-end neural network architecture for DOA estimation inspired by the classical Steered Response Power (SRP) method, which overcomes limitations of current neural models. We evaluate the architecture on multiple scenarios, namely, multi-source DOA tracking and single-source DOA tracking under the presence of directional and diffuse noise. The experiments demonstrate that our proposed method compares favourably in terms of computational and localization performance with established neural methods on various recorded and simulated benchmark datasets.

Journal article

Guiraud P, Moore AH, Vos RR, Naylor PA, Brookes M et al., 2023, Using a single-channel reference with the MBSTOI binaural intelligibility metric, Speech Communication, Vol: 149, Pages: 74-83, ISSN: 0167-6393

Intrusive speech intelligibility metrics are typically used to assess the intelligibility of a target signal in a noisy environment. They require a clean reference signal, which can be difficult to obtain, especially for binaural metrics such as the modified binaural short-time objective intelligibility metric (MBSTOI). We present a hybrid version of MBSTOI that incorporates a deep learning stage, allowing the metric to be computed with only a single-channel clean reference signal. The models presented are trained on simulated data containing target speech, localised noise, diffuse noise, and reverberation. The hybrid output metrics are then compared directly to MBSTOI to assess performance. Results show the performance of our single-channel reference vs MBSTOI. The outcome of this work offers a fast and flexible way to generate audio data for machine learning (ML) and highlights the potential for low-level implementation of ML into existing tools.

Journal article

Gudnason J, Fang G, Brookes M, 2023, Epoch-Based Spectrum Estimation for Speech, Pages: 4274-4278, ISSN: 2308-457X

An implicit assumption when using the discrete Fourier transform for spectrum estimation is that the time signal is periodic. This assumption clashes with the quasi-periodicity of voiced speech when the traditional short-time Fourier transform (STFT) is applied to it. This causes distortion and leads to a performance handicap in downstream processing. This work proposes a remedy by using epochs in the signal to determine better frame boundaries for the Fourier transform. The epochs are the estimated glottal closure instants in voiced speech and significant peaks in the unvoiced speech signal. The resulting coefficients are compared to the traditional STFT coefficients using copy-synthesis. An improvement of 15 dB signal-to-noise ratio and a PESQ score of 2.5 to 3.5 is achieved for copy-synthesis using 20 mel-filters. The results demonstrate that there is great potential for improving downstream speech processing applications using this approach to spectrum estimation.

Conference paper

Sathyapriyan V, Pedersen MS, Brookes M, Østergaard J, Naylor PA, Jensen J et al., 2023, Speech enhancement using binary estimator selection applied to hearing aids with a remote microphone, Pages: 38-42

This paper introduces a speech enhancement algorithm for hearing assistive devices, e.g., hearing aids, connected to a remote microphone. Remote microphones are especially beneficial to hearing aid users when they are present in environments with low signal-to-noise ratios. The transmission of the acoustic data from the remote microphone to the hearing aid unit, however, happens through a wireless channel that is prone to network delays. Such delays, that occur in any real-world application, make the remote microphone signal less valuable, in contrast to when the transmission is assumed to be error-free and instantaneous, as is often done in the literature. To make use of the remote microphone signal, despite the delay, we propose an estimator selection method that selects between the minimum mean-square error estimate of the desired signal, made using the hearing aid signals and the delayed remote microphone signal, respectively. This binary selection is made by comparing the normalized mean-square errors of the two desired signal estimates. We show that the proposed method provides a benefit in estimated speech intelligibility, for delays in transmission up to 30 ms at a signal-to-noise ratio of 0 dB, in comparison to the minimum mean-square error estimate made using only the hearing aid microphone signals.
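
The selection step described in this abstract can be illustrated with a minimal sketch: for each time-frequency bin, keep whichever of the two estimates has the lower normalized mean-square error. The function name, the flat per-bin list layout, the tie-breaking rule, and the example numbers are all assumptions made for illustration; this is not the authors' implementation.

```python
def select_estimates(est_ha, est_rm, nmse_ha, nmse_rm):
    """Per-bin binary selection between two desired-signal estimates.

    est_ha / est_rm: estimates made from the hearing aid signals and from the
    delayed remote microphone signal (hypothetical flat lists, one value per bin).
    nmse_ha / nmse_rm: the corresponding normalized mean-square errors.
    """
    selected = []
    for s_ha, s_rm, e_ha, e_rm in zip(est_ha, est_rm, nmse_ha, nmse_rm):
        # Keep the estimate with the smaller normalized error.
        # Ties go to the hearing aid estimate (an arbitrary choice for this sketch).
        selected.append(s_ha if e_ha <= e_rm else s_rm)
    return selected


# Illustrative values only.
chosen = select_estimates(
    est_ha=[0.9, 0.2, 0.5],
    est_rm=[1.0, 0.3, 0.4],
    nmse_ha=[0.10, 0.50, 0.30],
    nmse_rm=[0.20, 0.25, 0.30],
)
print(chosen)
```

In this toy case the remote microphone estimate is chosen only in the second bin, where its normalized error is lower.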

Conference paper

Guiraud P, Moore AH, Vos RR, Naylor PA, Brookes M et al., 2023, The MBSTOI Binaural Intelligibility Metric Using a Close-Talking Microphone Reference, ISSN: 1520-6149

Intelligibility metrics are a fast way to determine how comprehensible a target signal is in a noisy situation. Most metrics however rely on having a clean reference signal for computation and are not adapted to live recordings. In this paper the deep correlation modified binaural short time objective intelligibility metric (Dcor-MBSTOI) is evaluated with a single-channel close-talking microphone signal as the reference. This reference signal inevitably contains some background noise and crosstalk from non-target sources. It is found that intelligibility is overestimated when using the close-talking microphone signal directly but that this overestimation can be eliminated by applying speech enhancement to the reference signal.

Conference paper

Grinstein E, Brookes M, Naylor PA, 2023, Graph Neural Networks for Sound Source Localization on Distributed Microphone Networks, ISSN: 1520-6149

Distributed Microphone Arrays (DMAs) present many challenges with respect to centralized microphone arrays. An important requirement of applications on these arrays is handling a variable number of input channels. We consider the use of Graph Neural Networks (GNNs) as a solution to this challenge. We present a localization method using the Relation Network GNN, which we show shares many similarities to classical signal processing algorithms for Sound Source Localization (SSL). We apply our method for the task of SSL and validate it experimentally using an unseen number of microphones. We test different feature extractors and show that our approach significantly outperforms classical baselines.

Citations: 2

Conference paper

Tokala V, Brookes M, Naylor P, 2022, Binaural speech enhancement using STOI-optimal masks, International Workshop on Acoustic Signal Enhancement (IWAENC) 2022, Publisher: IEEE, Pages: 1-5

STOI-optimal masking has been previously proposed and developed for single-channel speech enhancement. In this paper, we consider the extension to the task of binaural speech enhancement in which the spatial information is known to be important to speech understanding and therefore should be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually and a ‘better-ear listening’ mask is computed by choosing the maximum of the two masks. The estimated mask is used to supply probability information about the speech presence in each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that using the proposed method for binaural signals with a directional noise not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
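
The ‘better-ear listening’ mask mentioned in this abstract is the element-wise maximum of the left and right masks. A minimal sketch follows, assuming each mask is a nested list indexed as [frame][frequency bin] with values in [0, 1]; the function name and the example values are hypothetical and are not taken from the paper.

```python
def better_ear_mask(mask_left, mask_right):
    """Element-wise maximum of two time-frequency masks of equal shape."""
    return [
        [max(left, right) for left, right in zip(row_left, row_right)]
        for row_left, row_right in zip(mask_left, mask_right)
    ]


# Two frames by two frequency bins, illustrative values only.
left = [[0.2, 0.8], [0.5, 0.1]]
right = [[0.6, 0.3], [0.5, 0.9]]
combined = better_ear_mask(left, right)
print(combined)
```

The combined mask would then be applied to both channels, which is one way to keep the interaural level relationship unchanged by the masking itself.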

Conference paper

Moore AH, Green T, Brookes DM, Naylor PA et al., 2022, Measuring audio-visual speech intelligibility under dynamic listening conditions using virtual reality, AES 2022 International Audio for Virtual and Augmented Reality Conference, Publisher: Audio Engineering Society (AES), Pages: 1-8

The ELOSPHERES project is a collaboration between researchers at Imperial College London and University College London which aims to improve the efficacy of hearing aids. The benefit obtained from hearing aids varies significantly between listeners and listening environments. The noisy, reverberant environments which most people find challenging bear little resemblance to the clinics in which consultations occur. In order to make progress in speech enhancement, algorithms need to be evaluated under realistic listening conditions. A key aim of ELOSPHERES is to create a virtual reality-based test environment in which alternative speech enhancement algorithms can be evaluated using a listener-in-the-loop paradigm. In this paper we present the sap-elospheres-audiovisual-test (SEAT) platform and report the results of an initial experiment in which it was used to measure the benefit of visual cues in a speech intelligibility in spatial noise task.

Conference paper

Moore AH, Hafezi S, Vos RR, Naylor PA, Brookes M et al., 2022, A compact noise covariance matrix model for MVDR beamforming, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 30, Pages: 2049-2061, ISSN: 2329-9290

Acoustic beamforming is routinely used to improve the SNR of the received signal in applications such as hearing aids, robot audition, augmented reality, teleconferencing, source localisation and source tracking. The beamformer can be made adaptive by using an estimate of the time-varying noise covariance matrix in the spectral domain to determine an optimised beam pattern in each frequency bin that is specific to the acoustic environment and that can respond to temporal changes in it. However, robust estimation of the noise covariance matrix remains a challenging task especially in non-stationary acoustic environments. This paper presents a compact model of the signal covariance matrix that is defined by a small number of parameters whose values can be reliably estimated. The model leads to a robust estimate of the noise covariance matrix which can, in turn, be used to construct a beamformer. The performance of beamformers designed using this approach is evaluated for a spherical microphone array under a range of conditions using both simulated and measured room impulse responses. The proposed approach demonstrates consistent gains in intelligibility and perceptual quality metrics compared to the static and adaptive beamformers used as baselines.

Journal article

Green T, Hilkhuysen G, Huckvale M, Rosen S, Brookes M, Moore A, Naylor P, Lightburn L, Xue W et al., 2022, Speech recognition with a hearing-aid processing scheme combining beamforming with mask-informed speech enhancement, Trends in Hearing, Vol: 26, Pages: 1-16, ISSN: 2331-2165

A signal processing approach combining beamforming with mask-informed speech enhancement was assessed by measuring sentence recognition in listeners with mild-to-moderate hearing impairment in adverse listening conditions that simulated the output of behind-the-ear hearing aids in a noisy classroom. Two types of beamforming were compared: binaural, with the two microphones of each aid treated as a single array, and bilateral, where independent left and right beamformers were derived. Binaural beamforming produces a narrower beam, maximising improvement in signal-to-noise ratio (SNR), but eliminates the spatial diversity that is preserved in bilateral beamforming. Each beamformer type was optimised for the true target position and implemented with and without additional speech enhancement in which spectral features extracted from the beamformer output were passed to a deep neural network trained to identify time-frequency regions dominated by target speech. Additional conditions comprising binaural beamforming combined with speech enhancement implemented using Wiener filtering or modulation-domain Kalman filtering were tested in normally-hearing (NH) listeners. Both beamformer types gave substantial improvements relative to no processing, with significantly greater benefit for binaural beamforming. Performance with additional mask-informed enhancement was poorer than with beamforming alone, for both beamformer types and both listener groups. In NH listeners the addition of mask-informed enhancement produced significantly poorer performance than both other forms of enhancement, neither of which differed from the beamformer alone. In summary, the additional improvement in SNR provided by binaural beamforming appeared to outweigh loss of spatial information, while speech understanding was not further improved by the mask-informed enhancement method implemented here.

Journal article

Sathyapriyan V, Pedersen MS, Østergaard J, Brookes M, Naylor PA, Jensen J et al., 2022, A linear MMSE filter using delayed remote microphone signals for speech enhancement in hearing aid applications, 17th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, ISSN: 2639-4316

Conference paper

Guiraud P, Moore AH, Vos RR, Naylor PA, Brookes M et al., 2022, Machine learning for parameter estimation in the MBSTOI binaural intelligibility metric, 17th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, ISSN: 2639-4316

Conference paper

Moore A, Vos R, Naylor P, Brookes D et al., 2021, Processing pipelines for efficient, physically-accurate simulation of microphone array signals in dynamic sound scenes, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 0736-7791

Multichannel acoustic signal processing is predicated on the fact that the inter-channel relationships between the received signals can be exploited to infer information about the acoustic scene. Recently there has been increasing interest in algorithms which are applicable in dynamic scenes, where the source(s) and/or microphone array may be moving. Simulating such scenes has particular challenges which are exacerbated when real-time, listener-in-the-loop evaluation of algorithms is required. This paper considers candidate pipelines for simulating the array response to a set of point/image sources in terms of their accuracy, scalability and continuity. A new approach, in which the filter kernels are obtained using principal component analysis from time-aligned impulse responses, is proposed. When the number of filter kernels is ≤ 40 the new approach achieves more accurate simulation than competing methods.

Conference paper

Xue W, Moore A, Brookes D, Naylor P et al., 2020, Speech enhancement based on modulation-domain parametric multichannel Kalman filtering, IEEE Transactions on Audio, Speech and Language Processing, Vol: 29, Pages: 393-405, ISSN: 1558-7916

Recently we presented a modulation-domain multichannel Kalman filtering (MKF) algorithm for speech enhancement, which jointly exploits the inter-frame modulation-domain temporal evolution of speech and the inter-channel spatial correlation to estimate the clean speech signal. The goal of speech enhancement is to suppress noise while keeping the speech undistorted, and a key problem is to achieve the best trade-off between speech distortion and noise reduction. In this paper, we extend the MKF by presenting a modulation-domain parametric MKF (PMKF) which includes a parameter that enables flexible control of the speech enhancement behaviour in each time-frequency (TF) bin. Based on the decomposition of the MKF cost function, a new cost function for PMKF is proposed, which uses the controlling parameter to weight the noise reduction and speech distortion terms. An optimal PMKF gain is derived using a minimum mean squared error (MMSE) criterion. We analyse the performance of the proposed MKF, and show its relationship to the speech distortion weighted multichannel Wiener filter (SDW-MWF). To evaluate the impact of the controlling parameter on speech enhancement performance, we further propose PMKF speech enhancement systems in which the controlling parameter is adaptively chosen in each TF bin. Experiments on a publicly available head-related impulse response (HRIR) database in different noisy and reverberant conditions demonstrate the effectiveness of the proposed method.

Journal article

Lawson M, Brookes M, Dragotti PL, 2019, Scene estimation from a swiped image, IEEE Transactions on Computational Imaging, Vol: 5, Pages: 540-555, ISSN: 2333-9403

The image blurring that results from moving a camera with the shutter open is normally regarded as undesirable. However, the blurring of the images encapsulates information which can be extracted to recover the light rays present within the scene. Given the correct recovery of the light rays that resulted in a blurred image, it is possible to reconstruct images of the scene from different camera locations. Therefore, rather than resharpening an image with motion blur, the goal of this paper is to recover the information needed to resynthesise images of the scene from different viewpoints. Estimation of the light rays within a scene is achieved by using a layer-based model to represent objects in the scene as layers, and by using an extended level set method to segment the blurred image into planes at different depths. The algorithm described in this paper has been evaluated on real and synthetic images to produce an estimate of the underlying Epipolar Plane Image.

Journal article

Moore AH, de Haan JM, Pedersen MS, Brookes D, Naylor PA, Jensen J et al., 2019, Personalized signal-independent beamforming for binaural hearing aids, Journal of the Acoustical Society of America, Vol: 145, Pages: 2971-2981, ISSN: 0001-4966

The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females, and 4 mannequins. Bilateral and binaural beamformers are designed using each participant's hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically significant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB.

Journal article

Dionelis N, Brookes D, 2019, Modulation-domain Kalman filtering for monaural blind speech denoising and dereverberation, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 799-214, ISSN: 2329-9290

We describe a monaural speech enhancement algorithm based on modulation-domain Kalman filtering to blindly track the time-frequency log-magnitude spectra of speech and reverberation. We propose an adaptive algorithm that performs blind joint denoising and dereverberation, while accounting for the inter-frame speech dynamics, by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman filter update step models the non-linear relations between the speech, noise and reverberation log-spectra. The Kalman filtering algorithm uses a signal model that takes into account the reverberation parameters of the reverberation time, T60, and the direct-to-reverberant energy ratio (DRR) and also estimates and tracks the T60 and the DRR in every frequency bin to improve the estimation of the speech log-spectrum. The proposed algorithm is evaluated in terms of speech quality, speech intelligibility and dereverberation performance for a range of reverberation parameters and reverberant speech to noise ratios, in different noises, and is also compared to competing denoising and dereverberation techniques. Experimental results using noisy reverberant speech demonstrate the effectiveness of the enhancement algorithm.

Journal article

Moore A, Xue W, Naylor P, Brookes D et al., 2019, Noise covariance matrix estimation for rotating microphone arrays, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290

The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.
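
One way to express a Frobenius-norm covariance error on a dB scale is sketched below. The abstract states only that the Frobenius norm of the error is used; normalizing by the norm of the ground-truth matrix, the dB convention, the function name, and the toy matrices are assumptions made for illustration.

```python
import math


def frobenius_error_db(r_est, r_true):
    """Normalized squared Frobenius error between two matrices, in dB.

    Matrices are nested lists of equal shape and may hold complex entries.
    Normalizing by the true matrix is an assumption of this sketch.
    """
    err = sum(
        abs(a - b) ** 2
        for row_est, row_true in zip(r_est, r_true)
        for a, b in zip(row_est, row_true)
    )
    ref = sum(abs(b) ** 2 for row_true in r_true for b in row_true)
    if err == 0:
        return float("-inf")  # a perfect estimate has no finite dB error
    return 10.0 * math.log10(err / ref)


# Toy 2x2 example: each diagonal entry is off by 0.1.
r_true = [[1.0, 0.0], [0.0, 1.0]]
r_est = [[1.1, 0.0], [0.0, 0.9]]
error_db = frobenius_error_db(r_est, r_true)
print(round(error_db, 2))
```

Under this convention, a reduction of 18 dB corresponds to the squared Frobenius error falling by a factor of about 63.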

Journal article

Moore A, de Haan JM, Pedersen MS, Naylor P, Brookes D, Jensen J et al., 2019, Personalized HRTFs for hearing aids, ELOBES2019

Conference paper

Brookes D, Lightburn L, Moore A, Naylor P, Xue W et al., 2019, Mask-assisted speech enhancement for binaural hearing aids, ELOBES2019

Conference paper

Dionelis N, Brookes M, 2018, Speech enhancement using Kalman filtering in the logarithmic Bark power spectral domain, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1642-1646, ISSN: 2076-1465

We present a phase-sensitive speech enhancement algorithm based on a Kalman filter estimator that tracks speech and noise in the logarithmic Bark power spectral domain. With modulation-domain Kalman filtering, the algorithm tracks the speech spectral log-power using perceptually-motivated Bark bands. By combining STFT bins into Bark bands, the number of frequency components is reduced. The Kalman filter prediction step separately models the inter-frame relations of the speech and noise spectral log-powers and the Kalman filter update step models the nonlinear relations between the speech and noise spectral log-powers using the phase factor in Bark bands, which follows a sub-Gaussian distribution. The posterior mean of the speech spectral log-power is used to create an enhanced speech spectrum for signal reconstruction. The algorithm is evaluated in terms of speech quality and computational complexity with different algorithm configurations compared on various noise types. The algorithm implemented in Bark bands is compared to algorithms implemented in STFT bins and experimental results show that tracking speech in the log Bark power spectral domain, taking into account the temporal dynamics of each subband envelope, is beneficial. Regarding the computational complexity, the percentage decrease in the real-time factor is 44% when using Bark bands compared to when using STFT bins.

Conference paper

Xue W, Moore AH, Brookes M, Naylor PA et al., 2018, Modulation-domain parametric multichannel Kalman filtering for speech enhancement, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2509-2513, ISSN: 2076-1465

The goal of speech enhancement is to reduce the noise signal while keeping the speech signal undistorted. Recently we developed the multichannel Kalman filtering (MKF) for speech enhancement, in which the temporal evolution of the speech signal and the spatial correlation between multichannel observations are jointly exploited to estimate the clean signal. In this paper, we extend the previous work to derive a parametric MKF (PMKF), which incorporates a controlling factor to achieve the trade-off between the speech distortion and noise reduction. The controlling factor weights between the speech distortion and noise reduction related terms in the cost function of PMKF, and based on the minimum mean squared error (MMSE) criterion, the optimal PMKF gain is derived. We analyse the performance of the proposed PMKF and show the differences with the speech distortion weighted multichannel Wiener filter (SDW-MWF). We conduct experiments in different noisy conditions to evaluate the impact of the controlling factor on the noise reduction performance, and the results demonstrate the effectiveness of the proposed method.

Conference paper

Moore AH, Lightburn L, Xue W, Naylor P, Brookes D et al., 2018, Binaural mask-informed speech enhancement for hearing aids with head tracking, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465

An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.

Conference paper

Moore AH, Xue W, Naylor PA, Brookes M et al., 2018, Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays, 52nd Asilomar Conference on Signals, Systems, and Computers, Publisher: IEEE, Pages: 1936-1941, ISSN: 1058-6393

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Modulation-domain multichannel Kalman filtering for speech enhancement, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290

Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain, and by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response beamformer and a single-channel modulation-domain KF and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.

Journal article

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Multichannel Kalman filtering for speech enhancement, IEEE Intl Conf on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 2379-190X

The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.

Conference paper

Moore AH, Naylor P, Brookes DM, 2018, Room identification using frequency dependence of spectral decay statistics, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers Inc., Pages: 6902-6906, ISSN: 0736-7791

A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79% correct identification in the most demanding condition compared to 40% using unenhanced signals.

Conference paper

Dionelis N, Brookes DM, 2018, Phase-aware single-channel speech enhancement with modulation-domain Kalman filtering, IEEE Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 937-950, ISSN: 1558-7916

We present a speech enhancement algorithm that performs modulation-domain Kalman filtering to track the speech phase using circular statistics, along with the log-spectra of speech and noise. In the proposed algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for the signal reconstruction of speech. The Kalman filter prediction step separately models the temporal inter-frame correlation of the speech and noise spectral log-amplitudes and of the speech phase, while the Kalman filter update step models their nonlinear relations under the assumption that speech and noise add in the complex short-time Fourier transform domain. The phase-sensitive enhancement algorithm is evaluated with speech quality and intelligibility metrics, using a variety of noise types over a range of SNRs. Instrumental measures predict that tracking the speech log-spectrum and phase with modulation-domain Kalman filtering leads to consistent improvements in speech quality, over both conventional enhancement algorithms and other algorithms that perform modulation-domain Kalman filtering.

Journal article

Koulouri A, Rimpilaeinen V, Brookes M, Kaipio JP et al., 2018, Prior Variances and Depth Un-Biased Estimators in EEG Focal Source Imaging, Joint Conference of the European Medical and Biological Engineering Conference (EMBEC) / Nordic-Baltic Conference on Biomedical Engineering and Medical Physics (NBC), Publisher: SPRINGER-VERLAG SINGAPORE PTE LTD, Pages: 33-36, ISSN: 1680-0737

Conference paper

Wang Y, Brookes DM, 2017, Model-Based Speech Enhancement in the Modulation Domain, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 580-594, ISSN: 2329-9304

This paper presents an algorithm for modulation-domain speech enhancement using a Kalman filter. The proposed estimator jointly models the estimated dynamics of the spectral amplitudes of speech and noise to obtain an MMSE estimation of the speech amplitude spectrum with the assumption that the speech and noise are additive in the complex domain. In order to include the dynamics of noise amplitudes with those of speech amplitudes, we propose a statistical “Gaussring” model that comprises a mixture of Gaussians whose centres lie in a circle on the complex plane. The performance of the proposed algorithm is evaluated using the perceptual evaluation of speech quality (PESQ) measure, segmental SNR (segSNR) measure and short-time objective intelligibility (STOI) measure. For speech quality measures, the proposed algorithm is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms. Speech recognition experiments also show that the Gaussring model based algorithm performs well for two types of noise.

Journal article

Mr Mike Brookes
