Imperial College London

Mr Mike Brookes

Faculty of Engineering, Department of Electrical and Electronic Engineering

Senior Research Investigator
 
 
 

Contact

 

+44 (0)20 7594 6165
mike.brookes

 
 

Assistant

 

Miss Vanessa Rodriguez-Gonzalez +44 (0)20 7594 6267

 

Location

 

807a, Electrical Engineering, South Kensington Campus



 

Publications


129 results found

Xue W, Moore A, Brookes D, Naylor P et al., 2020, Speech enhancement based on modulation-domain parametric multichannel Kalman filtering, IEEE Transactions on Audio, Speech and Language Processing, Vol: 29, Pages: 393-405, ISSN: 1558-7916

Recently we presented a modulation-domain multichannel Kalman filtering (MKF) algorithm for speech enhancement, which jointly exploits the inter-frame modulation-domain temporal evolution of speech and the inter-channel spatial correlation to estimate the clean speech signal. The goal of speech enhancement is to suppress noise while keeping the speech undistorted, and a key problem is to achieve the best trade-off between speech distortion and noise reduction. In this paper, we extend the MKF by presenting a modulation-domain parametric MKF (PMKF) which includes a parameter that enables flexible control of the speech enhancement behaviour in each time-frequency (TF) bin. Based on the decomposition of the MKF cost function, a new cost function for PMKF is proposed, which uses the controlling parameter to weight the noise reduction and speech distortion terms. An optimal PMKF gain is derived using a minimum mean squared error (MMSE) criterion. We analyse the performance of the proposed MKF, and show its relationship to the speech distortion weighted multichannel Wiener filter (SDW-MWF). To evaluate the impact of the controlling parameter on speech enhancement performance, we further propose PMKF speech enhancement systems in which the controlling parameter is adaptively chosen in each TF bin. Experiments on a publicly available head-related impulse response (HRIR) database in different noisy and reverberant conditions demonstrate the effectiveness of the proposed method.
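As an illustration of the trade-off discussed above, the sketch below implements the closely related speech-distortion-weighted multichannel Wiener filter (SDW-MWF) for a single time-frequency bin, with a parameter mu weighting noise reduction against speech distortion. It is a minimal stand-in under toy covariance values, not the paper's PMKF.

```python
import numpy as np

def sdw_mwf_gain(Rs, Rn, mu, ref=0):
    """Speech-distortion-weighted MWF for one time-frequency bin.

    Rs, Rn : (M, M) speech and noise spatial covariance matrices.
    mu     : trade-off parameter (mu=1 gives the standard MWF; larger mu
             favours noise reduction over low speech distortion).
    ref    : index of the reference microphone.
    """
    M = Rs.shape[0]
    e_ref = np.zeros(M)
    e_ref[ref] = 1.0
    # w = (Rs + mu*Rn)^{-1} Rs e_ref
    return np.linalg.solve(Rs + mu * Rn, Rs @ e_ref)

# Toy example: 3 microphones, rank-1 speech covariance, diffuse-ish noise.
rng = np.random.default_rng(0)
d = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # steering vector
Rs = np.outer(d, d.conj())                                  # speech covariance
Rn = np.eye(3) + 0.1 * np.ones((3, 3))                      # noise covariance
for mu in (0.5, 1.0, 4.0):
    w = sdw_mwf_gain(Rs, Rn, mu)
    nr = (w.conj() @ Rn @ w).real          # residual noise power at the output
    print(f"mu={mu}: residual noise power {nr:.3f}")
```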

Journal article

Lawson M, Brookes M, Dragotti PL, 2019, Scene estimation from a swiped image, IEEE Transactions on Computational Imaging, Vol: 5, Pages: 540-555, ISSN: 2333-9403

The image blurring that results from moving a camera with the shutter open is normally regarded as undesirable. However, the blurring of the images encapsulates information which can be extracted to recover the light rays present within the scene. Given the correct recovery of the light rays that resulted in a blurred image, it is possible to reconstruct images of the scene from different camera locations. Therefore, rather than resharpening an image with motion blur, the goal of this paper is to recover the information needed to resynthesise images of the scene from different viewpoints. Estimation of the light rays within a scene is achieved by using a layer-based model to represent objects in the scene as layers, and by using an extended level set method to segment the blurred image into planes at different depths. The algorithm described in this paper has been evaluated on real and synthetic images to produce an estimate of the underlying Epipolar Plane Image.

Journal article

Moore AH, de Haan JM, Pedersen MS, Brookes D, Naylor PA, Jensen J et al., 2019, Personalized signal-independent beamforming for binaural hearing aids, Journal of the Acoustical Society of America, Vol: 145, Pages: 2971-2981, ISSN: 0001-4966

The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females, and 4 mannequins. Bilateral and binaural beamformers are designed using each participant's hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically signficant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB.
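A minimal sketch of the personalization idea: an MVDR beamformer designed from an individual's own acoustic transfer function versus one designed from a mismatched (mannequin-like) transfer function. The MVDR form and the toy transfer functions are illustrative assumptions, not the paper's beamformer designs.

```python
import numpy as np

def mvdr_weights(Rn, d):
    """MVDR beamformer: w = Rn^{-1} d / (d^H Rn^{-1} d)."""
    Rn_inv_d = np.linalg.solve(Rn, d)
    return Rn_inv_d / (d.conj() @ Rn_inv_d)

rng = np.random.default_rng(1)
M = 4                                                        # hearing-aid microphones
d_own = rng.standard_normal(M) + 1j * rng.standard_normal(M)              # personalized ATF
d_dummy = d_own + 0.3 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))  # mannequin ATF
Rn = np.eye(M)                                               # noise covariance (white, for simplicity)

for name, d_design in (("personalized", d_own), ("mannequin", d_dummy)):
    w = mvdr_weights(Rn, d_design)
    # Response to the wearer's true transfer function; mismatch distorts the target.
    target_gain = w.conj() @ d_own
    print(f"{name}: |target response| = {abs(target_gain):.3f}")
```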

Journal article

Dionelis N, Brookes D, 2019, Modulation-domain Kalman filtering for monaural blind speech denoising and dereverberation, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 799-814, ISSN: 2329-9290

We describe a monaural speech enhancement algorithm based on modulation-domain Kalman filtering to blindly track the time-frequency log-magnitude spectra of speech and reverberation. We propose an adaptive algorithm that performs blind joint denoising and dereverberation, while accounting for the inter-frame speech dynamics, by estimating the posterior distribution of the speech log-magnitude spectrum given the log-magnitude spectrum of the noisy reverberant speech. The Kalman filter update step models the non-linear relations between the speech, noise and reverberation log-spectra. The Kalman filtering algorithm uses a signal model that takes into account the reverberation parameters of the reverberation time, T60, and the direct-to-reverberant energy ratio (DRR) and also estimates and tracks the T60 and the DRR in every frequency bin to improve the estimation of the speech log-spectrum. The proposed algorithm is evaluated in terms of speech quality, speech intelligibility and dereverberation performance for a range of reverberation parameters and reverberant speech to noise ratios, in different noises, and is also compared to competing denoising and dereverberation techniques. Experimental results using noisy reverberant speech demonstrate the effectiveness of the enhancement algorithm.
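A minimal per-frequency-bin sketch of modulation-domain Kalman filtering of the speech log-magnitude spectrum with a first-order autoregressive prediction. The paper's non-linear update and its tracking of T60 and DRR are replaced here by a simple linear observation model, so this is a shape-level illustration only.

```python
import numpy as np

def track_log_spectrum(obs_logmag, a=0.9, q=0.1, r=0.5):
    """Scalar Kalman filter per frequency bin over frames.

    obs_logmag : (frames, bins) noisy-speech log-magnitude spectrogram.
    a, q       : AR(1) coefficient and process noise of the speech log-spectrum.
    r          : observation noise variance (a stand-in for the non-linear
                 speech/noise/reverberation interaction in the real algorithm).
    """
    T, K = obs_logmag.shape
    x = obs_logmag[0].copy()          # state estimate (speech log-magnitude)
    P = np.ones(K)                    # state variance per bin
    out = np.empty_like(obs_logmag)
    out[0] = x
    for t in range(1, T):
        # Predict: inter-frame temporal evolution of the speech log-spectrum
        x_pred = a * x
        P_pred = a * a * P + q
        # Update with the observed log-magnitude of the current frame
        K_gain = P_pred / (P_pred + r)
        x = x_pred + K_gain * (obs_logmag[t] - x_pred)
        P = (1.0 - K_gain) * P_pred
        out[t] = x
    return out

# Usage on a random "spectrogram" just to show the shapes involved.
noisy = np.random.default_rng(2).standard_normal((100, 257))
enhanced_logmag = track_log_spectrum(noisy)
```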

Journal article

Moore A, Xue W, Naylor P, Brookes D et al., 2019, Noise covariance matrix estimation for rotating microphone arrays, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290

The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.
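A toy sketch of the underlying idea: orientation-independent noise-field parameters are stored and the covariance matrix is re-synthesised for the current array orientation, rather than relying on a stale recursive average. The parametric model below (per-direction powers on a toy manifold plus sensor noise) is an illustrative assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(3)
M, S = 4, 8                                   # microphones, noise "directions"

def manifold(theta):
    """Toy far-field array manifold for array orientation theta (radians)."""
    mics = np.arange(M)
    src = np.linspace(0, 2 * np.pi, S, endpoint=False) - theta
    return np.exp(1j * np.outer(mics, np.cos(src)))          # (M, S)

# Orientation-independent parameters: per-direction powers and sensor noise.
# In the real algorithm these would be estimated during noise-only segments.
p_true = rng.uniform(0.2, 1.0, S)
sigma2 = 0.05

def noise_cov(theta, p, s2):
    A = manifold(theta)
    return (A * p) @ A.conj().T + s2 * np.eye(M)             # A diag(p) A^H + s2 I

theta0, theta1 = 0.0, np.deg2rad(40)
R_true_after_rotation = noise_cov(theta1, p_true, sigma2)

# Model-based prediction for the new orientation vs. keeping the stale estimate.
R_model = noise_cov(theta1, p_true, sigma2)
R_stale = noise_cov(theta0, p_true, sigma2)
for name, R in (("model-based", R_model), ("stale average", R_stale)):
    print(name, "Frobenius error:", np.linalg.norm(R - R_true_after_rotation, "fro"))
```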

Journal article

Brookes D, Lightburn L, Moore A, Naylor P, Xue W et al., 2019, Mask-assisted speech enhancement for binaural hearing aids, ELOBES2019

Conference paper

Moore A, de Haan JM, Pedersen MS, Naylor P, Brookes D, Jensen J et al., 2019, Personalized HRTFs for hearing aids, ELOBES2019

Conference paper

Dionelis N, Brookes M, 2018, Speech enhancement using Kalman filtering in the logarithmic Bark power spectral domain, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1642-1646, ISSN: 2076-1465

We present a phase-sensitive speech enhancement algorithm based on a Kalman filter estimator that tracks speech and noise in the logarithmic Bark power spectral domain. With modulation-domain Kalman filtering, the algorithm tracks the speech spectral log-power using perceptually-motivated Bark bands. By combining STFT bins into Bark bands, the number of frequency components is reduced. The Kalman filter prediction step separately models the inter-frame relations of the speech and noise spectral log-powers and the Kalman filter update step models the nonlinear relations between the speech and noise spectral log-powers using the phase factor in Bark bands, which follows a sub-Gaussian distribution. The posterior mean of the speech spectral log-power is used to create an enhanced speech spectrum for signal reconstruction. The algorithm is evaluated in terms of speech quality and computational complexity with different algorithm configurations compared on various noise types. The algorithm implemented in Bark bands is compared to algorithms implemented in STFT bins and experimental results show that tracking speech in the log Bark power spectral domain, taking into account the temporal dynamics of each subband envelope, is beneficial. Regarding the computational complexity, the percentage decrease in the real-time factor is 44% when using Bark bands compared to when using STFT bins.
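A small sketch of pooling STFT bins into Bark bands to reduce the number of tracked frequency components. The Bark mapping uses the Traunmüller approximation and simple rectangular pooling, which is an assumption rather than the paper's exact filterbank.

```python
import numpy as np

def hz_to_bark(f):
    """Traunmüller approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_pool(power_spec, fs=16000):
    """Pool an STFT power spectrogram (frames, bins) into Bark bands."""
    n_bins = power_spec.shape[1]
    freqs = np.linspace(0, fs / 2, n_bins)
    band = np.floor(hz_to_bark(freqs)).astype(int)
    band -= band.min()                        # bands indexed from 0
    n_bands = band.max() + 1
    pooled = np.zeros((power_spec.shape[0], n_bands))
    for b in range(n_bands):
        pooled[:, b] = power_spec[:, band == b].sum(axis=1)
    return pooled

spec = np.abs(np.random.default_rng(4).standard_normal((50, 257))) ** 2
bark_spec = bark_pool(spec)                   # far fewer components to track
print(spec.shape, "->", bark_spec.shape)
```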

Conference paper

Xue W, Moore AH, Brookes M, Naylor PA et al., 2018, Modulation-domain parametric multichannel Kalman filtering for speech enhancement, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2509-2513, ISSN: 2076-1465

The goal of speech enhancement is to reduce the noise signal while keeping the speech signal undistorted. Recently we developed the multichannel Kalman filtering (MKF) for speech enhancement, in which the temporal evolution of the speech signal and the spatial correlation between multichannel observations are jointly exploited to estimate the clean signal. In this paper, we extend the previous work to derive a parametric MKF (PMKF), which incorporates a controlling factor to achieve the trade-off between the speech distortion and noise reduction. The controlling factor weights between the speech distortion and noise reduction related terms in the cost function of PMKF, and based on the minimum mean squared error (MMSE) criterion, the optimal PMKF gain is derived. We analyse the performance of the proposed PMKF and show the differences with the speech distortion weighted multichannel Wiener filter (SDW-MWF). We conduct experiments in different noisy conditions to evaluate the impact of the controlling factor on the noise reduction performance, and the results demonstrate the effectiveness of the proposed method.

Conference paper

Moore AH, Lightburn L, Xue W, Naylor P, Brookes D et al., 2018, Binaural mask-informed speech enhancement for hearing aids with head tracking, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465

An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.
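A tiny sketch of the head-orientation-aware steering mentioned above: with the desired source direction known a priori in room coordinates, the look direction in the head/array frame is obtained by subtracting the tracked head yaw. Purely illustrative; the angles and convention are assumptions.

```python
import numpy as np

def look_direction(source_azimuth_deg, head_yaw_deg):
    """Desired-source direction in the head/array frame, wrapped to [-180, 180)."""
    rel = source_azimuth_deg - head_yaw_deg
    return (rel + 180.0) % 360.0 - 180.0

# Source fixed at 30 degrees in the room; the head turns during the utterance.
for yaw in (0.0, 20.0, -45.0):
    print(f"head yaw {yaw:+.0f} deg -> steer beam to {look_direction(30.0, yaw):+.0f} deg")
```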

Conference paper

Moore AH, Xue W, Naylor PA, Brookes M et al., 2018, Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays, 52nd Asilomar Conference on Signals, Systems, and Computers, Publisher: IEEE, Pages: 1936-1941, ISSN: 1058-6393

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Modulation-domain multichannel Kalman filtering for speech enhancement, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290

Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain, and by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortion response beamformer and a single-channel modulation-domain KF and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.

Journal article

Moore AH, Naylor P, Brookes DM, 2018, Room identification using frequency dependence of spectral decay statistics, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers Inc., Pages: 6902-6906, ISSN: 0736-7791

A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79% correct identification in the most demanding condition compared to 40% using unenhanced signals.
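A rough sketch of the feature-extraction idea: a "negative-side variance" of frame-to-frame log-power decay, averaged within 1/3-octave bands, as a reverberation-dependent feature. The exact decay statistic in the paper differs; here the negative-side variance is simply computed from samples below the mean, as an illustrative stand-in.

```python
import numpy as np

def negative_side_variance(x):
    """Variance computed only from the samples of x that lie below its mean."""
    below = x[x < x.mean()]
    return float(np.mean((below - x.mean()) ** 2)) if below.size else 0.0

def decay_features(power_spec, fs=16000, bands_per_octave=3, f_lo=100.0):
    """Negative-side variance of log-power decay in 1/3-octave bands.

    power_spec : (frames, bins) STFT power spectrogram of the recording.
    Returns one feature per band, to be fed to a classifier.
    """
    n_bins = power_spec.shape[1]
    freqs = np.linspace(0, fs / 2, n_bins)
    edges = f_lo * 2.0 ** (np.arange(0, 20) / bands_per_octave)
    edges = edges[edges < fs / 2]
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (freqs >= lo) & (freqs < hi)
        band_logp = 10 * np.log10(power_spec[:, sel].sum(axis=1) + 1e-12)
        decay = np.diff(band_logp)            # frame-to-frame spectral decay
        feats.append(negative_side_variance(decay))
    return np.array(feats)

feats = decay_features(np.random.default_rng(5).random((200, 257)) + 1e-3)
print(feats.shape)
```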

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Multichannel Kalman filtering for speech enhancement, IEEE Intl Conf on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 2379-190X

The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.

Conference paper

Dionelis N, Brookes DM, 2018, Phase-aware single-channel speech enhancement with modulation-domain Kalman filtering, IEEE Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 937-950, ISSN: 1558-7916

We present a speech enhancement algorithm that performs modulation-domain Kalman filtering to track the speech phase using circular statistics, along with the log-spectra of speech and noise. In the proposed algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for the signal reconstruction of speech. The Kalman filter prediction step separately models the temporal inter-frame correlation of the speech and noise spectral log-amplitudes and of the speech phase, while the Kalman filter update step models their nonlinear relations under the assumption that speech and noise add in the complex short-time Fourier transform domain. The phase-sensitive enhancement algorithm is evaluated with speech quality and intelligibility metrics, using a variety of noise types over a range of SNRs. Instrumental measures predict that tracking the speech log-spectrum and phase with modulation-domain Kalman filtering leads to consistent improvements in speech quality, over both conventional enhancement algorithms and other algorithms that perform modulation-domain Kalman filtering.

Journal article

Koulouri A, Rimpilaeinen V, Brookes M, Kaipio JP et al., 2018, Prior Variances and Depth Un-Biased Estimators in EEG Focal Source Imaging, Joint Conference of the European Medical and Biological Engineering Conference (EMBEC) / Nordic-Baltic Conference on Biomedical Engineering and Medical Physics (NBC), Publisher: SPRINGER-VERLAG SINGAPORE PTE LTD, Pages: 33-36, ISSN: 1680-0737

Conference paper

Wang Y, Brookes DM, 2017, Model-Based Speech Enhancement in the Modulation Domain, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 580-594, ISSN: 2329-9304

This paper presents an algorithm for modulation-domain speech enhancement using a Kalman filter. The proposed estimator jointly models the estimated dynamics of the spectral amplitudes of speech and noise to obtain an MMSE estimate of the speech amplitude spectrum with the assumption that the speech and noise are additive in the complex domain. In order to include the dynamics of noise amplitudes with those of speech amplitudes, we propose a statistical “Gaussring” model that comprises a mixture of Gaussians whose centres lie in a circle on the complex plane. The performance of the proposed algorithm is evaluated using the perceptual evaluation of speech quality (PESQ) measure, segmental SNR (segSNR) measure and short-time objective intelligibility (STOI) measure. For speech quality measures, the proposed algorithm is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms. Speech recognition experiments also show that the Gaussring model based algorithm performs well for two types of noise.
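A minimal sketch of the "Gaussring" idea: a mixture of equally weighted Gaussians whose means are spaced around a circle in the complex plane. The component count, radius and spread below are arbitrary illustrative values, not the paper's fitted model.

```python
import numpy as np

def gaussring_sample(radius, sigma, n_components=16, n_samples=1000, rng=None):
    """Draw samples from a ring of Gaussians in the complex plane.

    The mixture has n_components equally weighted Gaussian components whose
    means are spaced uniformly around a circle of the given radius, each with
    isotropic standard deviation sigma.
    """
    rng = rng or np.random.default_rng()
    angles = 2 * np.pi * np.arange(n_components) / n_components
    centres = radius * np.exp(1j * angles)
    comp = rng.integers(n_components, size=n_samples)            # pick a component
    noise = sigma * (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples))
    return centres[comp] + noise

# The ring radius would be set from the current speech and noise amplitude
# estimates; sigma controls how tightly the phase-uncertain sum hugs the ring.
samples = gaussring_sample(radius=1.0, sigma=0.15)
print(np.mean(np.abs(samples)))       # clusters around the ring radius
```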

Journal article

De Sena E, Brookes DM, Naylor PA, van Waterschoot T et al., 2017, Localization Experiments with Reporting by Head Orientation: Statistical Framework and Case Study, Journal of the Audio Engineering Society, Vol: 65, Pages: 982-996, ISSN: 0004-7554

This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.

Journal article

Dionelis N, Brookes M, 2017, Speech Enhancement Using Modulation-Domain Kalman Filtering with Active Speech Level Normalized Log-Spectrum Global Priors, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

We describe a single-channel speech enhancement algorithm that is based on modulation-domain Kalman filtering that tracks the inter-frame time evolution of the speech log-power spectrum in combination with the long-term average speech log-spectrum. We use offline-trained log-power spectrum global priors incorporated in the Kalman filter prediction and update steps for enhancing noise suppression. In particular, we train and utilize Gaussian mixture model priors for speech in the log-spectral domain that are normalized with respect to the active speech level. The Kalman filter update step uses the log-power spectrum global priors together with the local priors obtained from the Kalman filter prediction step. The log-spectrum Kalman filtering algorithm, which uses the theoretical phase factor distribution and improves the modeling of the modulation features, is evaluated in terms of speech quality. Different algorithm configurations, dependent on whether global priors and/or Kalman filter noise tracking are used, are compared in various noise types.

Conference paper

Moore AH, Brookes D, Naylor PA, 2017, Robust spherical harmonic domain interpolation of spatially sampled array manifolds, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 521-525, ISSN: 1520-6149

Accurate interpolation of the array manifold is an important first step for the acoustic simulation of rapidly moving microphone arrays. Spherical harmonic domain interpolation has been proposed and well studied in the context of head-related transfer functions but has focussed on perceptual, rather than numerical, accuracy. In this paper we analyze the effect of measurement noise on spatial aliasing. Based on this analysis we propose a method for selecting the truncation orders for the forward and reverse spherical Fourier transforms given only the noisy samples in such a way that the interpolation error is minimized. The proposed method achieves up to 1.7 dB improvement over the baseline approach.
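A compact sketch of spherical harmonic domain interpolation by least-squares fitting of coefficients up to a truncation order and re-evaluation at new directions, using scipy's sph_harm. The paper's noise-aware choice of truncation order is not reproduced here; the sampled directions and data are toy values.

```python
import numpy as np
from scipy.special import sph_harm

def sh_matrix(order, azi, pol):
    """Matrix of spherical harmonics up to 'order' at directions (azi, pol)."""
    cols = [sph_harm(m, n, azi, pol)
            for n in range(order + 1) for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)             # (points, (order+1)^2)

def sh_interpolate(values, azi, pol, azi_new, pol_new, order):
    """Least-squares SH fit of noisy samples, then evaluation at new directions.

    The truncation order trades spatial aliasing against noise amplification;
    here it is simply passed in rather than selected from the noisy samples.
    """
    Y = sh_matrix(order, azi, pol)
    coeffs, *_ = np.linalg.lstsq(Y, values, rcond=None)
    return sh_matrix(order, azi_new, pol_new) @ coeffs

# Toy usage: interpolate one frequency bin of an array manifold sampled at 29
# (random) loudspeaker directions to a handful of query directions.
rng = np.random.default_rng(6)
azi, pol = rng.uniform(0, 2 * np.pi, 29), rng.uniform(0.2, np.pi - 0.2, 29)
vals = np.exp(1j * np.cos(pol)) + 0.01 * rng.standard_normal(29)    # noisy samples
azi_q, pol_q = rng.uniform(0, 2 * np.pi, 5), rng.uniform(0.2, np.pi - 0.2, 5)
print(sh_interpolate(vals, azi, pol, azi_q, pol_q, order=3))
```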

Conference paper

Lightburn L, De Sena E, Moore AH, Naylor PA, Brookes D et al., 2017, Improving the perceptual quality of ideal binary masked speech, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
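A simplified sketch of using a binary mask as a speech presence probability prior rather than as a direct 0/1 gain: the prior blends a Wiener-like gain with a small gain floor in each time-frequency bin. This is a stand-in for the classical MMSE enhancer used in the paper, and all parameter values are illustrative.

```python
import numpy as np

def mask_informed_gain(snr_post, mask, p_on=0.95, p_off=0.05, g_floor=0.1):
    """Soft gain that uses a binary mask as a speech-presence prior.

    snr_post : (frames, bins) posterior SNR estimate.
    mask     : (frames, bins) binary mask (e.g. an oracle ideal binary mask).
    The mask sets the prior speech presence probability per TF bin; the gain
    blends a Wiener-like gain with a small floor according to that prior.
    """
    prior = np.where(mask > 0.5, p_on, p_off)           # mask -> SPP prior
    snr_prio = np.maximum(snr_post - 1.0, 1e-3)          # crude a-priori SNR
    g_wiener = snr_prio / (1.0 + snr_prio)
    return prior * g_wiener + (1.0 - prior) * g_floor

# Toy TF grid: posterior SNR and an oracle binary mask of the same shape.
rng = np.random.default_rng(7)
snr_post = rng.exponential(2.0, size=(100, 257))
ibm = (snr_post > 1.0).astype(float)
gain = mask_informed_gain(snr_post, ibm)                 # multiply the noisy STFT by this
```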

Conference paper

Xue W, Brookes M, Naylor PA, 2017, Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization, IEEE International Conference on Acoustics, Speech and Signal Processing, Pages: 591-595, ISSN: 1520-6149

In room acoustics, under-modelled multichannel blind system identification (BSI) aims to estimate the early part of the room impulse responses (RIRs), and it can be widely used in applications such as speaker localization, room geometry identification and beamforming based speech dereverberation. In this paper we extend our recent study on under-modelled BSI from the time domain to the frequency domain, such that the RIRs can be updated frame-wise and the efficiency of the Fast Fourier Transform (FFT) is exploited to reduce the computational complexity. Analogous to the cross-correlation based criterion in the time domain, a frequency-domain cross power spectrum based criterion is proposed. As the early RIRs are usually sparse, the RIRs are estimated by jointly maximizing the cross power spectrum based criterion in the frequency domain and minimizing the l1-norm sparsity measure in the time domain. A two-stage LMS updating algorithm is derived to achieve joint optimization of these two targets. The experimental results in different under-modelled scenarios demonstrate the effectiveness of the proposed method.

Conference paper

Dionelis N, Brookes M, 2017, Modulation-domain speech enhancement using a Kalman filter with a Bayesian update of speech and noise in the log-spectral domain, IEEE Conference on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE

We present a Bayesian estimator that performs log-spectrum estimation of both speech and noise, and is used as a Bayesian Kalman filter update step for single-channel speech enhancement in the modulation domain. We use Kalman filtering in the log-power spectral domain rather than in the amplitude or power spectral domains. In the Bayesian Kalman filter update step, we define the posterior distribution of the clean speech and noise log-power spectra as a two-dimensional multivariate Gaussian distribution. We utilize a Kalman filter observation constraint surface in the three-dimensional space, where the third dimension is the phase factor. We evaluate the results of the phase-sensitive log-spectrum Kalman filter by comparing them with the results obtained by traditional noise suppression techniques and by an alternative Kalman filtering technique that assumes additivity of speech and noise in the power spectral domain.

Conference paper

Doire CSJ, Brookes DM, Naylor PA, 2017, Robust and efficient Bayesian adaptive psychometric function estimation, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
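A compact sketch of a psi-style Bayesian adaptive procedure with an entropy-based look one-ahead rule, in the spirit of the approach described above. Unlike the paper's procedure it uses a fixed, pre-specified parameter grid and does not target a chosen performance level ϕ; the grid ranges and listener parameters are illustrative.

```python
import numpy as np

# Grid of candidate psychometric-function parameters: threshold (dB) and log10-slope.
thr = np.linspace(-20, 20, 81)
logslope = np.linspace(-1.0, 1.0, 41)
TH, LS = np.meshgrid(thr, logslope, indexing="ij")
posterior = np.full_like(TH, 1.0 / TH.size)           # uniform prior
probes = np.linspace(-20, 20, 41)                      # candidate probe SNRs

def pf(snr, th, ls, guess=0.5, lapse=0.02):
    """Logistic psychometric function P(correct | snr; threshold, log-slope)."""
    z = np.clip((10.0 ** ls) * (snr - th), -60.0, 60.0)
    return guess + (1.0 - guess - lapse) / (1.0 + np.exp(-z))

def next_probe(post):
    """Look one-ahead: choose the probe minimising the expected posterior entropy."""
    best_s, best_h = probes[0], np.inf
    for s in probes:
        p_c = pf(s, TH, LS)
        m_c = float((post * p_c).sum())                # predictive P(correct)
        h = 0.0
        for resp_p, m in ((p_c, m_c), (1.0 - p_c, 1.0 - m_c)):
            q = post * resp_p / m                      # posterior given this response
            h -= m * np.sum(q * np.log(q + 1e-12))     # response-weighted entropy
        if h < best_h:
            best_s, best_h = s, h
    return best_s

def update(post, snr, correct):
    like = pf(snr, TH, LS) if correct else 1.0 - pf(snr, TH, LS)
    post = post * like
    return post / post.sum()

# Simulated listener with threshold 2 dB and log-slope 0.0 (slope of 1 per dB).
rng = np.random.default_rng(8)
for _ in range(50):
    s = next_probe(posterior)
    posterior = update(posterior, s, rng.random() < pf(s, 2.0, 0.0))
print("threshold:", (posterior * TH).sum(), "log-slope:", (posterior * LS).sum())
```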

Journal article

Doire CSJ, Brookes DM, Naylor PA, Hicks CM, Betts D, Dmour MA, Jensen SH et al., 2017, Single-channel online enhancement of speech corrupted by reverberation and noise, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 572-587, ISSN: 2329-9290

This paper proposes an online single-channel speech enhancement method designed to improve the quality of speech degraded by reverberation and noise. Based on an autoregressive model for the reverberation power and on a hidden Markov model for clean speech production, a Bayesian filtering formulation of the problem is derived and online joint estimation of the acoustic parameters and mean speech, reverberation, and noise powers is obtained in mel-frequency bands. From these estimates, a real-valued spectral gain is derived and spectral enhancement is applied in the short-time Fourier transform (STFT) domain. The method yields state-of-the-art performance and greatly reduces the effects of reverberation and noise while improving speech quality and preserving speech intelligibility in challenging acoustic environments.

Journal article

Lawson M, Brookes M, Dragotti PL, 2017, Identifying a multiple plane plenoptic function from a swiped image, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1423-1427, ISSN: 1520-6149

Conference paper

Dionelis N, Brookes M, 2016, Active speech level estimation in noisy signals with quadrature noise suppression, European Signal Processing Conference (EUSIPCO'16), Publisher: IEEE, ISSN: 2076-1465

We present a noise-robust algorithm for estimating the active level of speech, which is the average speech power during intervals of speech activity. The proposed algorithm uses the clean speech phase to remove the quadrature noise component from the short-time power spectrum of the noisy speech, as well as SNR-dependent techniques to improve the estimation. The pitch of voiced speech frames is determined using a noise-robust pitch tracker and the speech level is estimated from the energy of the pitch harmonics using the harmonic summation principle. At low noise levels, the resultant active speech level estimate is combined with that from the standardized ITU-T P.56 algorithm to give a final composite estimate. The algorithm has been evaluated using a range of noise signals and gives consistently lower errors than previous methods and than the ITU-T P.56 algorithm, which is accurate for SNR levels of above 15 dB.
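A crude sketch of the harmonic-summation step: given a pitch estimate for a voiced frame, the speech power is approximated by summing the power spectrum at the pitch harmonics. The nearest-bin lookup and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def harmonic_power(power_spec, f0, fs, n_harm=20):
    """Sum the power at the pitch harmonics of one voiced frame.

    power_spec : one-sided power spectrum of the frame (n_bins,).
    f0         : pitch estimate in Hz (from a noise-robust pitch tracker).
    Each harmonic contributes the power of its nearest STFT bin.
    """
    n_bins = power_spec.shape[0]
    bin_hz = (fs / 2) / (n_bins - 1)
    total = 0.0
    for k in range(1, n_harm + 1):
        f = k * f0
        if f >= fs / 2:
            break
        total += power_spec[int(round(f / bin_hz))]
    return total

# Active level estimate: average harmonic power over frames flagged as voiced,
# expressed in dB (here with a random spectrogram just to show the shapes).
rng = np.random.default_rng(9)
spec = rng.random((100, 257))
voiced = rng.random(100) > 0.4
levels = [harmonic_power(spec[t], f0=120.0, fs=16000) for t in np.flatnonzero(voiced)]
print("active speech level (dB):", 10 * np.log10(np.mean(levels)))
```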

Conference paper

Xue W, Brookes DM, Naylor PA, 2016, Under-modelled blind system identification for time delay estimation in reverberant environments, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.
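For context, the sketch below implements the conventional cross-correlation-style baseline (GCC-PHAT) that the proposed under-modelled-BSI method generalises; it is not the proposed algorithm itself, and the test signals are synthetic.

```python
import numpy as np

def gcc_phat_delay(x, y, max_delay=None):
    """Conventional GCC-PHAT time-delay estimate between two microphone signals.

    Returns the delay, in samples, by which y lags x at the peak of the
    phase-transform cross-correlation.
    """
    n = 1 << (len(x) + len(y) - 1).bit_length()        # FFT length
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = np.conj(X) * Y
    R /= np.abs(R) + 1e-12                              # PHAT weighting
    cc = np.fft.irfft(R, n)
    max_delay = max_delay or len(x) - 1
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))   # lags -max..+max
    return int(np.argmax(cc)) - max_delay

# Toy check: y is x delayed by 7 samples, so the estimated delay should be 7.
rng = np.random.default_rng(10)
x = rng.standard_normal(4096)
y = np.roll(x, 7)
print(gcc_phat_delay(x, y))
```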

Conference paper

Koulouri A, Brookes DM, Rimpiläinen V, 2016, Vector tomography for reconstructing electric fields with non-zero divergence in bounded domains, Journal of Computational Physics, Vol: 329, Pages: 73-90, ISSN: 0021-9991

In vector tomography (VT), the aim is to reconstruct an unknown multi-dimensional vector field using line integral data. In the case of a 2-dimensional VT, two types of line integral data are usually required. These data correspond to integration of the parallel and perpendicular projection of the vector field along the integration lines and are called the longitudinal and transverse measurements, respectively. In most cases, however, the transverse measurements cannot be physically acquired. Therefore, the VT methods are typically used to reconstruct divergence-free (or source-free) velocity and flow fields that can be reconstructed solely from the longitudinal measurements. In this paper, we show how vector fields with non-zero divergence in a bounded domain can also be reconstructed from the longitudinal measurements without the need of explicitly evaluating the transverse measurements. To the best of our knowledge, VT has not previously been used for this purpose. In particular, we study low-frequency, time-harmonic electric fields generated by dipole sources in convex bounded domains which arise, for example, in electroencephalography (EEG) source imaging. We explain in detail the theoretical background, the derivation of the electric field inverse problem and the numerical approximation of the line integrals. We show that fields with non-zero divergence can be reconstructed from the longitudinal measurements with the help of two sparsity constraints that are constructed from the transverse measurements and the vector Laplace operator. As a comparison to EEG source imaging, we note that VT does not require mathematical modeling of the sources. By numerical simulations, we show that the pattern of the electric field can be correctly estimated using VT and the location of the source activity can be determined accurately from the reconstructed magnitudes of the field.

Journal article

Lawson M, Brookes M, Dragotti PL, 2016, Capturing the plenoptic function in a swipe, Conference on Applications of Digital Image Processing XXXIX, Publisher: Society of Photo-optical Instrumentation Engineers, ISSN: 0277-786X

Conference paper

