Imperial College London

Mr Mike Brookes

Faculty of Engineering, Department of Electrical and Electronic Engineering

Emeritus Reader

Contact

+44 (0)20 7594 6165
mike.brookes

Assistant

Miss Vanessa Rodriguez-Gonzalez, +44 (0)20 7594 6267

Location

807a, Electrical Engineering, South Kensington Campus

Publications

149 results found

De Sena E, Brookes DM, Naylor PA, van Waterschoot T et al., 2017, Localization Experiments with Reporting by Head Orientation: Statistical Framework and Case Study, Journal of the Audio Engineering Society, Vol: 65, Pages: 982-996, ISSN: 0004-7554

This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, conditions similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, both of which are consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling, even though that room had a higher direct-to-reverberant ratio than the other simulated rooms.

Journal article
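
To illustrate what a front/back confusion is in this setting, the sketch below mirrors a reported azimuth across the interaural axis and scores whichever of the raw or mirrored response lies closer to the target. This is a common heuristic for resolving confusions, not the paper's statistical model. It assumes azimuths in degrees with 0 degrees straight ahead; the function names and trial values are hypothetical.

```python
import math


def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees to the interval (-180, 180]."""
    a = math.fmod(angle, 360.0)
    if a <= -180.0:
        a += 360.0
    elif a > 180.0:
        a -= 360.0
    return a


def mirror_front_back(azimuth: float) -> float:
    """Reflect an azimuth across the interaural axis (0 deg = straight ahead)."""
    return wrap_deg(180.0 - azimuth)


def resolved_error(target: float, response: float) -> float:
    """Signed localization error after resolving a possible front/back confusion.

    Assumed heuristic: if the mirrored response is closer to the target than the
    raw response, treat the trial as a confusion and score the mirrored response.
    """
    direct = wrap_deg(response - target)
    mirrored = wrap_deg(mirror_front_back(response) - target)
    return direct if abs(direct) <= abs(mirrored) else mirrored


# Hypothetical trials: (target azimuth, reported head orientation) in degrees.
trials = [(30.0, 150.0), (10.0, 20.0), (-45.0, -40.0)]
errors = [resolved_error(t, r) for t, r in trials]   # [0.0, 10.0, 5.0]
mean_error = sum(errors) / len(errors)               # nonzero mean hints at a bias
```

Whether a positive mean error corresponds to a leftward or rightward bias depends on the azimuth sign convention chosen.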

Dionelis N, Brookes M, 2017, Speech Enhancement Using Modulation-Domain Kalman Filtering with Active Speech Level Normalized Log-Spectrum Global Priors, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

We describe a single-channel speech enhancement algorithm that is based on modulation-domain Kalman filtering that tracks the inter-frame time evolution of the speech log-power spectrum in combination with the long-term average speech log-spectrum. We use offline-trained log-power spectrum global priors incorporated in the Kalman filter prediction and update steps to enhance noise suppression. In particular, we train and utilize Gaussian mixture model priors for speech in the log-spectral domain that are normalized with respect to the active speech level. The Kalman filter update step uses the log-power spectrum global priors together with the local priors obtained from the Kalman filter prediction step. The log-spectrum Kalman filtering algorithm, which uses the theoretical phase factor distribution and improves the modeling of the modulation features, is evaluated in terms of speech quality. Different algorithm configurations, dependent on whether global priors and/or Kalman filter noise tracking are used, are compared across various noise types.

Conference paper

Moore AH, Brookes D, Naylor PA, 2017, Robust spherical harmonic domain interpolation of spatially sampled array manifolds, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 521-525, ISSN: 1520-6149

Accurate interpolation of the array manifold is an important first step for the acoustic simulation of rapidly moving microphone arrays. Spherical harmonic domain interpolation has been proposed and well studied in the context of head-related transfer functions but has focussed on perceptual, rather than numerical, accuracy. In this paper we analyze the effect of measurement noise on spatial aliasing. Based on this analysis we propose a method for selecting the truncation orders for the forward and reverse spherical Fourier transforms given only the noisy samples in such a way that the interpolation error is minimized. The proposed method achieves up to 1.7 dB improvement over the baseline approach.

Conference paper

Lightburn L, De Sena E, Moore AH, Naylor PA, Brookes D et al., 2017, Improving the perceptual quality of ideal binary masked speech, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: Institute of Electrical and Electronics Engineers (IEEE), Pages: 661-665, ISSN: 1520-6149

It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.

Conference paper
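
For reference, the sketch below computes a conventional oracle ideal binary mask from separately known speech and noise powers and applies it directly as a 0/1 time-frequency gain, which is the usage the paper above contrasts with (it instead feeds the mask to a classical enhancer as speech-presence information). The 0 dB local criterion, function names and toy values are assumptions for illustration.

```python
import math
from typing import List

Grid = List[List[float]]


def ideal_binary_mask(speech_power: Grid, noise_power: Grid,
                      lc_db: float = 0.0, eps: float = 1e-12) -> List[List[int]]:
    """Oracle ideal binary mask: 1 where the local SNR exceeds lc_db, else 0.

    Rows are frames, columns are frequency bins; speech and noise powers are
    assumed to be known separately (oracle condition).
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        mask.append([
            1 if 10.0 * math.log10(max(s, eps) / max(n, eps)) > lc_db else 0
            for s, n in zip(s_row, n_row)
        ])
    return mask


def apply_mask_as_gain(noisy_magnitude: Grid, mask: List[List[int]]) -> Grid:
    """Conventional use: the mask multiplies each noisy time-frequency cell directly."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, noisy_magnitude)]


speech = [[1.0, 0.01], [4.0, 1.0]]
noise = [[0.1, 1.0], [1.0, 1.0]]
mask = ideal_binary_mask(speech, noise)                         # [[1, 0], [1, 0]]
masked = apply_mask_as_gain([[2.0, 3.0], [5.0, 1.5]], mask)
```

Zeroing cells outright is what produces the abrupt, musical-noise-like artefacts behind the poor perceptual quality mentioned in the abstract.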

Xue W, Brookes M, Naylor PA, 2017, Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization, IEEE International Conference on Acoustics, Speech and Signal Processing, Pages: 591-595, ISSN: 1520-6149

In room acoustics, under-modelled multichannel blind system identification (BSI) aims to estimate the early part of the room impulse responses (RIRs), and it can be widely used in applications such as speaker localization, room geometry identification and beamforming-based speech dereverberation. In this paper we extend our recent study on under-modelled BSI from the time domain to the frequency domain, such that the RIRs can be updated frame-wise and the efficiency of the Fast Fourier Transform (FFT) is exploited to reduce the computational complexity. Analogous to the cross-correlation-based criterion in the time domain, a frequency-domain criterion based on the cross power spectrum is proposed. As the early RIRs are usually sparse, the RIRs are estimated by jointly maximizing the cross-power-spectrum-based criterion in the frequency domain and minimizing the ℓ1-norm sparsity measure in the time domain. A two-stage LMS updating algorithm is derived to achieve joint optimization of these two targets. The experimental results in different under-modelled scenarios demonstrate the effectiveness of the proposed method.

Conference paper

Dionelis N, Brookes M, 2017, Modulation-domain speech enhancement using a Kalman filter with a Bayesian update of speech and noise in the log-spectral domain, IEEE Conference on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE

We present a Bayesian estimator that performs log-spectrum estimation of both speech and noise, and is used as a Bayesian Kalman filter update step for single-channel speech enhancement in the modulation domain. We use Kalman filtering in the log-power spectral domain rather than in the amplitude or power spectral domains. In the Bayesian Kalman filter update step, we define the posterior distribution of the clean speech and noise log-power spectra as a two-dimensional multivariate Gaussian distribution. We utilize a Kalman filter observation constraint surface in the three-dimensional space, where the third dimension is the phase factor. We evaluate the results of the phase-sensitive log-spectrum Kalman filter by comparing them with the results obtained by traditional noise suppression techniques and by an alternative Kalman filtering technique that assumes additivity of speech and noise in the power spectral domain.

Conference paper

Doire CSJ, Brookes DM, Naylor PA, 2017, Robust and efficient Bayesian adaptive psychometric function estimation, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966

The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.

Journal article
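
The procedure above combines a Bayesian estimate of the PF with look-ahead stimulus selection. The sketch below is a generic grid-based version of that idea, assuming a logistic PF with no guess or lapse rate, a fixed hypothetical grid of thresholds and slopes, and a fixed set of candidate SNRs. Unlike the paper's procedure it does require the parameter range in advance, and it targets the 50% point rather than an experimenter-chosen ϕ. Each new SNR is the candidate that minimises the expected posterior entropy after one more trial (look one-ahead with an entropy criterion).

```python
import math
from itertools import product
from typing import Dict, List, Tuple

Params = Tuple[float, float]        # (threshold in dB SNR, slope per dB)
Posterior = Dict[Params, float]


def logistic(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_correct(snr: float, params: Params) -> float:
    """Assumed logistic PF without guess or lapse rate; 50% correct at the threshold."""
    threshold, slope = params
    return logistic(slope * (snr - threshold))


def update(posterior: Posterior, snr: float, correct: bool) -> Posterior:
    """Bayes' rule on the parameter grid after one trial at the given SNR."""
    unnorm = {}
    for params, prob in posterior.items():
        p = p_correct(snr, params)
        unnorm[params] = prob * (p if correct else 1.0 - p)
    total = sum(unnorm.values())
    return {params: v / total for params, v in unnorm.items()}


def entropy(posterior: Posterior) -> float:
    """Shannon entropy (nats) of the posterior over the grid."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0.0)


def expected_entropy(posterior: Posterior, snr: float) -> float:
    """Expected posterior entropy after one more trial at this SNR (look one-ahead)."""
    pc = sum(prob * p_correct(snr, params) for params, prob in posterior.items())
    return (pc * entropy(update(posterior, snr, True))
            + (1.0 - pc) * entropy(update(posterior, snr, False)))


def next_snr(posterior: Posterior, candidates: List[float]) -> float:
    """Entropy criterion: pick the candidate SNR with the lowest expected entropy."""
    return min(candidates, key=lambda snr: expected_entropy(posterior, snr))


# Hypothetical grid and candidate SNRs (dB).
thresholds = [float(t) for t in range(-10, 11, 2)]
slopes = [0.25, 0.5, 1.0]
prior = {params: 1.0 / (len(thresholds) * len(slopes))
         for params in product(thresholds, slopes)}
candidates = [float(s) for s in range(-20, 21, 2)]

snr = next_snr(prior, candidates)                # SNR for the first trial
posterior = update(prior, snr, correct=True)     # listener responded correctly
```

In a full run, `next_snr` and `update` would alternate once per trial, and threshold and slope estimates could be read off as posterior means.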

Doire CSJ, Brookes DM, Naylor PA, Hicks CM, Betts D, Dmour MA, Jensen SH et al., 2017, Single-channel online enhancement of speech corrupted by reverberation and noise, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 572-587, ISSN: 2329-9290

This paper proposes an online single-channel speech enhancement method designed to improve the quality of speech degraded by reverberation and noise. Based on an autoregressive model for the reverberation power and on a hidden Markov model for clean speech production, a Bayesian filtering formulation of the problem is derived and online joint estimation of the acoustic parameters and mean speech, reverberation, and noise powers is obtained in mel-frequency bands. From these estimates, a real-valued spectral gain is derived and spectral enhancement is applied in the short-time Fourier transform (STFT) domain. The method yields state-of-the-art performance and greatly reduces the effects of reverberation and noise while improving speech quality and preserving speech intelligibility in challenging acoustic environments.

Journal article

Lawson M, Brookes M, Dragotti PL, 2017, Identifying a multiple plane plenoptic function from a swiped image, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1423-1427, ISSN: 1520-6149

Conference paper

Dionelis N, Brookes M, 2016, Active speech level estimation in noisy signals with quadrature noise suppression, European Signal Processing Conference (EUSIPCO'16), Publisher: IEEE, ISSN: 2076-1465

We present a noise-robust algorithm for estimating the active level of speech, which is the average speech power during intervals of speech activity. The proposed algorithm uses the clean speech phase to remove the quadrature noise component from the short-time power spectrum of the noisy speech, as well as SNR-dependent techniques to improve the estimation. The pitch of voiced speech frames is determined using a noise-robust pitch tracker and the speech level is estimated from the energy of the pitch harmonics using the harmonic summation principle. At low noise levels, the resultant active speech level estimate is combined with that from the standardized ITU-T P.56 algorithm to give a final composite estimate. The algorithm has been evaluated using a range of noise signals and gives consistently lower errors than previous methods and than the ITU-T P.56 algorithm, which is accurate for SNRs above 15 dB.

Conference paper

Xue W, Brookes DM, Naylor PA, 2016, Under-modelled blind system identification for time delay estimation in reverberant environments, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE

In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.

Conference paper
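
The conventional cross-correlation TDE that the proposed criterion generalizes can be sketched as follows: the delay estimate is the lag at which the cross-correlation of the two microphone signals peaks. The signals, the 7-sample delay and the function names are hypothetical; in reverberant rooms this peak is smeared by reflections, which is the weakness the paper's BSI-based method targets.

```python
import random
from typing import Sequence


def cross_correlation(x1: Sequence[float], x2: Sequence[float], lag: int) -> float:
    """r(lag) = sum_n x1[n] * x2[n + lag], over indices where both samples exist."""
    return sum(x1[n] * x2[n + lag]
               for n in range(len(x1)) if 0 <= n + lag < len(x2))


def estimate_delay(x1: Sequence[float], x2: Sequence[float], max_lag: int) -> int:
    """Lag (in samples) of the cross-correlation peak; positive if x2 lags x1."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x1, x2, lag))


# Hypothetical anechoic example: white noise reaching microphone 2 seven samples later.
rng = random.Random(0)
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
delay = 7
mic1 = source
mic2 = [0.0] * delay + source[:-delay]
lag = estimate_delay(mic1, mic2, max_lag=20)
```

Dividing the lag by the sampling rate gives the delay in seconds; `max_lag` would normally be set from the microphone spacing and the speed of sound.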

Koulouri A, Brookes DM, Rimpiläinen V, 2016, Vector tomography for reconstructing electric fields with non-zero divergence in bounded domains, Journal of Computational Physics, Vol: 329, Pages: 73-90, ISSN: 0021-9991

In vector tomography (VT), the aim is to reconstruct an unknown multi-dimensional vector field using line integral data. In the case of 2-dimensional VT, two types of line integral data are usually required. These data correspond to integration of the parallel and perpendicular projection of the vector field along the integration lines and are called the longitudinal and transverse measurements, respectively. In most cases, however, the transverse measurements cannot be physically acquired. Therefore, VT methods are typically used to reconstruct divergence-free (or source-free) velocity and flow fields that can be reconstructed solely from the longitudinal measurements. In this paper, we show how vector fields with non-zero divergence in a bounded domain can also be reconstructed from the longitudinal measurements without the need to explicitly evaluate the transverse measurements. To the best of our knowledge, VT has not previously been used for this purpose. In particular, we study low-frequency, time-harmonic electric fields generated by dipole sources in convex bounded domains which arise, for example, in electroencephalography (EEG) source imaging. We explain in detail the theoretical background, the derivation of the electric field inverse problem and the numerical approximation of the line integrals. We show that fields with non-zero divergence can be reconstructed from the longitudinal measurements with the help of two sparsity constraints that are constructed from the transverse measurements and the vector Laplace operator. In contrast to EEG source imaging, VT does not require mathematical modeling of the sources. By numerical simulations, we show that the pattern of the electric field can be correctly estimated using VT and the location of the source activity can be determined accurately from the reconstructed magnitudes of the field.

Journal article
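
The longitudinal and transverse measurements described above are line integrals of the tangential and normal components of the field. The sketch below evaluates both numerically along one straight segment with the midpoint rule; it does not reproduce the paper's reconstruction (sparsity constraints, vector Laplacian), and the normal direction convention, step count and example field are assumptions.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]
Field = Callable[[float, float], Vec]


def line_integrals(field: Field, start: Vec, end: Vec, steps: int = 1000) -> Tuple[float, float]:
    """Longitudinal and transverse line integrals of a 2-D vector field along a segment.

    Longitudinal: integral of F . t ds, with t the unit tangent of the line.
    Transverse:   integral of F . n ds, with n the unit normal (t rotated by +90 deg).
    Uses the midpoint rule with `steps` sub-intervals.
    """
    (x0, y0), (x1, y1) = start, end
    length = math.hypot(x1 - x0, y1 - y0)
    tx, ty = (x1 - x0) / length, (y1 - y0) / length
    nx, ny = -ty, tx
    ds = length / steps
    longitudinal = transverse = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        fx, fy = field(x0 + s * tx, y0 + s * ty)
        longitudinal += (fx * tx + fy * ty) * ds
        transverse += (fx * nx + fy * ny) * ds
    return longitudinal, transverse


# Example field F = grad((x^2 + y^2) / 2) = (x, y): the longitudinal integral
# from (0, 0) to (1, 1) equals the potential difference, 1.
lon, tra = line_integrals(lambda x, y: (x, y), (0.0, 0.0), (1.0, 1.0))
```

For this gradient field the longitudinal integral depends only on the endpoints, while the transverse one vanishes along the diagonal because the field is parallel to the line there.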

Lawson M, Brookes M, Dragotti PL, 2016, Capturing the plenoptic function in a swipe, Conference on Applications of Digital Image Processing XXXIX, Publisher: Society of Photo-optical Instrumentation Engineers, ISSN: 0277-786X

Conference paper

Xue W, Brookes M, Naylor PA, 2016, Cross-Correlation Based Under-Modelled Multichannel Blind Acoustic System Identification with Sparsity Regularization, 24th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 718-722, ISSN: 2076-1465

Conference paper

Sharma D, Naylor PA, Wang Y, Brookes DM et al., 2016, A Data-Driven Non-intrusive Measure of Speech Quality and Intelligibility, Speech Communication, Vol: 80, Pages: 84-94, ISSN: 0167-6393

Speech signals are often affected by additive noise and distortion which can degrade the perceived quality and intelligibility of the signal. We present a new measure, NISA, for estimating the quality and intelligibility of speech degraded by additive noise and distortions associated with telecommunications networks, based on a data-driven framework of feature extraction and tree-based regression. The new measure is non-intrusive, operating on the degraded signal alone without the need for a reference signal. This makes the measure applicable to practical speech processing applications operating in the single-ended mode. The new measure has been evaluated against the intrusive measures PESQ and STOI. The results indicate that the accuracy of the new non-intrusive method is around 90% of the accuracy of the intrusive measures, depending on the test scenario. The NISA measure therefore provides non-intrusive (single-ended) PESQ and STOI estimates with high accuracy.

Journal article

Wang Y, Brookes D, 2016, Speech Enhancement Using an MMSE Spectral Amplitude Estimator Based on a Modulation Domain Kalman Filter with a Gamma Prior, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5225-5229

In this paper, we propose a minimum mean square error spectral estimator for clean speech spectral amplitudes that uses a Kalman filter to model the temporal dynamics of the spectral amplitudes in the modulation domain. Using a two-parameter Gamma distribution to model the prior distribution of the speech spectral amplitudes, we derive closed form expressions for the posterior mean and variance of the spectral amplitudes as well as for the associated update step of the Kalman filter. The performance of the proposed algorithm is evaluated on the TIMIT core test set using the perceptual evaluation of speech quality (PESQ) measure and segmental SNR measure and is shown to give a consistent improvement over a wide range of SNRs when compared to competitive algorithms.

Conference paper

Lightburn L, Brookes D, 2016, A Weighted STOI Intelligibility Metric Based On Mutual Information, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5365-5369

It is known that the information required for the intelligibility of a speech signal is distributed non-uniformly in time. In this paper we propose WSTOI, a modified version of STOI, a speech intelligibility metric. With WSTOI the contribution of each time-frequency cell is weighted by an estimate of its intelligibility content. This estimate is equal to the mutual information between two hypothetical signals at either end of a simplified model of human communication. Listening tests show that the modification improves the prediction accuracy of STOI at all performance levels on both long and short utterances. An improvement was observed across all tested noise types and suppression algorithms.

Conference paper

Koulouri A, Rimpiläinen V, Brookes M, Kaipio JP et al., 2016, Compensation of domain modelling errors in the inverse source problem of the Poisson equation: Application in electroencephalographic imaging, Applied Numerical Mathematics, Vol: 106, Pages: 24-36, ISSN: 1873-5460

Journal article

Doire CSJ, Brookes DM, Naylor PA, De Sena E, van Waterschoot T, Jensen SHJ et al., 2016, Acoustic Environment Control: Implementation of a Reverberation Enhancement System, AES 60th International Conference: DREAMS (Dereverberation and Reverberation of Audio, Music, and Speech)

Reverberation enhancement systems allow the active control of the acoustic environment. They are subject to instability issues due to acoustic feedback, and are often installed permanently in large halls, sometimes at great cost. In this paper, we explore the possibility of implementing a cost-effective reverberation enhancement system to control the acoustics of typical rooms using a combination of spatial filtering, automatic calibration, adaptive notch filters, howling detection and manual adjustments. The effectiveness of the system is then tested inside a small soundproof booth.

Conference paper
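
Once howling detection has located a feedback frequency, a notch filter can attenuate it. The paper uses adaptive notch filters; the sketch below is only a fixed second-order notch designed with the widely used "Audio EQ Cookbook" formulas, plus a direct-form I filter loop. The 1 kHz frequency, Q of 30 and 16 kHz sampling rate are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[List[float], List[float]]


def notch_coefficients(f0: float, fs: float, q: float) -> Coeffs:
    """Second-order notch (Audio EQ Cookbook form), normalised so that a[0] = 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * cos_w0 / a0, 1.0 / a0]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def frequency_response(b: Sequence[float], a: Sequence[float], f: float, fs: float) -> complex:
    """H(e^{jw}) of the biquad evaluated at frequency f (Hz)."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * f / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return num / den


def biquad_filter(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Direct form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


fs = 16000.0
b, a = notch_coefficients(f0=1000.0, fs=fs, q=30.0)   # hypothetical howling frequency and Q
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(16000)]
suppressed = biquad_filter(b, a, tone)                 # decays to ~0 after the transient
```

A higher Q narrows the notch, removing less of the surrounding spectrum at the cost of a longer transient; an adaptive notch would additionally track the howling frequency over time.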

Hu M, Sharma D, Doclo S, Brookes M, Naylor PA et al., 2016, Blind adaptive SIMO acoustic system identification using a locally optimal step-size, 60th AES International Conference on Dereverberation and Reverberation of Audio, Music, and Speech (DREAMS), Publisher: Audio Engineering Society Inc.

Conference paper

Doire C, Brookes D, Naylor P, Jensen SH et al., 2015, Data-Driven Statistical Modelling of Room Impulse Responses in the Power Domain, European Signal Processing Conference (EUSIPCO), Publisher: IEEE

Having an accurate statistical model of room impulse responses with a minimum number of parameters is of crucial importance in applications such as dereverberation. In this paper, by taking into account the behaviour of the early reflections, we extend the widely-used statistical model proposed by Polack. The squared room impulse response is modelled in each frequency band as the realisation of a stochastic process weighted by the sum of two exponential decays. Room-independent values for the new parameters involved are obtained through analysis of several room impulse response databases, and validation of the model in the likelihood sense is performed.

Conference paper
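
Polack's model writes a room impulse response as h(t) = b(t) exp(-δt), where b(t) is zero-mean Gaussian noise and δ = 3 ln(10) / T60, so the expected squared response falls by 60 dB over T60. The sketch below replaces the single decay with a weighted sum of two exponential decays, as the abstract describes, and draws one realisation for a single band. The exact parameterisation, per-band treatment and fitted values from the paper are not reproduced; all weights, T60 values and names are hypothetical.

```python
import math
import random
from typing import List


def decay_rate(t60: float) -> float:
    """Polack's amplitude decay constant: h^2 falls by 60 dB after t60 seconds."""
    return 3.0 * math.log(10.0) / t60


def two_decay_envelope(t: float, w_early: float, t60_early: float,
                       w_late: float, t60_late: float) -> float:
    """Expected squared RIR at time t as a weighted sum of two exponential decays (assumed form)."""
    return (w_early * math.exp(-2.0 * decay_rate(t60_early) * t)
            + w_late * math.exp(-2.0 * decay_rate(t60_late) * t))


def synthetic_rir(fs: float, duration: float, w_early: float, t60_early: float,
                  w_late: float, t60_late: float, seed: int = 0) -> List[float]:
    """One realisation: Gaussian white noise scaled by the square root of the envelope."""
    rng = random.Random(seed)
    n_samples = int(round(duration * fs))
    return [rng.gauss(0.0, 1.0)
            * math.sqrt(two_decay_envelope(n / fs, w_early, t60_early, w_late, t60_late))
            for n in range(n_samples)]


# Hypothetical parameters for one frequency band.
rir = synthetic_rir(fs=8000.0, duration=0.5, w_early=1.0, t60_early=0.1,
                    w_late=0.1, t60_late=0.6)
```

Setting `w_late` to zero recovers Polack's single-decay model, which makes the two-decay form easy to compare against it in a likelihood test.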

Hu M, Doclo S, Sharma D, Brookes D, Naylor P et al., 2015, Noise Robust Blind System Identification Algorithms Based On A Rayleigh Quotient Cost Function, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2476-2480

An important prerequisite for acoustic multi-channel equalization for speech dereverberation is the identification of the acoustic channels between the source and the microphones. Blind System Identification (BSI) algorithms based on cross-relation error minimization are known to mis-converge in the presence of noise. Although algorithms have been proposed in the literature to improve robustness to noise, the estimated room impulse responses are usually constrained to have a flat magnitude spectrum. In this paper, noise robust algorithms based on a Rayleigh quotient cost function are proposed. Unlike the traditional algorithms, the estimated impulse responses are not always forced to have unit norm. Experimental results using simulated room impulse responses and several SNRs show that one of the proposed algorithms outperforms competing algorithms in terms of normalized projection misalignment.

Conference paper
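
The cross-relation idea behind these BSI algorithms can be checked on a two-channel toy example: without noise, x1 = s * h1 and x2 = s * h2, so x1 * h2 - x2 * h1 = 0 for the true channels. Dividing the squared cross-relation error by the squared norm of the stacked channels gives one common Rayleigh-quotient form of the cost, which is unchanged when the channels are rescaled. The sketch also computes normalized projection misalignment (NPM), the evaluation metric named above. It is a generic illustration, not the paper's noise-robust algorithms; the signals and names are hypothetical.

```python
import math
from typing import List, Sequence


def convolve(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def cross_relation_error(x1, x2, h1, h2) -> List[float]:
    """e = x1 * h2 - x2 * h1, which is zero for the true channels without noise."""
    return [p - q for p, q in zip(convolve(x1, h2), convolve(x2, h1))]


def rayleigh_quotient_cost(x1, x2, h1, h2) -> float:
    """Squared cross-relation error divided by ||h||^2, so scaling h leaves it unchanged."""
    e = cross_relation_error(x1, x2, h1, h2)
    return dot(e, e) / (dot(h1, h1) + dot(h2, h2))


def npm_db(h_true: Sequence[float], h_est: Sequence[float]) -> float:
    """Normalized projection misalignment (dB), insensitive to a gain on h_est."""
    scale = dot(h_true, h_est) / dot(h_est, h_est)
    residual = [a - scale * b for a, b in zip(h_true, h_est)]
    ratio = math.sqrt(dot(residual, residual) / dot(h_true, h_true))
    return 20.0 * math.log10(ratio) if ratio > 0.0 else float("-inf")


# Hypothetical noise-free two-channel example.
source = [1.0, 2.0, 3.0, -1.0]
h1, h2 = [1.0, 0.5], [0.3, -0.2]
x1, x2 = convolve(source, h1), convolve(source, h2)
cost_true = rayleigh_quotient_cost(x1, x2, h1, h2)                     # ~0
cost_guess = rayleigh_quotient_cost(x1, x2, [1.0, 0.4], [0.3, -0.1])
cost_scaled = rayleigh_quotient_cost(x1, x2, [2.0, 0.8], [0.6, -0.2])  # equals cost_guess
misalignment = npm_db(h1 + h2, [1.0, 0.4, 0.3, -0.1])
```

Because the quotient is scale-invariant, minimising it does not need an explicit unit-norm constraint; NPM likewise ignores the arbitrary gain that blind identification cannot resolve.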

Lightburn L, Brookes M, 2015, SOBM - a binary mask for noisy speech that optimises an objective intelligibility metric, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5078-5082, ISSN: 1520-6149

It is known that the intelligibility of noisy speech can be improved by applying a binary-valued gain mask to a time-frequency representation of the speech. We present the SOBM, an oracle binary mask that maximises STOI, an objective speech intelligibility metric. We show how to determine the SOBM for a deterministic noise signal and also for a stochastic noise signal with a known power spectrum. We demonstrate that applying the SOBM to noisy speech results in a higher predicted intelligibility than is obtained with other masks and show that the stochastic version is robust to mismatch errors in SNR and noise spectrum.

Conference paper

Doire CSJ, Brookes M, Naylor PA, Betts D, Hicks CM, Dmour MA, Jensen SH et al., 2015, Single-channel blind estimation of reverberation parameters, 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 31-35, ISSN: 1520-6149

Conference paper

Hu M, Sharma D, Doclo S, Brookes M, Naylor PA et al., 2015, Speaker change detection and speaker diarization using spatial information, 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 5743-5747, ISSN: 1520-6149

Conference paper

Hu M, Parada PP, Sharma D, Doclo S, van Waterschoot T, Brookes M, Naylor PA et al., 2015, Single-channel speaker diarization based on spatial features, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE, ISSN: 1931-1168

Conference paper

Stanton R, Gaubitch N, Naylor P, Brookes DM et al., 2014, A Differentiable Approximation to Speech Intelligibility Index with Applications to Listening Enhancement, AES International Conference on Audio Forensics

The Speech Intelligibility Index is a standardised objective measure for estimating the intelligibility of speech in noise. It is, however, difficult to use in the iterative optimisation of speech enhancement algorithms because it is a discontinuous function of its input parameters. In this paper, we derive an approximation for the Speech Intelligibility Index that is both continuous and differentiable, which allows for more efficient optimisation procedures. The use of the approximation is demonstrated in an application to near-end speech enhancement.

Conference paper
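
One source of non-smoothness in the index is its piecewise band-audibility function: in ANSI S3.5-1997 each band's SNR is mapped through (SNR + 15) / 30 and clipped to [0, 1], and the band audibilities are summed with band-importance weights. The sketch below keeps only that element (ignoring level distortion and spread of masking) and replaces the clip with a generic softplus-based smooth version. This smoothing is an assumption for illustration, not the paper's approximation; the weights, `sharpness` value and names are hypothetical.

```python
import math
from typing import Callable, Sequence


def band_audibility(snr_db: float) -> float:
    """Piecewise-linear band audibility: (SNR + 15) / 30, clipped to [0, 1]."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def smooth_band_audibility(snr_db: float, sharpness: float = 20.0) -> float:
    """Differentiable stand-in for the clip; approaches band_audibility as sharpness grows."""
    x = (snr_db + 15.0) / 30.0
    return (softplus(sharpness * x) - softplus(sharpness * (x - 1.0))) / sharpness


def simplified_sii(importance: Sequence[float], snr_db: Sequence[float],
                   audibility: Callable[[float], float] = band_audibility) -> float:
    """Band-importance-weighted sum of band audibilities (importances should sum to 1)."""
    return sum(w * audibility(s) for w, s in zip(importance, snr_db))


# Hypothetical two-band example with equal band importance.
weights = [0.5, 0.5]
snrs = [15.0, -15.0]
exact = simplified_sii(weights, snrs)                             # 0.5
smooth = simplified_sii(weights, snrs, smooth_band_audibility)
```

The smooth version has a nonzero gradient everywhere, which suits gradient-based optimisation; larger `sharpness` values track the clip more closely near the -15 dB and +15 dB breakpoints at the cost of steeper gradients there.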

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
