Publications
Gonzalez S, Brookes M, 2014, PEFAC - a pitch estimation algorithm robust to high levels of noise, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 22, Pages: 518-530, ISSN: 2329-9290
We present PEFAC, a fundamental frequency estimation algorithm for speech that is able to identify voiced frames and estimate pitch reliably even at negative signal-to-noise ratios. The algorithm combines a normalization stage, to remove channel dependency and to attenuate strong noise components, with a harmonic summing filter applied in the log-frequency power spectral domain, the impulse response of which is chosen to sum the energy of the fundamental frequency harmonics while attenuating smoothly-varying noise components. Temporal continuity constraints are applied to the selected pitch candidates and a voiced speech probability is computed from the likelihood ratio of two classifiers, one for voiced speech and one for unvoiced speech/silence. We compare the performance of our algorithm with that of other widely used algorithms and demonstrate that it performs well in both high and low levels of additive noise.
Gilliam C, Dragotti P-L, Brookes M, 2014, On the Spectrum of the Plenoptic Function, IEEE Transactions on Image Processing, Vol: 23, Pages: 502-516, ISSN: 1057-7149
- Citations: 36
Hilkhuysen G, Gaubitch N, Brookes M, et al., 2014, Effects of noise suppression on intelligibility. II: An attempt to validate physical metrics, Journal of the Acoustical Society of America, Vol: 135, Pages: 439-450, ISSN: 0001-4966
- Citations: 5
Pearson J, Visentini-Scarzanella M, Brookes M, et al., 2014, Tilted layer-based modeling for enhanced light-field processing and image based rendering, IEEE International Conference on Image Processing (ICIP), Publisher: IEEE, Pages: 1917-1921, ISSN: 1522-4880
Jones Z, Brookes M, Dragotti PL, et al., 2014, Wide-baseline image change detection, IEEE International Conference on Image Processing (ICIP), Publisher: IEEE, Pages: 1589-1593, ISSN: 1522-4880
Stanton R, Brookes M, 2014, Path uncertainty robust beamforming, 22nd European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1925-1929, ISSN: 2076-1465
Wang Y, Brookes M, 2014, Speech enhancement using a modulation domain Kalman filter post-processor with a Gaussian mixture noise model, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, ISSN: 1520-6149
Gonzalez S, Brookes M, 2014, Mask-based enhancement for very low quality speech, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, ISSN: 1520-6149
- Citations: 3
Moore AH, Brookes M, Naylor PA, 2013, Roomprints for forensic audio applications, Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
A roomprint is a quantifiable description of an acoustic environment which can be measured under controlled conditions and estimated from a monophonic recording made in that space. We here identify the properties required of a roomprint in forensic audio applications and review the observable characteristics of a room that, when extracted from recordings, could form the basis of a roomprint. Frequency-dependent reverberation time is investigated as a promising characteristic and used in a room identification experiment giving correct identification in 96% of trials.
Gaubitch N, Brookes M, Naylor P, 2013, Blind Channel Magnitude Response Estimation in Speech using Spectrum Classification, IEEE Transactions on Audio, Speech, and Language Processing, Vol: 21, Pages: 2162-2171, ISSN: 1558-7916
Moore AH, Brookes M, Naylor PA, 2013, Room geometry estimation from a single channel acoustic impulse response, Proc. European Signal Processing Conference (EUSIPCO)
Pearson J, Brookes M, Dragotti P-L, 2013, Plenoptic layer-based modelling for image based rendering, IEEE Transactions on Image Processing, Vol: 22, Pages: 3405-3419, ISSN: 1057-7149
Eaton D, Brookes DM, Naylor PA, 2013, A Comparison of Non-Intrusive SNR Estimation Algorithms and the Use of Mapping Functions, EUSIPCO, Publisher: EURASIP, Pages: 1-5
We present a comparative evaluation of six methods for non-intrusive Signal-to-Noise Ratio (SNR) estimation for narrowband speech in noise. We demonstrate that the performance of all methods can be improved by applying a non-linear mapping function to their estimates of SNR. We have employed phrases built from the TIMIT speech corpus and noises from a broad range of sources including ITU-T P.501, NOISEX-92, and Soundjay. We compare the accuracy of the methods in estimating the SNR of both stationary and non-stationary noise and we conclude that with the mapping function, the best current methods can estimate the SNR to within approximately 3.5 dB for SNRs from -5 dB to 35 dB.
Gilliam C, Brookes M, Dragotti PL, 2013, Image-Based Rendering and the Sampling of the Plenoptic Function, Emerging Technologies for 3D Video: Creation, Coding, Transmission and Rendering, Pages: 231-248, ISBN: 9781118355114
Image-based rendering (IBR) is a technique for producing arbitrary views of a scene using multiple images instead of exact object models. The central concept is that each image comprises a collection of light rays and a new view is interpolated from these light rays. If we modelled the light rays using a seven-dimensional function, known as the plenoptic function, then IBR can be viewed in terms of sampling and reconstruction. Therefore the important goal of minimizing the number of images required in IBR, whilst maintaining rendering quality, can be examined through sampling analysis of the plenoptic function. In this context, the chapter examines the state of the art in plenoptic sampling theory. It focuses on both uniform and adaptive sampling of the plenoptic function. In particular, it presents theoretical results for uniform sampling based on spectral analysis of the plenoptic function and algorithms for adaptive plenoptic sampling. © 2013 by John Wiley & Sons, Ltd.
Sharma D, Naylor PA, Brookes M, 2013, Non-intrusive speech intelligibility assessment, 21st European Signal Processing Conference (EUSIPCO), Publisher: IEEE
- Citations: 4
Wang Y, Brookes M, 2013, A subspace method for speech enhancement in the modulation domain, 21st European Signal Processing Conference (EUSIPCO), Publisher: IEEE
Gonzalez S, Brookes M, 2013, Speech active level estimation in noisy conditions, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6684-6688, ISSN: 1520-6149
Wang Y, Brookes M, 2013, Speech enhancement using a robust Kalman filter post-processor in the modulation domain, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 7457-7461, ISSN: 1520-6149
- Citations: 6
Gaubitch ND, Löllmann HW, Jeub M, et al., 2012, Performance comparison of algorithms for blind reverberation time estimation from speech
The reverberation time, T60, is one of the key parameters used to quantify room acoustics. It can provide information about the quality and intelligibility of speech recorded in a reverberant environment, and it can be used to increase robustness to reverberation of speech processing algorithms. T60 can be determined directly from a measurement of the acoustic impulse response, but in situations where this is unavailable it must be estimated blindly from reverberant speech. In this contribution, we provide a study of three state-of-the-art methods for blind T60 estimation. Experimental results with a large number of talkers, simulated and measured acoustic impulse responses, and various levels of additive white Gaussian noise are presented. The relative merits of the three methods in terms of computational time, estimation accuracy, noise sensitivity and inter-talker variance are discussed. In general, all three methods are able to estimate the reverberation time to within 0.2 s for T60 ≤ 0.8 s and SNR ≥ 30 dB, while increasing the noise level causes overestimation. The relative computational speed of the three methods is also assessed.
Hilkhuysen G, Gaubitch N, Brookes M, et al., 2012, Effects of noise suppression on intelligibility: dependency on signal-to-noise ratios, Journal of the Acoustical Society of America, Vol: 131, Pages: 531-539
Sharma D, Hilkhuysen G, Naylor PA, et al., 2012, Descriptive Vocabulary Development for Degraded Speech, 13th Annual Conference of the International Speech Communication Association, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 1494-1497
Gonzalez S, Brookes M, 2012, Sibilant speech detection in noise, 13th Annual Conference of the International Speech Communication Association, Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC, Pages: 1486-1489
Gilliam C, Pearson J, Brookes M, et al., 2012, Image based rendering with depth cameras: how many are needed?, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 5437-5440, ISSN: 1520-6149
Sharma D, Naylor PA, Gaubitch ND, et al., 2012, Non intrusive codec identification algorithm, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 4477-4480, ISSN: 1520-6149
- Citations: 7
Chorti A, Brookes M, 2011, On the Effect of Voigt Profile Oscillators on OFDM Systems, IEEE Transactions on Circuits and Systems II: Express Briefs, Vol: 58, Pages: 768-772, ISSN: 1549-7747
- Citations: 1
Gaubitch ND, Brookes M, Naylor PA, et al., 2011, Bayesian Adaptive method for estimating Speech Intelligibility in noise, Pages: 169-174
We present the Bayesian Adaptive Speech Intelligibility Estimation (BASIE) method - a tool for rapid estimation of a given speech reception threshold (SRT) and the slope at that threshold of multiple psychometric functions for speech intelligibility in noise. The core of this tool is an adaptive Bayesian procedure, which adjusts the signal-to-noise ratio at each subsequent stimulus such that the expected variance of the threshold and slope estimates are minimised. Simulation results show that the algorithm is able to achieve SRT estimates accurate to within ±1 dB in under 30 iterations. Furthermore, we discuss strategies for using BASIE to evaluate the effects of speech processing algorithms on intelligibility and we give two illustrative examples for different noise reduction methods with supporting listening experiments.
Sharma D, Hilkhuysen G, Gaubitch ND, et al., 2011, C-Qual - A validation of PESQ using degradations encountered in forensic and law enforcement audio, Pages: 177-181
Assessment of speech quality of law-enforcement audio recordings is important as degradations introduced by non-ideal recording conditions can reduce the intelligence value of such recordings. Furthermore a model that predicts speech quality could be beneficial for assessing the performance of audio collection and enhancement systems. The Perceptual Evaluation of Speech Quality (PESQ) algorithm (ITU-T P.862) has been validated for degradations common in telecommunications. In this paper we apply PESQ to degradations typically encountered in law-enforcement. Also we present a subjectively labeled database (C-Qual) containing distortions encountered in law enforcement scenarios. Comparing the prediction by PESQ and the observed opinions provided by the listeners shows that PESQ is less suitable for estimating the speech quality in this context.
Chorti A, Brookes M, 2011, Performance analysis of OFDM and DAB receivers in the presence of spurious tones, Telecommunication Systems, Vol: 46, Pages: 181-190, ISSN: 1018-4864
- Citations: 4
Bai J, Brookes M, 2011, Adaptive hidden Markov models for noise modelling, 19th European Signal Processing Conference (EUSIPCO), Publisher: EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, Pages: 2324-2328, ISSN: 2076-1465
Gonzalez S, Brookes M, 2011, A pitch estimation filter robust to high levels of noise (PEFAC), 19th European Signal Processing Conference (EUSIPCO), Publisher: EUROPEAN ASSOC SIGNAL SPEECH & IMAGE PROCESSING-EURASIP, Pages: 451-455, ISSN: 2076-1465
- Citations: 46
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.