Publications
-
Conference paper: D'Olne E, Neo VW, Naylor PA, 2022,
Frame-based space-time covariance matrix estimation for polynomial eigenvalue decomposition-based speech enhancement
, International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, Pages: 1-5
Recent work in speech enhancement has proposed a polynomial eigenvalue decomposition (PEVD) method, yielding significant intelligibility and noise-reduction improvements without introducing distortions in the enhanced signal [1]. The method relies on the estimation of a space-time covariance matrix, performed in batch mode such that a sufficiently long portion of the noisy signal is used to derive an accurate estimate. However, in applications where the scene is nonstationary, this approach is unable to adapt to changes in the acoustic scenario. This paper thus proposes a frame-based procedure for the estimation of space-time covariance matrices and investigates its impact on subsequent PEVD speech enhancement. The method is found to yield spatial filters and speech enhancement improvements comparable to the batch method in [1], showing potential for real-time processing.
-
Conference paper: Tokala V, Brookes M, Naylor P, 2022,
Binaural speech enhancement using STOI-optimal masks
, International Workshop on Acoustic Signal Enhancement (IWAENC) 2022, Publisher: IEEE, Pages: 1-5
STOI-optimal masking has been previously proposed and developed for single-channel speech enhancement. In this paper, we consider the extension to the task of binaural speech enhancement in which the spatial information is known to be important to speech understanding and therefore should be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually and a ‘better-ear listening’ mask is computed by choosing the maximum of the two masks. The estimated mask is used to supply probability information about the speech presence in each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that using the proposed method for binaural signals with a directional noise not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
-
Conference paper: McKnight S, Hogg AOT, Neo VW, et al., 2022,
Studying Human-Based Speaker Diarization and Comparing to State-of-the-Art Systems
, APSIPA 2022
-
Conference paper: Neo VW, Weiss S, McKnight S, et al., 2022,
Polynomial eigenvalue decomposition-based target speaker voice activity detection in the presence of competing talkers
, 17th International Workshop on Acoustic Signal Enhancement
Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and computation of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing to compute the syndrome energy, used for testing the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
-
Conference paper: Neo VW, D'Olne E, Moore AH, et al., 2022,
Fixed beamformer design using polynomial eigenvalue decomposition
, International Workshop on Acoustic Signal Enhancement (IWAENC)
Array processing is widely used in many speech applications involving multiple microphones. These applications include automatic speech recognition, robot audition, telecommunications, and hearing aids. A spatio-temporal filter for the array allows signals from different microphones to be combined desirably to improve the application performance. This paper will analyze and visually interpret the eigenvector beamformers designed by the polynomial eigenvalue decomposition (PEVD) algorithm, which are suited for arbitrary arrays. The proposed fixed PEVD beamformers are lightweight, with an average filter length of 114 and perform comparably to classical data-dependent minimum variance distortionless response (MVDR) and linearly constrained minimum variance (LCMV) beamformers for the separation of sources closely spaced by 5 degrees.
-
Conference paper: Neo VW, Weiss S, Naylor PA, 2022,
A polynomial subspace projection approach for the detection of weak voice activity
, Sensor Signal Processing for Defence conference (SSPD), Publisher: IEEE
A voice activity detection (VAD) algorithm identifies whether or not time frames contain speech. It is essential for many military and commercial speech processing applications, including speech enhancement, speech coding, speaker identification, and automatic speech recognition. In this work, we adopt earlier work on detecting weak transient signals and propose a polynomial subspace projection pre-processor to improve an existing VAD algorithm. The proposed multi-channel pre-processor projects the microphone signals onto a lower dimensional subspace which attempts to remove the interferer components and thus eases the detection of the speech target. Compared to applying the same VAD to the microphone signal, the proposed approach almost always improves the F1 and balanced accuracy scores even in adverse environments, e.g. -30 dB SIR, which may be typical of operations involving noisy machinery and signal jamming scenarios.
-
Conference paper: D'Olne E, Neo VW, Naylor PA, 2022,
Speech enhancement in distributed microphone arrays using polynomial eigenvalue decomposition
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2219-5491
As the number of connected devices equipped with multiple microphones increases, scientific interest in distributed microphone array processing grows. Current beamforming methods heavily rely on estimating quantities related to array geometry, which is extremely challenging in real, non-stationary environments. Recent work on polynomial eigenvalue decomposition (PEVD) has shown promising results for speech enhancement in singular arrays without requiring the estimation of any array-related parameter [1]. This work extends these results to the realm of distributed microphone arrays, and further presents a novel framework for speech enhancement in distributed microphone arrays using PEVD. The proposed approach is shown to almost always outperform optimum beamformers located at arrays closest to the desired speaker. Moreover, the proposed approach exhibits very strong robustness to steering vector errors.
-
Conference paper: McKnight S, Hogg A, Neo V, et al., 2022,
A study of salient modulation domain features for speaker identification
, Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Publisher: IEEE, Pages: 705-712
This paper studies the ranges of acoustic and modulation frequencies of speech most relevant for identifying speakers and compares the speaker-specific information present in the temporal envelope against that present in the temporal fine structure. This study uses correlation and feature importance measures, random forest and convolutional neural network models, and reconstructed speech signals with specific acoustic and/or modulation frequencies removed to identify the salient points. It is shown that the range of modulation frequencies associated with the fundamental frequency is more important than the 1-16 Hz range most commonly used in automatic speech recognition, and that the 0 Hz modulation frequency band contains significant speaker information. It is also shown that the temporal envelope is more discriminative among speakers than the temporal fine structure, but that the temporal fine structure still contains useful additional information for speaker identification. This research aims to provide a timely addition to the literature by identifying specific aspects of speech relevant for speaker identification that could be used to enhance the discriminant capabilities of machine learning models.
-
Conference paper: D'Olne E, Moore A, Naylor P, 2021,
Model-based beamforming for wearable microphone arrays
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1105-1109
Beamforming techniques for hearing aid applications are often evaluated using behind-the-ear (BTE) devices. However, the growing number of wearable devices with microphones has made it possible to consider new geometries for microphone array beamforming. In this paper, we examine the effect of array location and geometry on the performance of binaural minimum power distortionless response (BMPDR) beamformers. In addition to the classical adaptive BMPDR, we evaluate the benefit of a recently-proposed method that estimates the sample covariance matrix using a compact model. Simulation results show that using a chest-mounted array reduces noise by an additional 1.3 dB compared to BTE hearing aids. The compact model method is found to yield higher predicted intelligibility than adaptive BMPDR beamforming, regardless of the array geometry.
-
Conference paper: Neo VW, Evers C, Naylor PA, 2021,
Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays
, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Conference paper: Hogg AOT, Neo VW, Weiss S, et al., 2021,
A Polynomial Eigenvalue Decomposition Music Approach for Broadband Sound Source Localization
, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Conference paper: Jones DT, Sharma D, Kruchinin SY, et al., 2021,
Spatial Coding for Microphone Arrays using IPNLMS-Based RTF Estimation
, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
-
Conference paper: Hogg AOT, Evers C, Naylor PA, 2021,
Multichannel Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking Of Acoustic And Spatial Features
, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE
-
Conference paper: Neo VW, Evers C, Naylor PA, 2021,
Polynomial matrix eigenvalue decomposition of spherical harmonics for speech enhancement
, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 786-790
Speech enhancement algorithms using polynomial matrix eigenvalue decomposition (PEVD) have been shown to be effective for noisy and reverberant speech. However, these algorithms do not scale well in complexity with the number of channels used in the processing. For a spherical microphone array sampling an order-limited sound field, the spherical harmonics provide a compact representation of the microphone signals in the form of eigenbeams. We propose a PEVD algorithm that uses only the lower dimension eigenbeams for speech enhancement at a significantly lower computation cost. The proposed algorithm is shown to significantly reduce complexity while maintaining full performance. Informal listening examples have also indicated that the processing does not introduce any noticeable artefacts.
-
Conference paper: Moore A, Vos R, Naylor P, et al., 2021,
Processing pipelines for efficient, physically-accurate simulation of microphone array signals in dynamic sound scenes
, ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 0736-7791
Multichannel acoustic signal processing is predicated on the fact that the inter-channel relationships between the received signals can be exploited to infer information about the acoustic scene. Recently there has been increasing interest in algorithms which are applicable in dynamic scenes, where the source(s) and/or microphone array may be moving. Simulating such scenes has particular challenges which are exacerbated when real-time, listener-in-the-loop evaluation of algorithms is required. This paper considers candidate pipelines for simulating the array response to a set of point/image sources in terms of their accuracy, scalability and continuity. A new approach, in which the filter kernels are obtained using principal component analysis from time-aligned impulse responses, is proposed. When the number of filter kernels is ≤ 40, the new approach achieves more accurate simulation than competing methods.
-
Journal article: Hogg A, Evers C, Moore A, et al., 2021,
Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 29, Pages: 1479-1490, ISSN: 2329-9290
This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
-
Journal article: Yiallourides C, Naylor PA, 2021,
Time-frequency analysis and parameterisation of knee sounds for non-invasive detection of osteoarthritis
, IEEE Transactions on Biomedical Engineering, Vol: 68, Pages: 1250-1261, ISSN: 0018-9294
Objective: In this work the potential of non-invasive detection of knee osteoarthritis is investigated using the sounds generated by the knee joint during walking. Methods: The information contained in the time-frequency domain of these signals and its compressed representations is exploited and their discriminant properties are studied. Their efficacy for the task of normal vs abnormal signal classification is evaluated using a comprehensive experimental framework. Based on this, the impact of the feature extraction parameters on the classification performance is investigated using Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers. Results: It is shown that classification is successful with an area under the Receiver Operating Characteristic (ROC) curve of 0.92. Conclusion: The analysis indicates improvements in classification performance when using non-uniform frequency scaling and identifies specific frequency bands that contain discriminative features. Significance: Contrary to other studies that focus on sit-to-stand movements and knee flexion/extension, this study used knee sounds obtained during walking. The analysis of such signals leads to non-invasive detection of knee osteoarthritis with high accuracy and could potentially extend the range of available tools for the assessment of the disease as a more practical and cost effective method without requiring clinical setups.
-
Journal article: Hafezi S, Moore A, Naylor P, 2021,
Narrowband multi-source Direction-of-Arrival estimation in the spherical harmonic domain
, Journal of the Acoustical Society of America, Vol: 149, ISSN: 0001-4966
A conventional approach to wideband multi-source (MS) direction-of-arrival (DOA) estimation is to perform single source (SS) DOA estimation in time-frequency (TF) bins for which a SS assumption is valid. Such methods use the W-disjoint orthogonality (WDO) assumption due to the speech sparseness. As the number of sources increases, the chance of violating the WDO assumption increases. As shown in the challenging scenarios with multiple simultaneously active sources over a short period of time masking each other, it is possible for a strongly masked source (due to inconsistency of activity or quietness) to be rarely dominant in a TF bin. SS-based DOA estimators fail in the detection or accurate localization of masked sources in such scenarios. Two analytical approaches are proposed for narrowband DOA estimation based on the MS assumption in a bin in the spherical harmonic domain. In the first approach, eigenvalue decomposition is used to decompose a MS scenario into multiple SS scenarios, and a SS-based analytical DOA estimation is performed on each. The second approach analytically estimates two DOAs per bin assuming the presence of two active sources per bin. The evaluation validates the improvement to double accuracy and robustness to sensor noise compared to the baseline methods.
-
Conference paper: Sharma D, Berger L, Quillen C, et al., 2021,
Non-intrusive estimation of speech signal parameters using a frame-based machine learning approach
, 2020 28th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 446-450
We present a novel, non-intrusive method that jointly estimates acoustic signal properties associated with the perceptual speech quality, level of reverberation and noise in a speech signal. We explore various machine learning frameworks, consisting of popular feature extraction front-ends and two types of regression models and show the trade-off in performance that must be considered with each combination. We show that a short-time framework consisting of an 80-dimension log-Mel filter bank feature front-end employing spectral augmentation, followed by a 3-layer LSTM recurrent neural network model achieves a mean absolute error of 3.3 dB for C50, 2.3 dB for segmental SNR and 0.3 for PESQ estimation on the Libri Augmented (LA) database. The internal VAD for this system achieves an F1 score of 0.93 on this data. The proposed system also achieves a 2.4 dB mean absolute error for C50 estimation on the ACE test set. Furthermore, we show how each type of acoustic parameter correlates with ASR performance in terms of ground truth labels and additionally show that the estimated C50, SNR and PESQ from our proposed method have a high correlation (greater than 0.92) with WER on the LA test set.
-
Conference paper: Felsheim RC, Brendel A, Naylor PA, et al., 2021,
Head Orientation Estimation from Multiple Microphone Arrays
, 28th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 491-495, ISSN: 2076-1465