Publications
-
Conference paper
D'Olne E, Neo VW, Naylor PA, 2022,
Frame-based space-time covariance matrix estimation for polynomial eigenvalue decomposition-based speech enhancement
, International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, Pages: 1-5
Recent work in speech enhancement has proposed a polynomial eigenvalue decomposition (PEVD) method, yielding significant intelligibility and noise-reduction improvements without introducing distortions in the enhanced signal [1]. The method relies on the estimation of a space-time covariance matrix, performed in batch mode such that a sufficiently long portion of the noisy signal is used to derive an accurate estimate. However, in applications where the scene is nonstationary, this approach is unable to adapt to changes in the acoustic scenario. This paper thus proposes a frame-based procedure for the estimation of space-time covariance matrices and investigates its impact on subsequent PEVD speech enhancement. The method is found to yield spatial filters and speech enhancement improvements comparable to the batch method in [1], showing potential for real-time processing.
-
Conference paper
Neo VW, D'Olne E, Moore AH, et al., 2022,
Fixed beamformer design using polynomial eigenvalue decomposition
, International Workshop on Acoustic Signal Enhancement (IWAENC)
Array processing is widely used in many speech applications involving multiple microphones. These applications include automatic speech recognition, robot audition, telecommunications, and hearing aids. A spatio-temporal filter for the array allows signals from different microphones to be combined desirably to improve the application performance. This paper will analyze and visually interpret the eigenvector beamformers designed by the polynomial eigenvalue decomposition (PEVD) algorithm, which are suited for arbitrary arrays. The proposed fixed PEVD beamformers are lightweight, with an average filter length of 114 and perform comparably to classical data-dependent minimum variance distortionless response (MVDR) and linearly constrained minimum variance (LCMV) beamformers for the separation of sources closely spaced by 5 degrees.
-
Conference paper
McKnight S, Hogg AOT, Neo VW, et al., 2022,
Studying Human-Based Speaker Diarization and Comparing to State-of-the-Art Systems
, APSIPA 2022
-
Conference paper
Neo VW, Weiss S, McKnight S, et al., 2022,
Polynomial eigenvalue decomposition-based target speaker voice activity detection in the presence of competing talkers
, 17th International Workshop on Acoustic Signal Enhancement
Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and computation of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing to compute the syndrome energy, used for testing the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
-
Conference paper
Neo VW, Weiss S, Naylor PA, 2022,
A polynomial subspace projection approach for the detection of weak voice activity
, Sensor Signal Processing for Defence conference (SSPD), Publisher: IEEE
A voice activity detection (VAD) algorithm identifies whether or not time frames contain speech. It is essential for many military and commercial speech processing applications, including speech enhancement, speech coding, speaker identification, and automatic speech recognition. In this work, we adopt earlier work on detecting weak transient signals and propose a polynomial subspace projection pre-processor to improve an existing VAD algorithm. The proposed multi-channel pre-processor projects the microphone signals onto a lower dimensional subspace which attempts to remove the interferer components and thus eases the detection of the speech target. Compared to applying the same VAD to the microphone signal, the proposed approach almost always improves the F1 and balanced accuracy scores even in adverse environments, e.g. -30 dB SIR, which may be typical of operations involving noisy machinery and signal jamming scenarios.
-
Conference paper
D'Olne E, Neo VW, Naylor PA, 2022,
Speech enhancement in distributed microphone arrays using polynomial eigenvalue decomposition
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2219-5491
As the number of connected devices equipped with multiple microphones increases, scientific interest in distributed microphone array processing grows. Current beamforming methods heavily rely on estimating quantities related to array geometry, which is extremely challenging in real, non-stationary environments. Recent work on polynomial eigenvalue decomposition (PEVD) has shown promising results for speech enhancement in singular arrays without requiring the estimation of any array-related parameter [1]. This work extends these results to the realm of distributed microphone arrays, and further presents a novel framework for speech enhancement in distributed microphone arrays using PEVD. The proposed approach is shown to almost always outperform optimum beamformers located at arrays closest to the desired speaker. Moreover, the proposed approach exhibits very strong robustness to steering vector errors.
-
Conference paper
McKnight S, Hogg A, Neo V, et al., 2022,
A study of salient modulation domain features for speaker identification
, Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Publisher: IEEE, Pages: 705-712
This paper studies the ranges of acoustic and modulation frequencies of speech most relevant for identifying speakers and compares the speaker-specific information present in the temporal envelope against that present in the temporal fine structure. This study uses correlation and feature importance measures, random forest and convolutional neural network models, and reconstructed speech signals with specific acoustic and/or modulation frequencies removed to identify the salient points. It is shown that the range of modulation frequencies associated with the fundamental frequency is more important than the 1-16 Hz range most commonly used in automatic speech recognition, and that the 0 Hz modulation frequency band contains significant speaker information. It is also shown that the temporal envelope is more discriminative among speakers than the temporal fine structure, but that the temporal fine structure still contains useful additional information for speaker identification. This research aims to provide a timely addition to the literature by identifying specific aspects of speech relevant for speaker identification that could be used to enhance the discriminant capabilities of machine learning models.
-
Conference paper
Neo VW, Evers C, Naylor PA, 2021,
Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays
, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Conference paper
Hogg AOT, Neo VW, Weiss S, et al., 2021,
A Polynomial Eigenvalue Decomposition Music Approach for Broadband Sound Source Localization
, 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Conference paper
Hogg AOT, Evers C, Naylor PA, 2021,
Multichannel Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking Of Acoustic And Spatial Features
, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE
-
Conference paper
Neo VW, Evers C, Naylor PA, 2021,
Polynomial matrix eigenvalue decomposition of spherical harmonics for speech enhancement
, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 786-790
Speech enhancement algorithms using polynomial matrix eigenvalue decomposition (PEVD) have been shown to be effective for noisy and reverberant speech. However, these algorithms do not scale well in complexity with the number of channels used in the processing. For a spherical microphone array sampling an order-limited sound field, the spherical harmonics provide a compact representation of the microphone signals in the form of eigen beams. We propose a PEVD algorithm that uses only the lower dimension eigen beams for speech enhancement at a significantly lower computation cost. The proposed algorithm is shown to significantly reduce complexity while maintaining full performance. Informal listening examples have also indicated that the processing does not introduce any noticeable artefacts.
-
Journal article
Hogg A, Evers C, Moore A, et al., 2021,
Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 29, Pages: 1479-1490, ISSN: 2329-9290
This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This system is bench-marked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
-
Journal article
Neo VW, Evers C, Naylor PA, 2021,
Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition
, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 29, Pages: 3255-3266, ISSN: 2329-9290
-
Conference paper
Neo VW, Evers C, Naylor PA, 2021,
Speech dereverberation performance of a polynomial-EVD subspace approach
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
The degradation of speech arising from additive background noise and reverberation affects the performance of important speech applications such as telecommunications, hearing aids, voice-controlled systems and robot audition. In this work, we focus on dereverberation. It is shown that the parameterized polynomial matrix eigenvalue decomposition (PEVD)-based speech enhancement algorithm exploits the lack of correlation between speech and the late reflections to enhance the speech component associated with the direct path and early reflections. The algorithm's performance is evaluated using simulations involving measured acoustic impulse responses and noise from the ACE corpus. The simulations and informal listening examples have indicated that the PEVD-based algorithm performs dereverberation over a range of SNRs without introducing any noticeable processing artefacts.
-
Conference paper
McKnight SW, Hogg A, Naylor P, 2020,
Analysis of phonetic dependence of segmentation errors in speaker diarization
, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ground truth speaker segment boundaries such that estimated speaker segment boundaries with such collars are considered completely correct. This paper shows that the popular recent approach of removing forgiveness collars from speaker diarization evaluation tools can unfairly penalize speaker diarization systems that correctly estimate speaker segment boundaries. The uncertainty in identifying the start and/or end of a particular phoneme means that the ground truth segmentation is not perfectly accurate, and even trained human listeners are unable to identify phoneme boundaries with full consistency. This research analyses the phoneme dependence of this uncertainty, and shows that it depends on (i) whether the phoneme being detected is at the start or end of an utterance and (ii) what the phoneme is, so that the use of a uniform forgiveness collar is inadequate. This analysis is expected to point the way towards more indicative and repeatable assessment of the performance of speaker diarization systems.
-
Conference paper
Neo VW, Evers C, Naylor PA, 2020,
PEVD-based speech enhancement in reverberant environments
, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 186-190
The enhancement of noisy speech is important for applications involving human-to-human interactions, such as telecommunications and hearing aids, as well as human-to-machine interactions, such as voice-controlled systems and robot audition. In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. This is verified using simulations involving actual acoustic impulse responses and noise from the ACE corpus. The simulations show that even without using a noise estimator, our proposed method simultaneously achieves noise reduction, and enhancement of speech quality and intelligibility, in reverberant environments over a wide range of SNRs. Furthermore, informal listening examples highlight that our approach does not introduce any significant processing artefacts such as musical noise.
-
Journal article
Joudeh H, Clerckx B, 2019,
On the optimality of treating inter-cell interference as noise in uplink cellular networks
, IEEE Transactions on Information Theory, Vol: 65, Pages: 7208-7232, ISSN: 0018-9448
In this paper, we explore the information-theoretic optimality of treating interference as noise (TIN) in cellular networks. We focus on uplink scenarios modeled by the Gaussian interfering multiple access channel (IMAC), comprising K mutually interfering multiple access channels (MACs), each formed by an arbitrary number of transmitters communicating independent messages to one receiver. We define TIN for this setting as a scheme in which each MAC (or cell) performs a power-controlled version of its capacity-achieving strategy, with Gaussian codebooks and successive decoding, while treating interference from all other MACs (i.e. inter-cell interference) as noise. We characterize the generalized degrees-of-freedom (GDoF) region achieved through the proposed TIN scheme, and then identify conditions under which this achievable region is convex without the need for time-sharing. We then tighten these convexity conditions and identify a regime in which the proposed TIN scheme achieves the entire GDoF region of the IMAC and is within a constant gap of the entire capacity region.
-
Journal article
Kotzagiannidis MS, Dragotti PL, 2019,
Sampling and reconstruction of sparse signals on circulant graphs – an introduction to graph-FRI
, Applied and Computational Harmonic Analysis, Vol: 47, Pages: 539-565, ISSN: 1096-603X
With the objective of employing graphs toward a more generalized theory of signal processing, we present a novel sampling framework for (wavelet-)sparse signals defined on circulant graphs which extends basic properties of Finite Rate of Innovation (FRI) theory to the graph domain, and can be applied to arbitrary graphs via suitable approximation schemes. At its core, the introduced Graph-FRI-framework states that any K-sparse signal on the vertices of a circulant graph can be perfectly reconstructed from its dimensionality-reduced representation in the graph spectral domain, the Graph Fourier Transform (GFT), of minimum size 2K. By leveraging the recently developed theory of e-splines and e-spline wavelets on graphs, one can decompose this graph spectral transformation into the multiresolution low-pass filtering operation with a graph e-spline filter, with subsequent transformation to the spectral graph domain; this allows one to infer a distinct sampling pattern, and, ultimately, the structure of an associated coarsened graph, which preserves essential properties of the original, including circularity and, where applicable, the graph generating set.
-
Conference paper
Hogg AOT, Evers C, Naylor PA, 2019,
Multiple Hypothesis Tracking for Overlapping Speaker Segmentation
, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Conference paper
Neo VW, Evers C, Naylor PA, 2019,
Speech Enhancement Using Polynomial Eigenvalue Decomposition
, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE
-
Journal article
Kotzagiannidis MS, Dragotti PL, 2019,
Splines and Wavelets on Circulant Graphs
, Applied and Computational Harmonic Analysis, Vol: 47, Pages: 481-515, ISSN: 1096-603X
We present novel families of wavelets and associated filterbanks for the analysis and representation of functions defined on circulant graphs. In this work, we leverage the inherent vanishing moment property of the circulant graph Laplacian operator, and by extension, the e-graph Laplacian, which is established as a parameterization of the former with respect to the degree per node, for the design of vertex-localized and critically-sampled higher-order graph (e-)spline wavelet filterbanks, which can reproduce and annihilate classes of (exponential) polynomial signals on circulant graphs. In addition, we discuss similarities and analogies of the detected properties and resulting constructions with splines and spline wavelets in the Euclidean domain. Ultimately, we consider generalizations to arbitrary graphs in the form of graph approximations, with focus on graph product decompositions. In particular, we proceed to show how the use of graph products facilitates a multi-dimensional extension of the proposed constructions and properties.
-
Conference paper
Sharma D, Hogg AOT, Wang Y, et al., 2019,
Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks
, 2019 27th European Signal Processing Conference (EUSIPCO), Publisher: IEEE
-
Conference paper
Hogg AOT, Evers C, Naylor PA, 2019,
Speaker Change Detection Using Fundamental Frequency with Application to Multi-talker Segmentation
, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE
-
Conference paper
Neo VW, Naylor PA, 2019,
Second Order Sequential Best Rotation Algorithm with Householder Reduction for Polynomial Matrix Eigenvalue Decomposition
, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE
-
Journal article
Campello A, Dadush D, Ling C, 2019,
AWGN-goodness is enough: capacity-achieving lattice codes based on dithered probabilistic shaping
, IEEE Transactions on Information Theory, Vol: 65, Pages: 1961-1971, ISSN: 0018-9448
In this paper we show that any sequence of infinite lattice constellations which is good for the unconstrained Gaussian channel can be shaped into a capacity-achieving sequence of codes for the power-constrained Gaussian channel under lattice decoding and non-uniform signalling. Unlike previous results in the literature, our scheme holds with no extra condition on the lattices (e.g. quantization-goodness or vanishing flatness factor), thus establishing a direct implication between AWGN-goodness, in the sense of Poltyrev, and capacity-achieving codes. Our analysis uses properties of the discrete Gaussian distribution in order to obtain precise bounds on the probability of error and achievable rates. In particular, we obtain a simple characterization of the finite-blocklength behavior of the scheme, showing that it approaches the optimal dispersion coefficient for high signal-to-noise ratio. We further show that for low signal-to-noise ratio the discrete Gaussian over centered lattice constellations cannot achieve capacity, and thus a shift (or “dither”) is essentially necessary.
-
Journal article
Moore A, Xue W, Naylor P, et al., 2019,
Noise covariance matrix estimation for rotating microphone arrays
, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290
The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.
-
Journal article
Campello A, Ling C, Belfiore J-C, 2018,
Universal lattice codes for MIMO channels
, IEEE Transactions on Information Theory, Vol: 64, Pages: 7847-7865, ISSN: 0018-9448
We propose a coding scheme that achieves the capacity of the compound MIMO channel with algebraic lattices. Our lattice construction exploits the multiplicative structure of number fields and their group of units to absorb ill-conditioned channel realizations. To shape the constellation, a discrete Gaussian distribution over the lattice points is applied. These techniques, along with algebraic properties of the proposed lattices, are then used to construct a sub-optimal de-coupled coding scheme that achieves a constant gap to compound capacity by decoding in a lattice that does not depend on the channel realization. The gap is characterized in terms of algebraic invariants of the code and is shown to be significantly smaller than previous schemes in the literature. We also exhibit alternative algebraic constructions that achieve the capacity of ergodic (SISO) fading channels.
-
Conference paper
Moore AH, Lightburn L, Xue W, et al., 2018,
Binaural mask-informed speech enhancement for hearing aids with head tracking
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465
An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.
-
Conference paper
Evers C, Loellmann H, Mellmann H, et al., 2018,
LOCATA challenge - evaluation tasks and measures
, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE
Sound source localization and tracking algorithms provide estimates of the positional information about active sound sources in acoustic environments. Despite substantial advances and significant interest in the research community, a comprehensive benchmarking campaign of the various approaches using a common database of audio recordings has, to date, not been performed. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) is to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks range from the localization of a single source with a static microphone array to tracking of multiple moving sources with a moving microphone array. This paper provides an overview of the challenge tasks, describes the performance measures used for evaluation of the LOCATA Challenge, and presents baseline results for the development dataset.
-
Journal article
Clerckx B, Kim J, 2018,
On the beneficial roles of fading and transmit diversity in wireless power transfer with nonlinear energy harvesting
, IEEE Transactions on Wireless Communications, Vol: 17, Pages: 7731-7743, ISSN: 1536-1276
We study the effect of channel fading in Wireless Power Transfer (WPT) and show that fading enhances the RF-to-DC conversion efficiency of nonlinear RF energy harvesters. We then develop a new form of signal design for WPT, denoted as Transmit Diversity, that relies on multiple dumb antennas at the transmitter to induce fast fluctuations of the wireless channel. Those fluctuations boost the RF-to-DC conversion efficiency thanks to the energy harvester nonlinearity. In contrast with (energy) beamforming, Transmit Diversity does not rely on Channel State Information at the Transmitter (CSIT) and does not increase the average power at the energy harvester input, though it still enhances the overall end-to-end power transfer efficiency. Transmit Diversity is also combined with recently developed (energy) waveform and modulation to provide further enhancements. The efficacy of the scheme is analyzed using physics-based and curve fitting-based nonlinear models of the energy harvester and demonstrated using circuit simulations, prototyping and experimentation. Measurements with two transmit antennas reveal gains of 50% in harvested DC power over a single transmit antenna setup. The work (again) highlights the crucial role played by the harvester nonlinearity and demonstrates that multiple transmit antennas can be beneficial to WPT even in the absence of CSIT.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.