Publications

Journal article

Yiallourides C, Naylor PA, 2021,

Time-frequency analysis and parameterisation of knee sounds for non-invasive detection of osteoarthritis

, IEEE Transactions on Biomedical Engineering, Vol: 68, Pages: 1250-1261, ISSN: 0018-9294

Objective: In this work the potential of non-invasive detection of knee osteoarthritis is investigated using the sounds generated by the knee joint during walking. Methods: The information contained in the time-frequency domain of these signals and its compressed representations is exploited and their discriminant properties are studied. Their efficacy for the task of normal vs abnormal signal classification is evaluated using a comprehensive experimental framework. Based on this, the impact of the feature extraction parameters on the classification performance is investigated using Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers. Results: It is shown that classification is successful with an area under the Receiver Operating Characteristic (ROC) curve of 0.92. Conclusion: The analysis indicates improvements in classification performance when using non-uniform frequency scaling and identifies specific frequency bands that contain discriminative features. Significance: Contrary to other studies that focus on sit-to-stand movements and knee flexion/extension, this study used knee sounds obtained during walking. The analysis of such signals leads to non-invasive detection of knee osteoarthritis with high accuracy and could potentially extend the range of available tools for the assessment of the disease as a more practical and cost effective method without requiring clinical setups.

Conference paper

Neo VW, Evers C, Naylor PA, 2021,

Speech dereverberation performance of a polynomial-EVD subspace approach

, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

The degradation of speech arising from additive background noise and reverberation affects the performance of important speech applications such as telecommunications, hearing aids, voice-controlled systems and robot audition. In this work, we focus on dereverberation. It is shown that the parameterized polynomial matrix eigenvalue decomposition (PEVD)-based speech enhancement algorithm exploits the lack of correlation between speech and the late reflections to enhance the speech component associated with the direct path and early reflections. The algorithm's performance is evaluated using simulations involving measured acoustic impulse responses and noise from the ACE corpus. The simulations and informal listening examples have indicated that the PEVD-based algorithm performs dereverberation over a range of SNRs without introducing any noticeable processing artefacts.

Conference paper

McKnight SW, Hogg A, Naylor P, 2020,

Analysis of phonetic dependence of segmentation errors in speaker diarization

, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465

Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ground truth speaker segment boundaries such that estimated speaker segment boundaries within such collars are considered completely correct. This paper shows that the popular recent approach of removing forgiveness collars from speaker diarization evaluation tools can unfairly penalize speaker diarization systems that correctly estimate speaker segment boundaries. The uncertainty in identifying the start and/or end of a particular phoneme means that the ground truth segmentation is not perfectly accurate, and even trained human listeners are unable to identify phoneme boundaries with full consistency. This research analyses the phoneme dependence of this uncertainty, and shows that it depends on (i) whether the phoneme being detected is at the start or end of an utterance and (ii) what the phoneme is, so that the use of a uniform forgiveness collar is inadequate. This analysis is expected to point the way towards more indicative and repeatable assessment of the performance of speaker diarization systems.
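As a rough illustration of how a forgiveness collar works, the sketch below flags each estimated boundary as correct when it falls within a fixed tolerance of any ground truth boundary. The function name, the boundary times and the 0.25 s collar are hypothetical choices for illustration only; this is not the scoring tool or the analysis used in the paper.

```python
def boundaries_within_collar(reference, estimated, collar):
    """Return one flag per estimated boundary: True if it lies within
    `collar` seconds of any reference boundary."""
    return [
        any(abs(est - ref) <= collar for ref in reference)
        for est in estimated
    ]


# Hypothetical boundary times in seconds (not data from the paper).
reference = [1.00, 2.50, 4.00]
estimated = [1.10, 2.90, 4.02]

# With a uniform 0.25 s collar, small deviations are forgiven.
with_collar = boundaries_within_collar(reference, estimated, collar=0.25)

# With the collar removed, only exact matches would count as correct.
no_collar = boundaries_within_collar(reference, estimated, collar=0.0)
```

In this toy case the first and third estimates are accepted with the collar, while none is accepted without it, which is the kind of penalty the abstract describes.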

Journal article

Papayiannis C, Evers C, Naylor P, 2020,

End-to-end classification of reverberant rooms using DNNs

, IEEE Transactions on Audio, Speech and Language Processing, Vol: 28, Pages: 3010-3017, ISSN: 1558-7916

Reverberation is present in our workplaces, our homes, concert halls and theatres. This paper investigates how deep learning can use the effect of reverberation on speech to classify a recording in terms of the room in which it was recorded. Existing approaches in the literature rely on domain expertise to manually select acoustic parameters as inputs to classifiers. Estimation of these parameters from reverberant speech is adversely affected by estimation errors, impacting the classification accuracy. In order to overcome the limitations of previously proposed methods, this paper shows how DNNs can perform the classification by operating directly on reverberant speech spectra and a CRNN with an attention-mechanism is proposed for the task. The relationship is investigated between the reverberant speech representations learned by the DNNs and acoustic parameters. For evaluation, AIRs are used from the ACE-challenge dataset that were measured in 7 real rooms. The classification accuracy of the CRNN classifier in the experiments is 78% when using 5 hours of training data and 90% when using 10 hours.

Conference paper

Neo VW, Evers C, Naylor PA, 2020,

PEVD-based speech enhancement in reverberant environments

, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 186-190

The enhancement of noisy speech is important for applications involving human-to-human interactions, such as telecommunications and hearing aids, as well as human-to-machine interactions, such as voice-controlled systems and robot audition. In this work, we focus on reverberant environments. It is shown that, by exploiting the lack of correlation between speech and the late reflections, further noise reduction can be achieved. This is verified using simulations involving actual acoustic impulse responses and noise from the ACE corpus. The simulations show that even without using a noise estimator, our proposed method simultaneously achieves noise reduction, and enhancement of speech quality and intelligibility, in reverberant environments over a wide range of SNRs. Furthermore, informal listening examples highlight that our approach does not introduce any significant processing artefacts such as musical noise.

Journal article

Joudeh H, Clerckx B, 2019,

On the optimality of treating inter-cell interference as noise in uplink cellular networks

, IEEE Transactions on Information Theory, Vol: 65, Pages: 7208-7232, ISSN: 0018-9448

In this paper, we explore the information-theoretic optimality of treating interference as noise (TIN) in cellular networks. We focus on uplink scenarios modeled by the Gaussian interfering multiple access channel (IMAC), comprising K mutually interfering multiple access channels (MACs), each formed by an arbitrary number of transmitters communicating independent messages to one receiver. We define TIN for this setting as a scheme in which each MAC (or cell) performs a power-controlled version of its capacity-achieving strategy, with Gaussian codebooks and successive decoding, while treating interference from all other MACs (i.e. inter-cell interference) as noise. We characterize the generalized degrees-of-freedom (GDoF) region achieved through the proposed TIN scheme, and then identify conditions under which this achievable region is convex without the need for time-sharing. We then tighten these convexity conditions and identify a regime in which the proposed TIN scheme achieves the entire GDoF region of the IMAC and is within a constant gap of the entire capacity region.

Journal article

Kotzagiannidis MS, Dragotti PL, 2019,

Sampling and reconstruction of sparse signals on circulant graphs – an introduction to graph-FRI

, Applied and Computational Harmonic Analysis, Vol: 47, Pages: 539-565, ISSN: 1096-603X

With the objective of employing graphs toward a more generalized theory of signal processing, we present a novel sampling framework for (wavelet-)sparse signals defined on circulant graphs which extends basic properties of Finite Rate of Innovation (FRI) theory to the graph domain, and can be applied to arbitrary graphs via suitable approximation schemes. At its core, the introduced Graph-FRI-framework states that any K-sparse signal on the vertices of a circulant graph can be perfectly reconstructed from its dimensionality-reduced representation in the graph spectral domain, the Graph Fourier Transform (GFT), of minimum size 2K. By leveraging the recently developed theory of e-splines and e-spline wavelets on graphs, one can decompose this graph spectral transformation into the multiresolution low-pass filtering operation with a graph e-spline filter, with subsequent transformation to the spectral graph domain; this allows to infer a distinct sampling pattern, and, ultimately, the structure of an associated coarsened graph, which preserves essential properties of the original, including circularity and, where applicable, the graph generating set.

Conference paper

Neo V, Evers C, Naylor P, 2019,

Speech enhancement using polynomial eigenvalue decomposition

, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Publisher: IEEE

Speech enhancement is important for applications such as telecommunications, hearing aids, automatic speech recognition and voice-controlled systems. The enhancement algorithms aim to reduce interfering noise while minimizing any speech distortion. In this work for speech enhancement, we propose to use polynomial matrices in order to exploit the spatial, spectral as well as temporal correlations between the speech signals received by the microphone array. Polynomial matrices provide the necessary mathematical framework in order to exploit constructively the spatial correlations within and between sensor pairs, as well as the spectral-temporal correlations of broadband signals, such as speech. Specifically, the polynomial eigenvalue decomposition (PEVD) decorrelates simultaneously in space, time and frequency. We then propose a PEVD-based speech enhancement algorithm. Simulations and informal listening examples have shown that our method achieves noise reduction without introducing artefacts into the enhanced signal for white, babble and factory noise conditions from -10 dB to 30 dB SNR.

Conference paper

Hogg AOT, Evers C, Naylor PA, 2019,

Multiple Hypothesis Tracking for Overlapping Speaker Segmentation

, 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE

Journal article

Kotzagiannidis MS, Dragotti PL, 2019,

Splines and Wavelets on Circulant Graphs

, Applied and Computational Harmonic Analysis, Vol: 47, Pages: 481-515, ISSN: 1096-603X

We present novel families of wavelets and associated filterbanks for the analysis and representation of functions defined on circulant graphs. In this work, we leverage the inherent vanishing moment property of the circulant graph Laplacian operator, and by extension, the e-graph Laplacian, which is established as a parameterization of the former with respect to the degree per node, for the design of vertex-localized and critically-sampled higher-order graph (e-)spline wavelet filterbanks, which can reproduce and annihilate classes of (exponential) polynomial signals on circulant graphs. In addition, we discuss similarities and analogies of the detected properties and resulting constructions with splines and spline wavelets in the Euclidean domain. Ultimately, we consider generalizations to arbitrary graphs in the form of graph approximations, with focus on graph product decompositions. In particular, we proceed to show how the use of graph products facilitates a multi-dimensional extension of the proposed constructions and properties.

Conference paper

Sharma D, Hogg AOT, Wang Y, Nour-Eldin A, Naylor PA et al., 2019,

Non-Intrusive POLQA Estimation of Speech Quality using Recurrent Neural Networks

, 2019 27th European Signal Processing Conference (EUSIPCO), Publisher: IEEE

Conference paper

Hogg AOT, Evers C, Naylor PA, 2019,

Speaker Change Detection Using Fundamental Frequency with Application to Multi-talker Segmentation

, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

Conference paper

Neo V, Naylor PA, 2019,

Second order sequential best rotation algorithm with householder reduction for polynomial matrix eigenvalue decomposition

, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, Pages: 8043-8047, ISSN: 0736-7791

The Second-order Sequential Best Rotation (SBR2) algorithm, used for Eigenvalue Decomposition (EVD) on para-Hermitian polynomial matrices typically encountered in wideband signal processing applications like multichannel Wiener filtering and channel coding, involves a series of delay and rotation operations to achieve diagonalisation. In this paper, we proposed the use of Householder transformations to reduce polynomial matrices to tridiagonal form before zeroing the dominant element with rotation. Similar to performing Householder reduction on conventional matrices, our method enables SBR2 to converge in fewer iterations with smaller order of polynomial matrix factors because more off-diagonal Frobenius norm (F-norm) could be transferred to the main diagonal at every iteration. A reduction in the number of iterations by 12.35% and 0.1% improvement in reconstruction error is achievable.

Journal article

Campello A, Dadush D, Ling C, 2019,

AWGN-goodness is enough: capacity-achieving lattice codes based on dithered probabilistic shaping

, IEEE Transactions on Information Theory, Vol: 65, Pages: 1961-1971, ISSN: 0018-9448

In this paper we show that any sequence of infinite lattice constellations which is good for the unconstrained Gaussian channel can be shaped into a capacity-achieving sequence of codes for the power-constrained Gaussian channel under lattice decoding and non-uniform signalling. Unlike previous results in the literature, our scheme holds with no extra condition on the lattices (e.g. quantization-goodness or vanishing flatness factor), thus establishing a direct implication between AWGN-goodness, in the sense of Poltyrev, and capacity-achieving codes. Our analysis uses properties of the discrete Gaussian distribution in order to obtain precise bounds on the probability of error and achievable rates. In particular, we obtain a simple characterization of the finite-blocklength behavior of the scheme, showing that it approaches the optimal dispersion coefficient for high signal-to-noise ratio. We further show that for low signal-to-noise ratio the discrete Gaussian over centered lattice constellations cannot achieve capacity, and thus a shift (or “dither”) is essentially necessary.

Journal article

Moore A, Xue W, Naylor P, Brookes D et al., 2019,

Noise covariance matrix estimation for rotating microphone arrays

, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 2329-9290

The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.

Journal article

Campello A, Ling C, Belfiore J-C, 2018,

Universal lattice codes for MIMO channels

, IEEE Transactions on Information Theory, Vol: 64, Pages: 7847-7865, ISSN: 0018-9448

We propose a coding scheme that achieves the capacity of the compound MIMO channel with algebraic lattices. Our lattice construction exploits the multiplicative structure of number fields and their group of units to absorb ill-conditioned channel realizations. To shape the constellation, a discrete Gaussian distribution over the lattice points is applied. These techniques, along with algebraic properties of the proposed lattices, are then used to construct a sub-optimal de-coupled coding scheme that achieves a constant gap to compound capacity by decoding in a lattice that does not depend on the channel realization. The gap is characterized in terms of algebraic invariants of the code and is shown to be significantly smaller than previous schemes in the literature. We also exhibit alternative algebraic constructions that achieve the capacity of ergodic (SISO) fading channels.

Conference paper

Moore AH, Lightburn L, Xue W, Naylor P, Brookes D et al., 2018,

Binaural mask-informed speech enhancement for hearing aids with head tracking

, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE, Pages: 461-465

An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head orientation-aware beam steering to the proposed system.

Conference paper

Evers C, Loellmann H, Mellmann H, Schmidt A, Barfuss H, Naylor P, Kellermann W et al., 2018,

LOCATA challenge - evaluation tasks and measures

, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE

Sound source localization and tracking algorithms provide estimates of the positional information about active sound sources in acoustic environments. Despite substantial advances and significant interest in the research community, a comprehensive benchmarking campaign of the various approaches using a common database of audio recordings has, to date, not been performed. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) is to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks range from the localization of a single source with a static microphone array to tracking of multiple moving sources with a moving microphone array. This paper provides an overview of the challenge tasks, describes the performance measures used for evaluation of the LOCATA Challenge, and presents baseline results for the development dataset.

Journal article

Clerckx B, Kim J, 2018,

On the beneficial roles of fading and transmit diversity in wireless power transfer with nonlinear energy harvesting

, IEEE Transactions on Wireless Communications, Vol: 17, Pages: 7731-7743, ISSN: 1536-1276

We study the effect of channel fading in Wireless Power Transfer (WPT) and show that fading enhances the RF-to-DC conversion efficiency of nonlinear RF energy harvesters. We then develop a new form of signal design for WPT, denoted as Transmit Diversity, that relies on multiple dumb antennas at the transmitter to induce fast fluctuations of the wireless channel. Those fluctuations boost the RF-to-DC conversion efficiency thanks to the energy harvester nonlinearity. In contrast with (energy) beamforming, Transmit Diversity does not rely on Channel State Information at the Transmitter (CSIT) and does not increase the average power at the energy harvester input, though it still enhances the overall end-to-end power transfer efficiency. Transmit Diversity is also combined with recently developed (energy) waveform and modulation to provide further enhancements. The efficacy of the scheme is analyzed using physics-based and curve fitting-based nonlinear models of the energy harvester and demonstrated using circuit simulations, prototyping and experimentation. Measurements with two transmit antennas reveal gains of 50% in harvested DC power over a single transmit antenna setup. The work (again) highlights the crucial role played by the harvester nonlinearity and demonstrates that multiple transmit antennas can be beneficial to WPT even in the absence of CSIT.

Journal article

Dragotti P, Huang J, 2018,

Photo realistic image completion via dense correspondence

, IEEE Transactions on Image Processing, Vol: 27, Pages: 5234-5247, ISSN: 1057-7149

In this paper, we propose an image completion algorithm based on dense correspondence between the input image and an exemplar image retrieved from the Internet. Contrary to traditional methods which register two images according to sparse correspondence, in this paper, we propose a hierarchical PatchMatch method that progressively estimates a dense correspondence, which is able to capture small deformations between images. The estimated dense correspondence has usually large occlusion areas that correspond to the regions to be completed. A nearest neighbor field (NNF) interpolation algorithm interpolates a smooth and accurate NNF over the occluded region. Given the calculated NNF, the correct image content from the exemplar image is transferred to the input image. Finally, as there could be a color difference between the completed content and the input image, a color correction algorithm is applied to remove the visual artifacts. Numerical results show that our proposed image completion method can achieve photo realistic image completion results.

Journal article

Stott AE, Kanna S, Mandic DP, 2018,

Widely linear complex partial least squares for latent subspace regression

, Signal Processing, Vol: 152, Pages: 350-362, ISSN: 0165-1684

Journal article

Luzzi L, Vehkalahti R, Ling C, 2018,

Almost universal codes for MIMO wiretap channels

, IEEE Transactions on Information Theory, Vol: 64, Pages: 7218-7241, ISSN: 0018-9448

Despite several works on secrecy coding for fading and MIMO wiretap channels from an error probability perspective, the construction of information-theoretically secure codes over such channels remains an open problem. In this paper, we consider a fading wiretap channel model where the transmitter has only partial statistical channel state information. Our channel model includes static channels, i.i.d. block fading channels, and ergodic stationary fading with fast decay of large deviations for the eavesdropper's channel. We extend the flatness factor criterion from the Gaussian wiretap channel to fading and MIMO wiretap channels, and establish a simple design criterion where the normalized product distance/minimum determinant of the lattice and its dual should be maximized simultaneously. Moreover, we propose concrete lattice codes satisfying this design criterion, which are built from algebraic number fields with constant root discriminant in the single-antenna case, and from division algebras centered at such number fields in the multiple-antenna case. The proposed lattice codes achieve strong secrecy and semantic security for all rates R < C_b - C_e - κ, where C_b and C_e are Bob and Eve's channel capacities, respectively, and κ is an explicit constant gap. Furthermore, these codes are almost universal in the sense that a fixed code is good for secrecy for a wide range of fading models. Finally, we consider a compound wiretap model with a more restricted uncertainty set, and show that rates R < C̅_b - C̅_e - κ are achievable, where C̅_b is a lower bound for Bob's capacity and C̅_e is an upper bound for Eve's capacity for all

Conference paper

Leung KK, Wang S, Tuor T, Salonidis T, Makaya C, He T, Chan K et al., 2018,

When edge meets learning: adaptive control for resource-constrained distributed machine learning

, IEEE Infocom 2018, Publisher: IEEE

Emerging technologies and applications including Internet of Things (IoT), social networking, and crowd-sourcing generate large amounts of data at the network edge. Machine learning models are often built from the collected data, to enable the detection, classification, and prediction of future events. Due to bandwidth, storage, and privacy concerns, it is often impractical to send all the data to a centralized location. In this paper, we consider the problem of learning model parameters from data distributed across multiple edge nodes, without sending raw data to a centralized place. Our focus is on a generic class of machine learning models that are trained using gradient-descent based approaches. We analyze the convergence rate of distributed gradient descent from a theoretical point of view, based on which we propose a control algorithm that determines the best trade-off between local update and global parameter aggregation to minimize the loss function under a given resource budget. The performance of the proposed algorithm is evaluated via extensive experiments with real datasets, both on a networked prototype system and in a larger-scale simulated environment. The experimentation results show that our proposed approach performs near to the optimum with various machine learning models and different data distributions.
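The trade-off between local updates and global aggregation can be pictured with a minimal sketch of distributed gradient descent, assuming a toy scalar squared-error loss and a fixed number `tau` of local steps per aggregation round. The function names, the data, `tau` and the learning rate are all hypothetical, and the adaptive control algorithm proposed in the paper (which chooses the trade-off under a resource budget) is not reproduced here.

```python
def local_gradient(w, data):
    """Gradient of the mean of 0.5 * (w - x)**2 over one node's data."""
    return sum(w - x for x in data) / len(data)


def distributed_gd(node_data, tau, rounds, lr=0.1, w0=0.0):
    """Each round, every node runs `tau` local gradient steps starting from
    the global parameter; the local results are then averaged, weighted by
    the amount of data held at each node. Raw data never leaves a node."""
    w_global = w0
    total = sum(len(d) for d in node_data)
    for _ in range(rounds):
        local_params = []
        for data in node_data:
            w = w_global
            for _ in range(tau):
                w -= lr * local_gradient(w, data)
            local_params.append(w)
        # Global aggregation: size-weighted average of local parameters.
        w_global = sum(len(d) * w for d, w in zip(node_data, local_params)) / total
    return w_global


# Hypothetical data held at two edge nodes.
node_data = [[1.0, 2.0, 3.0], [10.0, 12.0]]
w = distributed_gd(node_data, tau=5, rounds=50)
```

For this particular quadratic loss the aggregated parameter approaches the mean of all data points (5.6); for general losses, a larger `tau` saves communication but can move the result away from the centralized optimum, which is the tension the abstract refers to.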

Journal article

Oliveira V, Martins R, Liow N, Teiserskas J, von Rosenberg W, Adjei T, Shivamurthappa V, Lally PJ, Mandic D, Thayyil S et al., 2018,

Prognostic accuracy of heart rate variability analysis in neonatal encephalopathy: a systematic review

, Neonatology, Vol: 115, Pages: 59-67, ISSN: 1661-7800

BACKGROUND: Heart rate variability analysis offers real-time quantification of autonomic disturbance after perinatal asphyxia, and may therefore aid in disease stratification and prognostication after neonatal encephalopathy (NE). OBJECTIVE: To systematically review the existing literature on the accuracy of early heart rate variability (HRV) to predict brain injury and adverse neurodevelopmental outcomes after NE. DESIGN/METHODS: We systematically searched the literature published between May 1947 and May 2018. We included all prospective and retrospective studies reporting HRV metrics, within the first 7 days of life in babies with NE, and its association with adverse outcomes (defined as evidence of brain injury on magnetic resonance imaging and/or abnormal neurodevelopment at ≥1 year of age). We extracted raw data wherever possible to calculate the prognostic indices with confidence intervals. RESULTS: We retrieved 379 citations, 5 of which met the criteria. One further study was excluded as it analysed an already-included cohort. The 4 studies provided data on 205 babies, 80 (39%) of whom had adverse outcomes. Prognostic accuracy was reported for 12 different HRV metrics and the area under the curve (AUC) varied between 0.79 and 0.94. The best performing metric reported in the included studies was the relative power of high-frequency band, with an AUC of 0.94. CONCLUSIONS: HRV metrics are a promising bedside tool for early prediction of brain injury and neurodevelopmental outcome in babies with NE. Due to the small number of studies available, their heterogeneity and methodological limitations, further research is needed to refine this tool so that it can be used in clinical practice.

Journal article

Liu T, Stathaki T, 2018,

Faster R-CNN for Robust Pedestrian Detection using Semantic Segmentation Network

, Frontiers in Neurorobotics

Journal article

Xue W, Moore A, Brookes DM, Naylor P et al., 2018,

Modulation-domain multichannel Kalman filtering for speech enhancement

, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290

Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain, and by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortionless response beamformer and a single-channel modulation-domain KF and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.

Journal article

Reynolds SC, Abrahamsson T, Sjostrom PJ, Schultz S, Dragotti PL et al., 2018,

CosMIC: a consistent metric for spike inference from calcium imaging

, Neural Computation, Vol: 30, Pages: 2726-2756, ISSN: 0899-7667

In recent years, the development of algorithms to detect neuronal spiking activity from two-photon calcium imaging data has received much attention. Meanwhile, few researchers have examined the metrics used to assess the similarity of detected spike trains with the ground truth. We highlight the limitations of the two most commonly used metrics, the spike train correlation and success rate, and propose an alternative, which we refer to as CosMIC. Rather than operating on the true and estimated spike trains directly, the proposed metric assesses the similarity of the pulse trains obtained from convolution of the spike trains with a smoothing pulse. The pulse width, which is derived from the statistics of the imaging data, reflects the temporal tolerance of the metric. The final metric score is the size of the commonalities of the pulse trains as a fraction of their average size. Viewed through the lens of set theory, CosMIC resembles a continuous Sørensen-Dice coefficient — an index commonly used to assess the similarity of discrete, presence/absence data. We demonstrate the ability of the proposed metric to discriminate the precision and recall of spike train estimates. Unlike the spike train correlation, which appears to reward overestimation, the proposed metric score is maximised when the correct number of spikes have been detected. Furthermore, we show that CosMIC is more sensitive to the temporal precision of estimates than the success rate.
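The scoring rule described above (commonalities of the two pulse trains as a fraction of their average size) can be sketched as a continuous Sørensen-Dice-style ratio. The sketch below assumes a triangular smoothing pulse and a fixed, hand-picked pulse width; in the paper the width is derived from the statistics of the imaging data, so the function names, pulse shape, width and spike times here are illustrative assumptions rather than the authors' implementation.

```python
def pulse_train(spike_times, width, t_grid):
    """Sum of triangular pulses (assumed shape) of half-width `width`,
    one centred on each spike time, sampled on `t_grid`."""
    return [
        sum(max(0.0, 1.0 - abs(t - s) / width) for s in spike_times)
        for t in t_grid
    ]


def dice_like_score(true_spikes, est_spikes, width, t_grid):
    """Size of the pointwise minimum of the two pulse trains divided by
    the average of their sizes."""
    x = pulse_train(true_spikes, width, t_grid)
    y = pulse_train(est_spikes, width, t_grid)
    common = sum(min(a, b) for a, b in zip(x, y))
    average = 0.5 * (sum(x) + sum(y))
    return common / average if average > 0 else 0.0


# Hypothetical spike times in seconds on a 10 ms grid.
t_grid = [i * 0.01 for i in range(300)]
true_spikes = [0.5, 1.5, 2.5]

perfect = dice_like_score(true_spikes, true_spikes, 0.1, t_grid)
shifted = dice_like_score(true_spikes, [0.55, 1.55, 2.55], 0.1, t_grid)
missed = dice_like_score(true_spikes, [0.5], 0.1, t_grid)
```

An exact match scores 1, a small timing shift lowers the score gradually, and detecting only one of three spikes gives roughly 0.5, so both temporal imprecision and a wrong spike count are penalised.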

Conference paper

Moore AH, Naylor P, Brookes DM, 2018,

Room identification using frequency dependence of spectral decay statistics

, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers Inc., Pages: 6902-6906, ISSN: 0736-7791

A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79% correct identification in the most demanding condition compared to 40% using unenhanced signals.

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018,

Multichannel Kalman filtering for speech enhancement

, IEEE Intl Conf on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 2379-190X

The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.

Conference paper

Antonello N, De Sena E, Moonen M, Naylor PA, van Waterschoot M et al., 2018,

Joint source localization and dereverberation by sound field interpolation using sparse regularization

, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6892-6896

In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem is a large-scale optimization problem that is solved using a first-order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.
