Imperial College London

Dr Patrick A. Naylor

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Speech & Acoustic Signal Processing
Contact

+44 (0)20 7594 6235 · p.naylor · Website

Location

803, Electrical Engineering, South Kensington Campus

Publications

328 results found

Antonello N, De Sena E, Moonen M, Naylor PA, van Waterschoot T et al., 2019, Joint Acoustic Localization and Dereverberation Through Plane Wave Decomposition and Sparse Regularization, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 27, Pages: 1893-1905, ISSN: 2329-9290

Journal article

Sharma D, Hogg A, Wang Y, Nour-Eldin A, Naylor P et al., Non-Intrusive POLQA estimation of speech quality using recurrent neural networks, European Signal Processing Conference (EUSIPCO), Publisher: IEEE

Estimating the quality of speech without the use of a clean reference signal is a challenging problem, in part due to the time and expense required to collect sufficient training data for modern machine learning algorithms. We present a novel, non-intrusive estimator that exploits recurrent neural network architectures to predict the intrusive POLQA score of a speech signal in a short time context. The predictor is based on a novel compressed representation of modulation domain features, used in conjunction with static MFCC features. We show that the proposed method can reliably predict POLQA with a 300 ms context, achieving a mean absolute error of 0.21 on unseen data. The proposed method is trained using English speech and is shown to generalize well across unseen languages. The neural network also jointly estimates voice activity detection (VAD), with an F1 score of 0.9, removing the need for an external VAD.

Conference paper

Neo V, Evers C, Naylor P, Speech enhancement using polynomial eigenvalue decomposition, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Publisher: IEEE

Speech enhancement is important for applications such as telecommunications, hearing aids, automatic speech recognition and voice-controlled systems. Enhancement algorithms aim to reduce interfering noise while minimizing any speech distortion. In this work, we propose to use polynomial matrices for speech enhancement in order to exploit the spatial, spectral and temporal correlations between the speech signals received by the microphone array. Polynomial matrices provide the necessary mathematical framework to exploit constructively the spatial correlations within and between sensor pairs, as well as the spectral-temporal correlations of broadband signals such as speech. Specifically, the polynomial eigenvalue decomposition (PEVD) decorrelates simultaneously in space, time and frequency. We then propose a PEVD-based speech enhancement algorithm. Simulations and informal listening examples show that our method achieves noise reduction without introducing artefacts into the enhanced signal for white, babble and factory noise conditions between -10 dB and 30 dB SNR.

Conference paper

Hogg A, Evers C, Naylor P, Multiple hypothesis tracking for overlapping speaker segmentation, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Publisher: IEEE

Speaker segmentation is an essential part of any diarization system. Applications of diarization include tasks such as speaker indexing, improving automatic speech recognition (ASR) performance and making single-speaker-based algorithms available for use in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method that exploits the harmonic structure associated with the pitch in voiced speech in order to segment the onsets and end-points of speech from multiple, overlapping speakers. The proposed method is evaluated against a segmentation system from the literature that uses a spectral representation and is based on bidirectional long short-term memory (BLSTM) networks. The proposed method is shown to achieve comparable performance for segmenting overlapping speakers using only the pitch harmonic information in the MHT framework.

Conference paper

Hogg A, Naylor P, Evers C, 2019, Speaker change detection using fundamental frequency with application to multi-talker segmentation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE

This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual’s pitch is smoothly varying and, therefore, can be predicted by means of a Kalman filter. Subsequently it is shown that if the pitch is not predictable then this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this approach of pitch prediction for speaker change detection. This system is then evaluated against a commonly used MFCC segmentation system. The proposed system is shown to increase the speaker change detection rate from 43.3% to 70.5% on meetings in the AMI corpus. There are therefore two equally weighted contributions in this paper: 1. we address the question of whether a change in pitch is a reliable estimator of a speaker change in multi-talker meeting audio; 2. we develop a method to extract such speaker changes and test it on a widely available meeting corpus.
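
The pitch-prediction idea above can be sketched with a scalar random-walk Kalman filter: frames where the innovation is too large to be explained by smooth pitch variation are flagged as likely speaker changes. This is only an illustrative sketch, not the paper's system; the parameters `q`, `r` and `thresh` are hypothetical.

```python
import numpy as np

def detect_speaker_changes(f0, q=1.0, r=4.0, thresh=3.0):
    """Flag frames where a random-walk Kalman filter cannot predict the
    pitch track (hypothetical noise variances q, r and threshold)."""
    x, p = f0[0], 1.0                       # state estimate and variance
    changes = []
    for n in range(1, len(f0)):
        p_pred = p + q                      # predict: random-walk model
        innov = f0[n] - x                   # innovation (prediction error)
        s = p_pred + r                      # innovation variance
        if abs(innov) / np.sqrt(s) > thresh:
            changes.append(n)               # unpredictable -> likely new talker
            x, p = f0[n], 1.0               # re-initialise on the new talker
            continue
        k = p_pred / s                      # Kalman gain
        x = x + k * innov                   # update state
        p = (1 - k) * p_pred
    return changes

# two talkers with smoothly varying pitch around 120 Hz and 210 Hz
f0 = np.concatenate([120 + np.sin(np.arange(50) / 5),
                     210 + np.sin(np.arange(50) / 5)])
print(detect_speaker_changes(f0))  # → [50]
```

The smooth sinusoidal variation stays well inside the innovation gate, so only the jump between talkers at frame 50 is flagged.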

Conference paper

Neo V, Naylor PA, Second-order sequential best rotation algorithm with Householder reduction for polynomial matrix eigenvalue decomposition, IEEE International Conference on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 0736-7791

The Second-order Sequential Best Rotation (SBR2) algorithm, used for eigenvalue decomposition (EVD) of para-Hermitian polynomial matrices typically encountered in wideband signal processing applications such as multichannel Wiener filtering and channel coding, involves a series of delay and rotation operations to achieve diagonalisation. In this paper, we propose the use of Householder transformations to reduce polynomial matrices to tridiagonal form before zeroing the dominant element with a rotation. As with Householder reduction of conventional matrices, our method enables SBR2 to converge in fewer iterations with polynomial matrix factors of smaller order, because more off-diagonal Frobenius norm (F-norm) is transferred to the main diagonal at every iteration. A 12.35% reduction in the number of iterations and a 0.1% improvement in reconstruction error are achievable.

Conference paper

Moore AH, de Haan JM, Pedersen MS, Brookes D, Naylor PA, Jensen J et al., 2019, Personalized signal-independent beamforming for binaural hearing aids, Journal of the Acoustical Society of America, Vol: 145, Pages: 2971-2981, ISSN: 0001-4966

The effect of personalized microphone array calibration on the performance of hearing aid beamformers under noisy reverberant conditions is studied. The study makes use of a new, publicly available, database containing acoustic transfer function measurements from 29 loudspeakers arranged on a sphere to a pair of behind-the-ear hearing aids in a listening room when worn by 27 males, 14 females, and 4 mannequins. Bilateral and binaural beamformers are designed using each participant's hearing aid head-related impulse responses (HAHRIRs). The performance of these personalized beamformers is compared to that of mismatched beamformers, where the HAHRIR used for the design does not belong to the individual for whom performance is measured. The case where the mismatched HAHRIR is that of a mannequin is of particular interest since it represents current practice in commercially available hearing aids. The benefit of personalized beamforming is assessed using an intrusive binaural speech intelligibility metric and in a matrix speech intelligibility test. For binaural beamforming, both measures demonstrate a statistically significant (p < 0.05) benefit of personalization. The benefit varies substantially between individuals with some predicted to benefit by as much as 1.5 dB.

Journal article

Moore A, Xue W, Naylor P, Brookes D et al., 2019, Noise covariance matrix estimation for rotating microphone arrays, IEEE Transactions on Audio, Speech and Language Processing, Vol: 27, Pages: 519-530, ISSN: 1558-7916

The noise covariance matrix computed between the signals from a microphone array is used in the design of spatial filters and beamformers with applications in noise suppression and dereverberation. This paper specifically addresses the problem of estimating the covariance matrix associated with a noise field when the array is rotating during desired source activity, as is common in head-mounted arrays. We propose a parametric model that leads to an analytical expression for the microphone signal covariance as a function of the array orientation and array manifold. An algorithm for estimating the model parameters during noise-only segments is proposed and the performance shown to be improved, rather than degraded, by array rotation. The stored model parameters can then be used to update the covariance matrix to account for the effects of any array rotation that occurs when the desired source is active. The proposed method is evaluated in terms of the Frobenius norm of the error in the estimated covariance matrix and of the noise reduction performance of a minimum variance distortionless response beamformer. In simulation experiments the proposed method achieves 18 dB lower error in the estimated noise covariance matrix than a conventional recursive averaging approach and results in noise reduction which is within 0.05 dB of an oracle beamformer using the ground truth noise covariance matrix.

Journal article

Gannot S, Naylor PA, 2019, Highlights from the Audio and Acoustic Signal Processing Technical Committee [In the Spotlight], IEEE Signal Processing Magazine, Vol: 36, ISSN: 1053-5888

The IEEE Audio and Acoustic Signal Processing Technical Committee (AASP TC) is one of 13 TCs in the IEEE Signal Processing Society. Its mission is to support, nourish, and lead scientific and technological development in all areas of AASP. These areas are currently seeing increased levels of interest and significant growth, providing a fertile ground for a broad range of specific and interdisciplinary research and development. Ranging from array processing for microphones and loudspeakers to music genre classification, from psychoacoustics to machine learning (ML), from consumer electronics devices to blue-sky research, this scope encompasses countless technical challenges and many hot topics. The TC has roughly 30 elected volunteer members drawn equally from leading academic and industrial organizations around the world, unified by the common aim of offering their expertise in the service of the scientific community.

Journal article

Brookes D, Lightburn L, Moore A, Naylor P, Xue W et al., 2019, Mask-assisted speech enhancement for binaural hearing aids, ELOBES2019

Conference paper

Moore A, de Haan JM, Pedersen MS, Naylor P, Brookes D, Jensen J et al., Personalized HRTFs for hearing aids, ELOBES2019

Conference paper

Xue W, Moore AH, Brookes M, Naylor PA et al., 2018, Modulation-domain parametric multichannel Kalman filtering for speech enhancement, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2509-2513, ISSN: 2076-1465

The goal of speech enhancement is to reduce the noise signal while keeping the speech signal undistorted. Recently we developed the multichannel Kalman filtering (MKF) for speech enhancement, in which the temporal evolution of the speech signal and the spatial correlation between multichannel observations are jointly exploited to estimate the clean signal. In this paper, we extend the previous work to derive a parametric MKF (PMKF), which incorporates a controlling factor to achieve the trade-off between the speech distortion and noise reduction. The controlling factor weights between the speech distortion and noise reduction related terms in the cost function of PMKF, and based on the minimum mean squared error (MMSE) criterion, the optimal PMKF gain is derived. We analyse the performance of the proposed PMKF and show the differences with the speech distortion weighted multichannel Wiener filter (SDW-MWF). We conduct experiments in different noisy conditions to evaluate the impact of the controlling factor on the noise reduction performance, and the results demonstrate the effectiveness of the proposed method.

Conference paper

Sharma D, Nour-Eldin A, Harding P, Karimian-Azari S, Naylor PA et al., 2018, Robust feature extraction from ad-hoc microphones for meeting diarization, 16th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE, Pages: 296-300, ISSN: 2639-4316

Conference paper

Moore AH, Xue W, Naylor PA, Brookes M et al., 2018, Estimation of the Noise Covariance Matrix for Rotating Sensor Arrays, 52nd Asilomar Conference on Signals, Systems, and Computers, Publisher: IEEE, Pages: 1936-1941, ISSN: 1058-6393

Conference paper

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Modulation-domain multichannel Kalman filtering for speech enhancement, IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1833-1847, ISSN: 2329-9290

Compared with single-channel speech enhancement methods, multichannel methods can utilize spatial information to design optimal filters. Although some filters adaptively consider second-order signal statistics, the temporal evolution of the speech spectrum is usually neglected. By using linear prediction (LP) to model the inter-frame temporal evolution of speech, single-channel Kalman filtering (KF) based methods have been developed for speech enhancement. In this paper, we derive a multichannel KF (MKF) that jointly uses both interchannel spatial correlation and interframe temporal correlation for speech enhancement. We perform LP in the modulation domain, and by incorporating the spatial information, derive an optimal MKF gain in the short-time Fourier transform domain. We show that the proposed MKF reduces to the conventional multichannel Wiener filter if the LP information is discarded. Furthermore, we show that, under an appropriate assumption, the MKF is equivalent to a concatenation of the minimum variance distortion response beamformer and a single-channel modulation-domain KF and therefore present an alternative implementation of the MKF. Experiments conducted on a public head-related impulse response database demonstrate the effectiveness of the proposed method.
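
The decomposition claimed above has a well-known static analogue: for a rank-1 speech model, the multichannel Wiener filter factors exactly into an MVDR beamformer followed by a single-channel Wiener postfilter. The sketch below numerically checks that classic identity (it is not the paper's Kalman formulation, and all quantities are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 4                                                # number of microphones
d = rng.normal(size=M) + 1j * rng.normal(size=M)     # steering vector
A = rng.normal(size=(M, M)) + 1j * rng.normal(size=(M, M))
Rn = A @ A.conj().T + np.eye(M)                      # Hermitian PSD noise covariance
sig2 = 2.0                                           # speech PSD (made up)

Rni = np.linalg.inv(Rn)
# multichannel Wiener filter estimating the clean speech, rank-1 model
w_mwf = sig2 * Rni @ d / (1 + sig2 * d.conj() @ Rni @ d)
# MVDR beamformer followed by a single-channel Wiener postfilter
w_mvdr = Rni @ d / (d.conj() @ Rni @ d)
resid = 1 / (d.conj() @ Rni @ d).real                # residual noise PSD at MVDR output
w_dec = (sig2 / (sig2 + resid)) * w_mvdr
print(np.allclose(w_mwf, w_dec))  # → True
```

The identity follows from the matrix inversion lemma applied to (Rn + sig2*d*d^H)^-1, which is also why discarding the temporal model in the MKF recovers the conventional multichannel Wiener filter.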

Journal article

Xue W, Moore A, Brookes DM, Naylor P et al., 2018, Multichannel Kalman filtering for speech enhancement, IEEE Intl Conf on Acoustics, Speech and Signal Processing, Publisher: IEEE, ISSN: 2379-190X

The use of spatial information in multichannel speech enhancement methods is well established but information associated with the temporal evolution of speech is less commonly exploited. Speech signals can be modelled using an autoregressive process in the time-frequency modulation domain, and Kalman filtering based speech enhancement algorithms have been developed for single-channel processing. In this paper, a multichannel Kalman filter (MKF) for speech enhancement is derived that jointly considers the multichannel spatial information and the temporal correlations of speech. We model the temporal evolution of speech in the modulation domain and, by incorporating the spatial information, an optimal MKF gain is derived in the short-time Fourier transform domain. We also show that the proposed MKF becomes a conventional multichannel Wiener filter if the temporal information is discarded. Experiments using the signals generated from a public head-related impulse response database demonstrate the effectiveness of the proposed method in comparison to other techniques.

Conference paper

Yiallourides C, Moore AH, Auvinet E, Van der Straeten C, Naylor PA et al., 2018, Acoustic Analysis and Assessment of the Knee in Osteoarthritis During Walking, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 281-285

We examine the relation between the sounds emitted by the knee joint during walking and its condition, with particular focus on osteoarthritis, and investigate their potential for noninvasive detection of knee pathology. We present a comparative analysis of several features and evaluate their discriminant power for the task of normal-abnormal signal classification. We statistically evaluate the feature distributions using the two-sample Kolmogorov-Smirnov test and the Bhattacharyya distance. We propose the use of 11 statistics to describe the distributions and test with several classifiers. In our experiments with 249 normal and 297 abnormal acoustic signals from 40 knees, a Support Vector Machine with linear kernel gave the best results with an error rate of 13.9%.
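
The two distribution-comparison measures named above are easy to state concretely. Below is a plain re-implementation of the two-sample Kolmogorov-Smirnov statistic and the Bhattacharyya distance between univariate Gaussians, applied to synthetic stand-in features (this is not the paper's data or code):

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum distance between the two
    empirical CDFs, evaluated over the pooled sample points."""
    data = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), data, side="right") / len(a)
    cdf_b = np.searchsorted(np.sort(b), data, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

def bhattacharyya_gauss(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two univariate Gaussians."""
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * np.log((var1 + var2) / (2 * np.sqrt(var1 * var2))))

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, 500)    # stand-in "normal knee" feature
abnormal = rng.normal(0.8, 1.5, 500)  # stand-in "abnormal" feature
print(ks_statistic(normal, abnormal))
print(bhattacharyya_gauss(0.0, 1.0, 0.8, 2.25))
```

A larger KS statistic or Bhattacharyya distance indicates more separable feature distributions, which is the sense in which such measures grade the discriminant power of candidate features.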

Conference paper

Antonello N, De Sena E, Moonen M, Naylor PA, van Waterschoot T et al., 2018, Joint source localization and dereverberation by sound field interpolation using sparse regularization, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6892-6896

In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem consists of a large-scale optimization problem that is solved using a first order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment.

Conference paper

Evers C, Naylor PA, 2018, Acoustic SLAM, IEEE Transactions on Audio, Speech and Language Processing, Vol: 26, Pages: 1484-1498, ISSN: 1558-7916

An algorithm is presented that enables devices equipped with microphones, such as robots, to move within their environment in order to explore, adapt to and interact with sound sources of interest. Acoustic scene mapping creates a 3D representation of the positional information of sound sources across time and space. In practice, positional source information is only provided by Direction-of-Arrival (DoA) estimates of the source directions; the source-sensor range is typically difficult to obtain. DoA estimates are also adversely affected by reverberation, noise, and interference, leading to errors in source location estimation and consequent false DoA estimates. Moreover, many acoustic sources, such as human talkers, are not continuously active, such that periods of inactivity lead to missing DoA estimates. Furthermore, the DoA estimates are specified relative to the observer's sensor location and orientation. Accurate positional information about the observer therefore is crucial. This paper proposes Acoustic Simultaneous Localization and Mapping (aSLAM), which uses acoustic signals to simultaneously map the 3D positions of multiple sound sources whilst passively localizing the observer within the scene map. The performance of aSLAM is analyzed and evaluated using a series of realistic simulations. Results are presented to show the impact of the observer motion and sound source localization accuracy.

Journal article

Evers C, Habets EAP, Gannot S, Naylor PA et al., 2018, DoA reliability for distributed acoustic tracking, IEEE Signal Processing Letters, Vol: 25, Pages: 1320-1324, ISSN: 1070-9908

Distributed acoustic tracking estimates the trajectories of source positions using an acoustic sensor network. As it is often difficult to estimate the source-sensor range from individual nodes, the source positions have to be inferred from Direction of Arrival (DoA) estimates. Due to reverberation and noise, the sound field becomes increasingly diffuse with increasing source-sensor distance, leading to decreased DoA estimation accuracy. To distinguish between accurate and uncertain DoA estimates, this paper proposes to incorporate the Coherent-to-Diffuse Ratio as a measure of DoA reliability for single-source tracking. It is shown that the source positions can therefore be probabilistically triangulated by exploiting the spatial diversity of all nodes.
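
The reliability-weighted triangulation idea can be illustrated with a 2-D weighted least-squares intersection of bearing lines, where each node's weight stands in for a CDR-style reliability. This is a sketch of the geometry only, not the paper's probabilistic method; the node positions, source position and weights are invented:

```python
import numpy as np

def triangulate(nodes, doas_deg, weights):
    """Weighted least-squares point closest to all DoA bearing lines.
    Minimises sum_i w_i * dist(x, line_i)^2 in closed form."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, th, w in zip(nodes, doas_deg, weights):
        u = np.array([np.cos(np.radians(th)), np.sin(np.radians(th))])
        P = np.eye(2) - np.outer(u, u)      # projector orthogonal to bearing
        A += w * P
        b += w * P @ np.asarray(p, float)
    return np.linalg.solve(A, b)

nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
src = np.array([2.0, 1.5])
# exact bearings from each node to the source
doas = [np.degrees(np.arctan2(*(src - np.array(p))[::-1])) for p in nodes]
print(triangulate(nodes, doas, weights=[1.0, 1.0, 0.2]))  # ≈ [2.0, 1.5]
```

With noise-free bearings the weights do not change the answer; with noisy bearings, down-weighting unreliable (low-CDR) nodes pulls the estimate toward the trustworthy ones.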

Journal article

Dawson PJ, De Sena E, Naylor PA, 2018, An acoustic image-source characterisation of surface profiles, 2018 26th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 2130-2134, ISSN: 2076-1465

The image-source method models the specular reflection from a plane by means of a secondary source positioned at the source's reflected image. The method has been widely used in acoustics to model the reverberant field of rectangular rooms, but can also be used for general-shaped rooms and non-flat reflectors. This paper explores the relationship between the physical properties of a non-flat reflector and the statistical properties of the associated cloud of image sources. It is shown here that the standard deviation of the image sources is strongly correlated with the ratio between the depth and width of the reflector's spatial features.
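
The reported correlation can be illustrated by mirroring a source across the tangent plane at each point of a sinusoidal surface profile and measuring the spread of the resulting image-source cloud: increasing the depth-to-width ratio widens the cloud. The profile, source position and spread measure below are arbitrary choices for illustration, not the paper's setup:

```python
import numpy as np

def image_sources(src, depth, width, n_pts=200):
    """Image of `src` in the tangent plane at each point of the profile
    h(x) = depth * sin(2*pi*x/width) (2-D geometric sketch)."""
    x = np.linspace(0.0, 4.0 * width, n_pts)
    h = depth * np.sin(2 * np.pi * x / width)
    dh = depth * (2 * np.pi / width) * np.cos(2 * np.pi * x / width)
    n = np.stack([-dh, np.ones_like(dh)], axis=1)
    n /= np.linalg.norm(n, axis=1, keepdims=True)    # unit surface normals
    q = np.stack([x, h], axis=1)                     # points on the surface
    d = ((src - q) * n).sum(axis=1, keepdims=True)   # distance to each plane
    return src - 2.0 * d * n                         # mirror across each plane

spread = lambda imgs: np.linalg.norm(imgs.std(axis=0))
src = np.array([2.0, 3.0])
shallow = spread(image_sources(src, depth=0.05, width=1.0))
deep = spread(image_sources(src, depth=0.30, width=1.0))
print(shallow < deep)  # deeper corrugation -> wider image-source cloud
```

For a perfectly flat reflector all tangent planes coincide and the cloud collapses to a single image, which is the classical image-source limit.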

Conference paper

Hafezi S, Moore AH, Naylor PA, 2018, Robust source counting and acoustic DOA estimation using density-based clustering, 10th IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Publisher: IEEE, Pages: 395-399, ISSN: 1551-2282

Conference paper

Moore AH, Lightburn L, Xue W, Naylor P, Brookes D et al., Binaural mask-informed speech enhancement for hearing aids with head tracking, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE

An end-to-end speech enhancement system for hearing aids is proposed which seeks to improve the intelligibility of binaural speech in noise during head movement. The system uses a reference beamformer whose look direction is informed by knowledge of the head orientation and the a priori known direction of the desired source. From this, a time-frequency mask is estimated using a deep neural network. The binaural signals are obtained using bilateral beamformers followed by a classical minimum mean square error speech enhancer, modified to use the estimated mask as a speech presence probability prior. In simulated experiments, the improvement in a binaural intelligibility metric (DBSTOI) given by the proposed system relative to beamforming alone corresponds to an SNR improvement of 4 to 6 dB. Results also demonstrate the individual contributions of incorporating the mask and the head-orientation-aware beam steering to the proposed system.

Conference paper

Evers C, Loellmann H, Mellmann H, Schmidt A, Barfuss H, Naylor P, Kellermann W et al., LOCATA Challenge - Evaluation Tasks and Measures, International Workshop on Acoustic Signal Enhancement (IWAENC 2018), Publisher: IEEE

Sound source localization and tracking algorithms provide estimates of the positional information about active sound sources in acoustic environments. Despite substantial advances and significant interest in the research community, a comprehensive benchmarking campaign of the various approaches using a common database of audio recordings has, to date, not been performed. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) is to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks range from the localization of a single source with a static microphone array to tracking of multiple moving sources with a moving microphone array. This paper provides an overview of the challenge tasks, describes the performance measures used for evaluation of the LOCATA Challenge, and presents baseline results for the development dataset.

Conference paper

Löllmann HW, Evers C, Schmidt A, Mellmann H, Barfuss H, Naylor PA, Kellermann W et al., The LOCATA challenge data corpus for acoustic source localization and tracking, IEEE Sensor Array and Multichannel Signal Processing Workshop 2018, Publisher: IEEE, ISSN: 2151-870X

Algorithms for acoustic source localization and tracking are essential for a wide range of applications such as personal assistants, smart homes, tele-conferencing systems, hearing aids, and autonomous systems. Numerous algorithms have been proposed for this purpose which, however, have so far not been evaluated and compared against each other using a common database. The IEEE-AASP Challenge on sound source localization and tracking (LOCATA) provides a novel, comprehensive data corpus for the objective benchmarking of state-of-the-art algorithms for sound source localization and tracking. The data corpus comprises six tasks ranging from the localization of a single static sound source with a static microphone array to the tracking of multiple moving speakers with a moving microphone array. It contains real-world multichannel audio recordings, obtained by hearing aids, microphones integrated in a robot head, and a planar and a spherical microphone array in an enclosed acoustic environment, as well as positional information about the involved arrays and sound sources, represented by moving human talkers or static loudspeakers.

Conference paper

Moore AH, Naylor P, Brookes DM, Room identification using frequency dependence of spectral decay statistics, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers Inc., ISSN: 0736-7791

A method for room identification is proposed based on the reverberation properties of multichannel speech recordings. The approach exploits the dependence of spectral decay statistics on the reverberation time of a room. The average negative-side variance within 1/3-octave bands is proposed as the identifying feature and shown to be effective in a classification experiment. However, negative-side variance is also dependent on the direct-to-reverberant energy ratio. The resulting sensitivity to different spatial configurations of source and microphones within a room is mitigated using a novel reverberation enhancement algorithm. A classification experiment using speech convolved with measured impulse responses and contaminated with environmental noise demonstrates the effectiveness of the proposed method, achieving 79% correct identification in the most demanding condition compared to 40% using unenhanced signals.

Conference paper

De Sena E, Brookes DM, Naylor PA, van Waterschoot T et al., 2017, Localization Experiments with Reporting by Head Orientation: Statistical Framework and Case Study, Journal of the Audio Engineering Society, Vol: 65, Pages: 982-996, ISSN: 0004-7554

This research focuses on sound localization experiments in which subjects report the position of an active sound source by turning toward it. A statistical framework for the analysis of the data is presented together with a case study from a large-scale listening experiment. The statistical framework is based on a model that is robust to the presence of front/back confusions and random errors. Closed-form natural estimators are derived, and one-sample and two-sample statistical tests are described. The framework is used to analyze the data of an auralized experiment undertaken by nearly nine hundred subjects. The objective was to explore localization performance in the horizontal plane in an informal setting and with little training, which are conditions that are similar to those typically encountered in consumer applications of binaural audio. Results show that responses had a rightward bias and that speech was harder to localize than percussion sounds, which are results consistent with the literature. Results also show that it was harder to localize sound in a simulated room with a high ceiling despite having a higher direct-to-reverberant ratio than other simulated rooms.

Journal article

Weiss S, Goddard NJ, Somasundaram S, Proudler IK, Naylor PA et al., 2017, Identification of Broadband Source-Array Responses from Sensor Second Order Statistics, Sensor Signal Processing for Defence Conference (SSPD), Publisher: IEEE, Pages: 35-39

This paper addresses the identification of source-sensor transfer functions from the measured space-time covariance matrix in the absence of any further side information about the source or the propagation environment. Using polynomial matrix decomposition techniques, the responses can be narrowed down to an indeterminacy of a common polynomial factor. If at least two different measurements for a source with constant power spectral density are available, this indeterminacy can be reduced to an ambiguity in the phase response of the source-sensor paths.

Conference paper

Evers C, Naylor PA, 2017, Optimized Self-Localization for SLAM in Dynamic Scenes using Probability Hypothesis Density Filters, IEEE Transactions on Signal Processing, Vol: 66, Pages: 863-878, ISSN: 1053-587X

In many applications, sensors that map the positions of objects in unknown environments are installed on dynamic platforms. As measurements are relative to the observer's sensors, scene mapping requires accurate knowledge of the observer state. However, in practice, observer reports are subject to positioning errors. Simultaneous Localization and Mapping (SLAM) addresses the joint estimation problem of observer localization and scene mapping. State-of-the-art approaches typically use visual or optical sensors and therefore rely on static beacons in the environment to anchor the observer estimate. However, many applications involving sensors that are not conventionally used for SLAM are affected by highly dynamic scenes, such that the static world assumption is invalid. This paper proposes a novel approach for dynamic scenes, called GEneralized Motion (GEM)-SLAM. Based on Probability Hypothesis Density (PHD) filters, the proposed approach probabilistically anchors the observer state by fusing observer information inferred from the scene with reports of the observer motion. This paper derives the general, theoretical framework for GEM-SLAM and shows that it generalizes existing PHD-based SLAM algorithms. Simulations for a model-specific realization using range-bearing sensors and multiple moving objects highlight that GEM-SLAM achieves significant improvements over three benchmark algorithms.

Journal article

Papayiannis C, Evers C, Naylor PA, 2017, Sparse parametric modeling of the early part of acoustic impulse responses, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 678-682, ISSN: 2076-1465

Acoustic channels are typically described by their Acoustic Impulse Response (AIR) as a Moving Average (MA) process. Such AIRs are often considered in terms of their early and late parts, describing discrete reflections and the diffuse reverberation tail respectively. We propose an approach for constructing a sparse parametric model for the early part. The model aims at reducing the number of parameters needed to represent it and subsequently reconstruct from the representation the MA coefficients that describe it. It consists of a representation of the reflections arriving at the receiver as delayed copies of an excitation signal. The Time-Of-Arrivals of reflections are not restricted to integer sample instances and a dynamically estimated model for the excitation sound is used. We also present a corresponding parameter estimation method, which is based on regularized-regression and nonlinear optimization. The proposed method also serves as an analysis tool, since estimated parameters can be used for the estimation of room geometry, the mixing time and other channel properties. Experiments involving simulated and measured AIRs are presented, in which the AIR coefficient reconstruction-error energy does not exceed 11.4% of the energy of the original AIR coefficients. The results also indicate dimensionality reduction figures exceeding 90% when compared to a MA process representation.
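
The "delayed copies at fractional times of arrival" idea can be sketched by synthesising an early AIR from a few TOA/gain pairs using a band-limited (sinc) fractional-delay kernel. The TOAs, gains and sampling rate below are invented for illustration, and a unit impulse stands in for the paper's dynamically estimated excitation model:

```python
import numpy as np

def early_air(toas, gains, length, fs=16000):
    """Sparse parametric early AIR: each reflection is a scaled copy of a
    unit impulse at a (possibly fractional) time of arrival, realised with
    a band-limited sinc kernel (illustrative, not the paper's estimator)."""
    n = np.arange(length)
    h = np.zeros(length)
    for t, g in zip(toas, gains):
        h += g * np.sinc(n - t * fs)   # fractional-delay impulse at t seconds
    return h

# three hypothetical reflections; the second and third fall between samples
h = early_air(toas=[0.001, 0.0034, 0.0051], gains=[1.0, 0.6, 0.35], length=128)
print(np.argmax(np.abs(h)))  # → 16, i.e. 0.001 s at 16 kHz
```

The representation needs only a TOA and a gain per reflection, which is the sense in which such a model trades the full set of MA coefficients for a far smaller parameter set.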

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
