Hafezi S, Moore AH, Naylor PA, 2017, Multiple DOA estimation based on estimation consistency and spherical harmonic multiple signal classification, European Signal Processing Conference, EUSIPCO 2017, Pages: 1240-1244
© EURASIP 2017. A common approach to multiple Direction-of-Arrival (DOA) estimation of speech sources is to identify Time-Frequency (TF) bins with dominant Single Source (SS) and apply DOA estimation such as Multiple Signal Classification (MUSIC) only on those TF bins. In the state-of-the-art Direct Path Dominance (DPD)-MUSIC, the covariance matrix, used as the input to MUSIC, is calculated using only the TF bins over a local TF region where only a SS is dominant. In this work, we propose an alternative approach to MUSIC in which all the SS-dominant TF bins for each speaker across TF domain are globally used to improve the quality of covariance matrix for MUSIC. Our recently proposed Multi-Source Estimation Consistency (MSEC) technique, which exploits the consistency of initial DOA estimates within a time frame based on adaptive clustering, is used to estimate the SS-dominant TF bins for each speaker. The simulation using spherical microphone array shows that our proposed MSEC-MUSIC significantly outperforms the state-of-the-art DPD-MUSIC with less than 6.5° mean estimation error and strong robustness to widely varying source separation for up to 5 sources in the presence of realistic reverberation and sensor noise.
Antonello N, De Sena E, Moonen M, et al., 2017, Room Impulse Response Interpolation Using a Sparse Spatio-Temporal Representation of the Sound Field, IEEE/ACM Transactions on Audio Speech and Language Processing, Vol: 25, Pages: 1929-1941, ISSN: 2329-9290
© 2017 IEEE. Room Impulse Responses (RIRs) are typically measured using a set of microphones and a loudspeaker. When RIRs spanning a large volume are needed, many microphone measurements must be used to spatially sample the sound field. In order to reduce the number of microphone measurements, RIRs can be spatially interpolated. In the present study, RIR interpolation is formulated as an inverse problem. This inverse problem relies on a particular acoustic model capable of representing the measurements. Two different acoustic models are compared: the plane wave decomposition model and a novel time-domain model, which consists of a collection of equivalent sources creating spherical waves. These acoustic models can both approximate any reverberant sound field created by a far-field sound source. In order to produce an accurate RIR interpolation, sparsity regularization is employed when solving the inverse problem. In particular, by combining different acoustic models with different sparsity promoting regularizations, spatial sparsity, spatio-spectral sparsity, and spatio-temporal sparsity are compared. The inverse problem is solved using a matrix-free large-scale optimization algorithm. Simulations show that the best RIR interpolation is obtained when combining the novel time-domain acoustic model with the spatio-temporal sparsity regularization, outperforming the results of the plane wave decomposition model even when far fewer microphone measurements are available.
Parada PP, Sharma D, van Waterschoot T, et al., 2017, Robust Statistical Processing of TDOA Estimates for Distant Speaker Diarization, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 86-90, ISSN: 2076-1465
Sharma D, Jost U, Naylor PA, 2017, Non-Intrusive Bit-Rate Detection of Coded Speech, 25th European Signal Processing Conference (EUSIPCO), Publisher: IEEE, Pages: 1799-1803, ISSN: 2076-1465
Hafezi S, Moore AH, Naylor PA, 2017, Augmented Intensity Vectors for Direction of Arrival Estimation in the Spherical Harmonic Domain, IEEE Transactions on Audio, Speech and Language Processing, Vol: 25, Pages: 1956-1968, ISSN: 1558-7916
Pseudointensity vectors (PIVs) provide a means of direction of arrival (DOA) estimation for spherical microphone arrays using only the zeroth and the first-order spherical harmonics. An augmented intensity vector (AIV) is proposed which improves the accuracy of PIVs by exploiting higher order spherical harmonics. We compared DOA estimation using our proposed AIVs against PIVs, steered response power (SRP) and subspace methods where the number of sources, their angular separation, the reverberation time of the room and the sensor noise level are varied. The results show that the proposed approach outperforms the baseline methods and performs at least as accurately as the state-of-the-art method with strong robustness to reverberation, sensor noise, and number of sources. In the single and multiple source scenarios tested, which include realistic levels of reverberation and noise, the proposed method had average error of 1.5° and 2°, respectively.
Eaton DJ, Gaubitch ND, Moore AH, et al., 2017, Acoustic Characterization of Environments (ACE) Challenge Results Technical Report, Publisher: arXiv
This document provides supplementary information, and the results of the tests of acoustic parameter estimation algorithms on the Acoustic Characterization of Environments (ACE) Challenge Evaluation dataset which were subsequently submitted and written up into papers for the Proceedings of the ACE Challenge. This document is supporting material for a forthcoming journal paper on the ACE Challenge which will provide further analysis of the results.
Papayiannis C, Evers C, Naylor PA, 2017, Discriminative feature domains for reverberant acoustic environments, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, ISSN: 2379-190X
Several speech processing and audio data-mining applications rely on a description of the acoustic environment as a feature vector for classification. The discriminative properties of the feature domain play a crucial role in the effectiveness of these methods. In this work, we consider three environment identification tasks and the task of acoustic model selection for speech recognition. A set of acoustic parameters and Machine Learning algorithms for feature selection are used and an analysis is performed on the resulting feature domains for each task. In our experiments, a classification accuracy of 100% is achieved for the majority of tasks and the Word Error Rate is reduced by 20.73 percentage points for Automatic Speech Recognition when using the resulting domains. Experimental results indicate a significant dissimilarity in the parameter choices for the composition of the domains, which highlights the importance of the feature selection process for individual applications.
Xue W, Brookes M, Naylor PA, 2017, Frequency-domain under-modelled blind system identification based on cross power spectrum and sparsity regularization, IEEE International Conference on Acoustics, Speech and Signal Processing, Pages: 591-595, ISSN: 1520-6149
© 2017 IEEE. In room acoustics, under-modelled multichannel blind system identification (BSI) aims to estimate the early part of the room impulse responses (RIRs), and it can be widely used in applications such as speaker localization, room geometry identification and beamforming based speech dereverberation. In this paper we extend our recent study on under-modelled BSI from the time domain to the frequency domain, such that the RIRs can be updated frame-wise and the efficiency of Fast Fourier Transform (FFT) is exploited to reduce the computational complexity. Analogous to the cross-correlation based criterion in the time domain, a frequency-domain cross power spectrum based criterion is proposed. As the early RIRs are usually sparse, the RIRs are estimated by jointly maximizing the cross power spectrum based criterion in the frequency domain and minimizing the ℓ1-norm sparsity measure in the time domain. A two-stage LMS updating algorithm is derived to achieve joint optimization of these two targets. The experimental results in different under-modelled scenarios demonstrate the effectiveness of the proposed method.
Doire CSJ, Brookes DM, Naylor PA, 2017, Robust and efficient Bayesian adaptive psychometric function estimation, Journal of the Acoustical Society of America, Vol: 141, Pages: 2501-2512, ISSN: 0001-4966
The efficient measurement of the threshold and slope of the psychometric function (PF) is an important objective in psychoacoustics. This paper proposes a procedure that combines a Bayesian estimate of the PF with either a look one-ahead or a look two-ahead method of selecting the next stimulus presentation. The procedure differs from previously proposed algorithms in two respects: (i) it does not require the range of possible PF parameters to be specified in advance and (ii) the sequence of probe signal-to-noise ratios optimizes the threshold and slope estimates at a performance level, ϕ, that can be chosen by the experimenter. Simulation results show that the proposed procedure is robust and that the estimates of both threshold and slope have a consistently low bias. Over a wide range of listener PF parameters, the root-mean-square errors after 50 trials were ∼1.2 dB in threshold and 0.14 in log-slope. It was found that the performance differences between the look one-ahead and look two-ahead methods were negligible and that an entropy-based criterion for selecting the next stimulus was preferred to a variance-based criterion.
Pinero G, Naylor PA, 2017, CHANNEL ESTIMATION FOR CROSSTALK CANCELLATION IN WIRELESS ACOUSTIC NETWORKS, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 586-590, ISSN: 1520-6149
Javed HA, Cauchi B, Doclo S, et al., 2017, MEASURING, MODELLING AND PREDICTING PERCEIVED REVERBERATION, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Publisher: IEEE, Pages: 381-385, ISSN: 1520-6149
Parada PP, Sharma D, van Waterschoot T, et al., 2017, Confidence Measures for Nonintrusive Estimation of Speech Clarity Index, JOURNAL OF THE AUDIO ENGINEERING SOCIETY, Vol: 65, Pages: 90-99, ISSN: 1549-4950
Evers C, Rafaely B, Naylor PA, Speaker tracking in reverberant environments using multiple detections of arrival, HSCMA 2017, Publisher: IEEE
Accurate estimation of the Direction of Arrival (DOA) of a sound source is an important prerequisite for a wide range of acoustic signal processing applications. However, in enclosed environments, early reflections and late reverberation often lead to localization errors. Recent work demonstrated that improved robustness against reverberation can be achieved by clustering only the DOAs from direct-path bins in the short-term Fourier transform of a speech signal of several seconds duration from a static talker. Nevertheless, for moving talkers, short blocks of at most several hundred milliseconds are required to capture the spatio-temporal variation of the source direction. Processing of short blocks of data in reverberant environments can lead to clusters whose centroids correspond to spurious DOAs away from the source direction. We therefore propose in this paper a novel multi-detection source tracking approach that estimates the smoothed trajectory of the source DOAs. Results for realistic room simulations validate the proposed approach and demonstrate significant improvements in estimation accuracy compared to single-detection tracking.
Gebru ID, Evers C, Naylor PA, et al., Audio-visual tracking by density approximation in a sequential Bayesian filtering framework, HSCMA 2017, Publisher: IEEE
The ability to explore and learn the surrounding environment is a major precondition for autonomous systems and applications including Human-Robot Interaction. Robot audition is particularly useful in situations where visual sensors suffer from limited Field of View or object occlusions. This is typically the case in scenarios where multiple talkers move freely within the environment surrounding the robot. Nevertheless, in enclosed environments, sound source localization is affected by reverberation of the sound waves off surrounding objects. Audio-visual fusion is therefore beneficial in order to disambiguate the positions of multiple moving talkers. This paper proposes a novel audio-visual tracking approach that exploits constructively both modalities in order to estimate the source trajectories in a joint state space. Recordings using a camcorder and microphone array are used to evaluate the proposed approach, demonstrating significant improvements in tracking performance of the proposed audio-visual approach compared to two benchmark visual trackers.
Löllmann HW, Moore AH, Naylor PA, et al., Microphone array signal processing for robot audition, Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE
Robot audition for humanoid robots interacting naturally with humans in an unconstrained real-world environment is a hitherto unsolved challenge. The recorded microphone signals are usually distorted by background and interfering noise sources (speakers) as well as room reverberation. In addition, the movements of a robot and its actuators cause ego-noise which degrades the recorded signals significantly. The movement of the robot body and its head also complicates the detection and tracking of the desired, possibly moving, sound sources of interest. This paper presents an overview of the concepts in microphone array processing for robot audition and some recent achievements.
Hafezi S, Moore AH, Naylor PA, Multi-source estimation consistency for improved multiple direction-of-arrival estimation, Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE
In Direction-of-Arrival (DOA) estimation for multiple sources, removal of noisy data points from a set of local DOA estimates increases the resulting estimation accuracy, especially when there are many sources and they have small angular separation. In this work, we propose a post-processing technique for the enhancement of DOA extraction from a set of local estimates using the consistency of these estimates within the time frame based on adaptive multi-source assumption. Simulations in a realistic reverberant environment with sensor noise and up to 5 sources demonstrate that the proposed technique outperforms the baseline and state-of-the-art approaches. In these tests the proposed technique had the worst average error of 9°, robustness of 5° to widely varying source separation and 3° to number of sources.
Eaton DJ, Javed HA, Naylor PA, Estimation of the Perceived Level of Reverberation using Non-intrusive Single-Channel Variance of Decay Rates, Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Publisher: IEEE
The increasing processing power of hearing aids and mobile devices has led to the potential for incorporation of dereverberation algorithms to improve speech quality for the listener. Assessing the effectiveness of dereverberation algorithms using subjective listening tests is extremely time consuming and depends on averaging out listener variations over a large number of subjects. Also, most existing instrumental measures are intrusive and require knowledge of the original signal which precludes many practical applications. In this paper we show that the proposed non-intrusive single-channel algorithm is a predictor of the perceived level of reverberation that correlates well with subjective listening test results, outperforming many existing intrusive and non-intrusive measures. The algorithm requires only a single training step and has a very low computational complexity making it suitable for hearing aids and mobile telephone applications. The source code has been made freely available.
Doire CSJ, Brookes DM, Naylor PA, et al., 2016, Single-channel online enhancement of speech corrupted by reverberation and noise, IEEE Transactions on Audio Speech and Language Processing, Vol: 25, Pages: 572-587, ISSN: 1558-7924
This paper proposes an online single-channel speech enhancement method designed to improve the quality of speech degraded by reverberation and noise. Based on an auto-regressive model for the reverberation power and on a Hidden Markov Model for clean speech production, a Bayesian filtering formulation of the problem is derived and online joint estimation of the acoustic parameters and mean speech, reverberation and noise powers is obtained in Mel-frequency bands. From these estimates, a real-valued spectral gain is derived and spectral enhancement is applied in the STFT domain. The method yields state-of-the-art performance and greatly reduces the effects of reverberation and noise while improving speech quality and preserving speech intelligibility in challenging acoustic environments.
Evers C, Dorfan Y, Gannot S, et al., Source tracking using moving microphone arrays for robot audition, IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE
Intuitive spoken dialogues are a prerequisite for human-robot interaction. In many practical situations, robots must be able to identify and focus on sources of interest in the presence of interfering speakers. Techniques such as spatial filtering and blind source separation are therefore often used, but rely on accurate knowledge of the source location. In practice, sound emitted in enclosed environments is subject to reverberation and noise. Hence, sound source localization must be robust to both diffuse noise due to late reverberation, as well as spurious detections due to early reflections. For improved robustness against reverberation, this paper proposes a novel approach for sound source tracking that constructively exploits the spatial diversity of a microphone array installed in a moving robot. In previous work, we developed speaker localization approaches using expectation-maximization (EM) approaches and using Bayesian approaches. In this paper we propose to combine the EM and Bayesian approach in one framework for improved robustness against reverberation and noise.
Hafezi S, Moore AH, Naylor P, Multiple source localization using estimation consistency in the time-frequency domain, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), ISSN: 1520-6149
The extraction of multiple Direction-of-Arrival (DoA) information from estimated spatial spectra can be challenging when such spectra are noisy or the sources are adjacent. Smoothing or clustering techniques are typically used to remove the effect of noise or irregular peaks in the spatial spectra. As we will explain and show in this paper, the smoothing-based techniques require prior knowledge of minimum angular separation of the sources and the clustering-based techniques fail on noisy spatial spectrum. A broad class of localization techniques give direction estimates in each Time Frequency (TF) bin. Using this information as input, a novel technique for obtaining robust localization of multiple simultaneous sources is proposed using Estimation Consistency (EC) in the TF domain. The method is evaluated in the context of spherical microphone arrays. This technique does not require prior knowledge of the sources and by removing the noise in the estimated spatial spectrum makes clustering a reliable and robust technique for multiple DoA extraction from estimated spatial spectra. The results indicate that the proposed technique has the strongest robustness to separation with up to 10° median error for 5° to 180° separation for 2 and 3 sources, compared to the baseline and the state-of-the-art techniques.
Moore AH, Brookes D, Naylor PA, Robust spherical harmonic domain interpolation of spatially sampled array manifolds, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), ISSN: 1520-6149
Accurate interpolation of the array manifold is an important first step for the acoustic simulation of rapidly moving microphone arrays. Spherical harmonic domain interpolation has been proposed and well studied in the context of head-related transfer functions but has focussed on perceptual, rather than numerical, accuracy. In this paper we analyze the effect of measurement noise on spatial aliasing. Based on this analysis we propose a method for selecting the truncation orders for the forward and reverse spherical Fourier transforms given only the noisy samples in such a way that the interpolation error is minimized. The proposed method achieves up to 1.7 dB improvement over the baseline approach.
Yiallourides C, Manning V, Moore AH, et al., A dynamic programming approach for automatic stride detection and segmentation in acoustic emission from the knee, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), ISSN: 1520-6149
We study the acquisition and analysis of sounds generated by the knee during walking with particular focus on the effects due to osteoarthritis. Reliable contact instant estimation is essential for stride synchronous analysis. We present a dynamic programming based algorithm for automatic estimation of both the initial contact instants (ICIs) and last contact instants (LCIs) of the foot to the floor. The technique is designed for acoustic signals sensed at the patella of the knee. It uses the phase-slope function to generate a set of candidates and then finds the most likely ones by minimizing a cost function that we define. ICIs are identified with an RMS error of 13.0% for healthy and 14.6% for osteoarthritic knees and LCIs with an RMS error of 16.0% and 17.0% respectively.
Lightburn L, De Sena E, Moore AH, et al., Improving the perceptual quality of ideal binary masked speech, IEEE International Conference on Acoustics Speech and Signal Processing, Publisher: Institute of Electrical and Electronics Engineers (IEEE), ISSN: 1520-6149
It is known that applying a time-frequency binary mask to very noisy speech can improve its intelligibility but results in poor perceptual quality. In this paper we propose a new approach to applying a binary mask that combines the intelligibility gains of conventional binary masking with the perceptual quality gains of a classical speech enhancer. The binary mask is not applied directly as a time-frequency gain as in most previous studies. Instead, the mask is used to supply prior information to a classical speech enhancer about the probability of speech presence in different time-frequency regions. Using an oracle ideal binary mask, we show that the proposed method results in a higher predicted quality than other methods of applying a binary mask whilst preserving the improvements in predicted intelligibility.
Moore AH, Peso P, Naylor PA, 2016, Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures, Computer Speech and Language, Vol: 46, Pages: 574-584, ISSN: 1095-8363
Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database which includes measured multichannel acoustic impulse responses from 7 different rooms with reverberation times ranging from 0.33 s to 1.34 s. The reverberant speech is mixed with ambient, fan and babble noise recordings made with the same microphone setups in each of the rooms. In the first experiment performance of the ASR without speech processing is evaluated. Results clearly indicate the deleterious effect of both noise and reverberation. In the second experiment, different speech enhancement pipelines are evaluated with relative word error rate reductions of up to 82%. Finally, the ability of selected instrumental metrics to predict ASR performance improvement is assessed. The best performing metric, Short-Time Objective Intelligibility Measure, is shown to have a Pearson correlation coefficient of 0.79, suggesting that it is a useful predictor of algorithm performance in these tests.
Evers C, Moore A, Naylor P, 2016, Localization of Moving Microphone Arrays from Moving Sound Sources for Robot Audition, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
Acoustic Simultaneous Localization and Mapping (a-SLAM) jointly localizes the trajectory of a microphone array installed on a moving platform, whilst estimating the acoustic map of surrounding sound sources, such as human speakers. Whilst traditional approaches for SLAM in the vision and optical research literature rely on the assumption that the surrounding map features are static, in the acoustic case the positions of talkers are usually time-varying due to head rotations and body movements. This paper demonstrates that tracking of moving sources can be incorporated in a-SLAM by modelling the acoustic map as a Random Finite Set (RFS) of multiple sources and explicitly imposing models of the source dynamics. The proposed approach is verified and its performance evaluated for realistic simulated data.
Hafezi S, Moore AH, Naylor PA, 2016, Multiple source localization in the spherical harmonic domain using augmented intensity vectors based on grid search, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491
Multiple source localization is an important task in acoustic signal processing with applications including dereverberation, source separation, source tracking and environment mapping. When using spherical microphone arrays, it has been previously shown that Pseudo-intensity Vectors (PIV), and Augmented Intensity Vectors (AIV), are an effective approach for direction of arrival estimation of a sound source. In this paper, we evaluate AIV-based localization in acoustic scenarios involving multiple sound sources. Simulations are conducted where the number of sources, their angular separation and the reverberation time of the room are varied. The results indicate that AIV outperforms PIV and Steered Response Power (SRP) with an average accuracy between 5 and 10 degrees for sources with angular separation of 30 degrees or more. AIV also shows better robustness to reverberation time than PIV and SRP.
Dorfan Y, Evers C, Gannot S, et al., 2016, Speaker Localization with Moving Microphone Arrays, European Signal Processing Conference (EUSIPCO), Publisher: IEEE, ISSN: 2076-1465
Speaker localization algorithms often assume static location for all sensors. This assumption simplifies the models used, since all acoustic transfer functions are linear time invariant. In many applications this assumption is not valid. In this paper we address the localization challenge with moving microphone arrays. We propose two algorithms to find the speaker position. The first approach is a batch algorithm based on the maximum likelihood criterion, optimized via expectation-maximization iterations. The second approach is a particle filter for sequential Bayesian estimation. The performance of both approaches is evaluated and compared for simulated reverberant audio data from a microphone array with two sensors.
Moore AH, Evers C, Naylor PA, 2016, 2D direction of arrival estimation of multiple moving sources using a spherical microphone array, European Signal Processing Conference, Publisher: IEEE, ISSN: 2219-5491
Direction of arrival estimation using a spherical microphone array is an important and growing research area. One promising algorithm is the recently proposed Subspace Pseudo-Intensity Vector method. In this contribution the Subspace Pseudo-Intensity Vector method is combined with a state-of-the-art method for robustly estimating the centres of mass in a 2D histogram based on matching pursuits. The performance of the improved Subspace Pseudo-Intensity Vector method is evaluated in the context of localising multiple moving sources where it is shown to outperform competing methods in terms of clutter rate and the number of missed detections whilst remaining comparable in terms of localisation accuracy.
Xue W, Brookes DM, Naylor PA, 2016, Under-modelled blind system identification for time delay estimation in reverberant environments, 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Publisher: IEEE
In multichannel systems, acoustic time delay estimation (TDE) is a challenging problem in reverberant environments. Although blind system identification (BSI) based methods have been proposed which utilize a realistic signal model for the room impulse response (RIR), their TDE performance depends strongly on that of the BSI, which is often inaccurate in practice when the identified responses are under-modelled. In this paper, we propose a new under-modelled BSI based method for TDE in reverberant environments. An under-modelled BSI algorithm is derived, which is based on maximizing the cross-correlation of the cross-filtered signals rather than minimizing the cross-relation error, and also exploits the sparsity of the early part of the RIR. For TDE, this new criterion can be viewed as a generalization of conventional cross-correlation-based TDE methods by considering a more realistic model for the early RIR. Depending on the microphone spacing, only a short early part of each RIR is identified, and the time delays are estimated based on the peak locations in the identified early RIRs. Experiments in different reverberant environments with speech source signals demonstrate the effectiveness of the proposed method.
Moore AH, Evers C, Naylor PA, 2016, Direction of Arrival Estimation in the Spherical Harmonic Domain using Subspace Pseudo-Intensity Vectors, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol: 25, Pages: 178-192, ISSN: 2329-9290
Direction of Arrival (DOA) estimation is a fundamental problem in acoustic signal processing. It is used in a diverse range of applications, including spatial filtering, speech dereverberation, source separation and diarization. Intensity vector-based DOA estimation is attractive, especially for spherical sensor arrays, because it is computationally efficient. Two such methods are presented which operate on a spherical harmonic decomposition of a sound field observed using a spherical microphone array. The first uses Pseudo-Intensity Vectors (PIVs) and works well in acoustic environments where only one sound source is active at any time. The second uses Subspace Pseudo-Intensity Vectors (SSPIVs) and is targeted at environments where multiple simultaneous sources and significant levels of reverberation make the problem more challenging. Analytical models are used to quantify the effects of an interfering source, diffuse noise and sensor noise on PIVs and SSPIVs. The accuracy of DOA estimation using PIVs and SSPIVs is compared against the state-of-the-art in simulations including realistic reverberation and noise for single and multiple, stationary and moving sources. Finally, robust performance of the proposed methods is demonstrated using speech recordings in real acoustic environments.