Publications
Hogg A, Liu H, Mads J, et al., 2023, Exploring the Impact of Transfer Learning on GAN-Based HRTF Upsampling, EAA Forum Acusticum, European Congress on Acoustics
Picinali L, Grimm G, Hioka Y, et al., 2023, VR/AR and hearing research: current examples and future challenges, Forum Acusticum 2023
A well-known issue in clinical audiology and hearing research is the level of abstraction of traditional experimental assessments and methods, which lack ecological validity and differ significantly from real-life experiences, often resulting in unreliable outcomes. Attempts to deal with this matter by, for example, performing experiments in real-life contexts, can be problematic due to the difficulty of accurately identifying control-specific parameters and events. Virtual and augmented reality (VR/AR) have the potential to provide dynamic and immersive audiovisual experiences that are at the same time realistic and highly controllable. Several successful attempts have been made to create and validate VR-based implementations of standard audiological and linguistic tests, as well as to design procedures and technologies to assess meaningful and ecologically-valid data. Similarly, new viewpoints on auditory perception have been provided by looking at hearing training and auditory sensory augmentation, aiming at improving perceptual skills in tasks such as speech understanding and sound-source localisation. In this contribution, we bring together researchers active in this domain. We briefly describe experiments they have designed, and jointly identify challenges that are still open and common approaches to tackle them.
Giraud P, Sum K, Pontoppidan NH, et al., 2023, Adaptation to altered interaural time differences in a virtual reality environment, Forum Acusticum 2023
Interaural time differences (ITDs) are important cues for determining the azimuth location of a sound source and need to be accurately reproduced, in a virtual reality (VR) environment, to achieve a realistic sense of sound location for the listener. ITDs are usually included in head-related transfer functions (HRTFs) used for audio rendering, and can be individualised to match the user’s head size (e.g. longer ITDs are needed for larger head sizes). In recent years, studies have shown that it is possible to train subjects to adapt to non-individualized HRTFs and improve their sound localisation performance. The analysis of such improvements has focused mainly on adaptation to monaural spectral cues rather than binaural cues such as ITDs. In this work listeners are placed in a VR environment and are asked to localise the source of a noise burst in the horizontal plane. Using a generic non-individualized HRTF with its ITD modified to match the head size of each participant, test and training phases are alternated, with the latter providing continuous auditory feedback. The experiment is then repeated with ITDs simulating larger (150%) and smaller (50%) head sizes. Comparing localisation accuracy before and after training, it is observed that while training seems to improve sound localisation performance, this varies according to the simulated head size and target location.
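For illustration only (this is not the paper's procedure): the well-known Woodworth approximation shows how an ITD scales with head radius, which is the kind of scaling implied by the 50%/150% head-size conditions above. The head radius and azimuth below are assumed placeholder values.

```python
# Minimal sketch: Woodworth approximation of the ITD for a far-field,
# horizontal-plane source, used here only to illustrate head-size scaling.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_deg: float, head_radius_m: float) -> float:
    """Approximate interaural time difference (seconds) at a given azimuth."""
    theta = np.radians(azimuth_deg)
    return (head_radius_m / SPEED_OF_SOUND) * (theta + np.sin(theta))

reference_radius = 0.0875  # assumed average head radius in metres
for scale in (0.5, 1.0, 1.5):  # the 50%, 100% and 150% head-size conditions
    itd_us = woodworth_itd(60.0, reference_radius * scale) * 1e6
    print(f"head size {scale:.0%}: ITD at 60 degrees is roughly {itd_us:.0f} microseconds")
```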
Valzolgher C, Capra S, Pavani F, et al., 2023, Training spatial hearing skills in virtual reality through a sound-reaching task, Forum Acusticum 2023
Sound localization is crucial for interacting with the surrounding world. This ability can be learned across time and improved by multisensory and motor cues. In the last decade, studying the contributions of multisensory and motor cues has been facilitated by the increased adoption of virtual reality (VR). In a recent study, sound localization had been trained through a task where the visual stimuli were rendered through a VR headset, and the auditory ones through a loudspeaker moved around by the experimenter. Physically reaching to sound sources reduced sound localization errors faster and to a greater extent if compared to naming sources’ positions. Interestingly, training efficacy extended also to hearing-impaired people. Yet, this approach is unfeasible for rehabilitation at home. Fully virtual approaches have been used to study spatial hearing learning processes, performing headphones-rendered acoustic simulations. In the present study, we investigate whether the effects of our reaching-based training can be observed when taking advantage of such simulations, showing that the improvement is comparable between the full-VR and blended VR conditions. This validates the use of training paradigms that are completely based on portable equipment and don’t require an external operator, opening new perspectives in the field of remote rehabilitation.
Chard I, Van Zalk N, Picinali L, 2023, Virtual reality exposure therapy for reducing social anxiety associated with stuttering: the role of outcome expectancy, therapeutic alliance, presence and social presence, Frontiers in Virtual Reality, Vol: 4, Pages: 1-15, ISSN: 2673-4192
Introduction: Although several trials have demonstrated the effectiveness of Virtual Reality Exposure Therapy (VRET) for reducing social anxiety, there is little understanding about the factors that lead to symptom reduction across different treatment designs. Such factors may include outcome expectancy, therapeutic alliance, presence (perception of being in the virtual environment) and social presence (perception of interacting with others). We report on findings from a pilot trial of VRET targeting social anxiety in people who stutter, and examine the association of these four factors with treatment outcome. Methods: People who stutter reporting heightened social anxiety (n = 22) took part in the trial after being recruited via online adverts. Remotely delivered VRET was administered to participants in three sessions across three weeks. Each session targeted both performative and interactive anxiety. A virtual therapist helped participants to engage with treatment strategies, whilst also guiding them through exercises. Results: Findings showed that presence and social presence were both negatively related to changes in fear of negative evaluation between pre- and post-treatment. However, presence, outcome expectancy and therapeutic alliance were positively related to changes in social anxiety symptoms. Furthermore, outcome expectancy and therapeutic alliance were quadratically related to fear of negative evaluation change. Nevertheless, the effect of presence on social anxiety, and the effects of presence and therapeutic alliance on fear of negative evaluation must be interpreted with caution as these were not large enough to reach sufficient statistical power. Therapeutic alliance did not mediate the relationship between outcome expectancy and treatment outcome. Discussion: These findings suggest that the current VRET protocol affected social anxiety and fear of negative evaluation differently. We discuss how presence may underlie these mixed associations. We also s
Martin V, Picinali L, 2023, Comparing online vs. lab-based experimental approaches for the perceptual evaluation of artificial reverberation, Forum Acusticum
Daugintis R, Barumerli R, Geronazzo M, et al., 2023, Initial evaluation of an auditory-model-aided selection procedure for non-individual HRTFs, Forum Acusticum, Pages: 1-8, ISSN: 2221-3767
Binaural spatial audio reproduction systems use measured or simulated head-related transfer functions (HRTFs), which encode the effects of the outer ear and body on the incoming sound to recreate a realistic spatial auditory field around the listener. The sound localisation cues embedded in the HRTF are highly personal. Establishing perceptual similarity between different HRTFs in a reliable manner is challenging due to a combination of acoustic and non-acoustic aspects affecting our spatial auditory perception. To account for these factors, we propose an automated procedure to select the ‘best’ non-individual HRTF dataset from a pool of measured ones. For a group of human participants with their own acoustically measured HRTFs, a multi-feature Bayesian auditory sound localisation model is used to predict individual localisation performance with the other HRTFs from within the group. Then, the model selection of the ‘best’ and the ‘worst’ non-individual HRTFs is evaluated via an actual localisation test and a subjective audio quality assessment in comparison with individual HRTFs. A successful model-aided objective selection of the ‘best’ non-individual HRTF may provide relevant insights for effective and handy binaural spatial audio solutions in virtual/augmented reality (VR/AR) applications.
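As a highly simplified sketch of the selection idea (not the Bayesian model used in the study): given a matrix of model-predicted localisation errors for every listener/HRTF pair, each listener's 'best' and 'worst' non-individual HRTF can be picked by excluding their own HRTF and taking the minimum and maximum predicted error. All values below are random placeholders.

```python
# Simplified illustration of model-aided HRTF selection: pick the non-individual
# HRTF with the lowest (and highest) predicted localisation error per listener.
import numpy as np

rng = np.random.default_rng(0)
n = 5  # listeners, each with their own measured HRTF
predicted_error = rng.uniform(10.0, 40.0, size=(n, n))  # placeholder errors in degrees

for listener in range(n):
    errors = predicted_error[listener].copy()
    errors[listener] = np.nan                      # exclude the listener's own HRTF
    best = int(np.nanargmin(errors))
    worst = int(np.nanargmax(errors))
    print(f"listener {listener}: best non-individual HRTF {best}, worst {worst}")
```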
Pauwels J, Picinali L, 2023, On the relevance of the differences between HRTF measurement setups for machine learning, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1-5
As spatial audio is enjoying a surge in popularity, data-driven machine learning techniques that have proven successful in other domains are increasingly used to process head-related transfer function measurements. However, these techniques require large amounts of data, whereas the existing datasets range from tens to the low hundreds of datapoints. It therefore becomes attractive to combine several of these datasets, even though they are measured under different conditions. In this paper, we first establish the common ground between a number of datasets, then we investigate potential pitfalls of mixing datasets. We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. Finally, we pinpoint the most relevant differences.
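One concrete pitfall when combining HRTF datasets is that they are often measured at different sample rates; a minimal sketch of harmonising this (with placeholder arrays, not any specific dataset) is shown below.

```python
# Sketch: bring HRIRs from differently measured datasets to a common sample rate
# before pooling them for machine learning. Arrays and rates are placeholders.
import numpy as np
from scipy.signal import resample_poly

TARGET_FS = 48_000  # assumed common sample rate

def to_common_rate(hrirs: np.ndarray, fs: int) -> np.ndarray:
    """Resample HRIRs of shape (directions, ears, samples) to TARGET_FS."""
    if fs == TARGET_FS:
        return hrirs
    g = np.gcd(TARGET_FS, fs)
    return resample_poly(hrirs, TARGET_FS // g, fs // g, axis=-1)

dataset_a = to_common_rate(np.random.randn(440, 2, 256), 44_100)
dataset_b = to_common_rate(np.random.randn(612, 2, 512), 96_000)
print(dataset_a.shape, dataset_b.shape)
```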
Daugintis R, Barumerli R, Picinali L, et al., 2023, Classifying non-individual head-related transfer functions with a computational auditory model: calibration and metrics, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1-5
This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, these are grouped into ‘good’ and ‘bad’, and the ‘best’/‘worst’ is selected from each category. Firstly, we present a greedy algorithm for automated individual calibration of the model based on the individual sound localisation data. We then discuss data analysis of predicted directional localisation errors and present an algorithm for categorising the HRTFs based on the localisation error distributions within a limited range of directions in front of the listener. Finally, we discuss the validity of the classification algorithm when using averaged instead of individual model parameters. This analysis of auditory modelling results aims to provide a perceptual foundation for automated HRTF personalisation techniques for an improved experience of binaural spatial audio technologies.
Pedersen RL, Picinali L, Kajs N, et al., 2023, Virtual-Reality-Based Research in Hearing Science: A Platforming Approach, Journal of the Audio Engineering Society, Vol: 71, Pages: 374-389, ISSN: 1549-4950
The lack of ecological validity in clinical assessment, as well as the challenge of investigating multimodal sensory processing, remain key challenges in hearing science. Virtual Reality (VR) can support hearing research in these domains by combining experimental control with situational realism. However, the development of VR-based experiments is traditionally highly resource-demanding, which creates a significant entry barrier for basic and clinical researchers looking to embrace VR as the research tool of choice. The Oticon Medical Virtual Reality (OMVR) experiment platform fast-tracks the creation or adaptation of hearing research experiment templates to be used to explore areas such as binaural spatial hearing, multimodal sensory integration, cognitive hearing behavioral strategies, auditory-visual training, etc. In this paper, the OMVR’s functionalities, architecture, and key elements of implementation are presented, important performance indicators are characterized, and a use-case perceptual evaluation is presented.
Engel I, Daugintis R, Vicente T, et al., 2023, The SONICOM HRTF dataset, Journal of the Audio Engineering Society, Vol: 71, Pages: 241-253, ISSN: 0004-7554
Immersive audio technologies, ranging from rendering spatialized sounds accurately to efficient room simulations, are vital to the success of augmented and virtual realities. To produce realistic sounds through headphones, the human body and head must both be taken into account. However, the measurement of the influence of the external human morphology on the sounds incoming to the ears, which is often referred to as head-related transfer function (HRTF), is expensive and time-consuming. Several datasets have been created over the years to help researchers work on immersive audio; nevertheless, the number of individuals involved and amount of data collected is often insufficient for modern machine-learning approaches. Here, the SONICOM HRTF dataset is introduced to facilitate reproducible research in immersive audio. This dataset contains the HRTF of 120 subjects, as well as headphone transfer functions; 3D scans of ears, heads, and torsos; and depth pictures at different angles around subjects' heads.
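HRTFs such as these are commonly distributed as SOFA files; the sketch below shows how one subject's measurements could be inspected, assuming the third-party `sofar` package and a purely hypothetical filename (not an official dataset path).

```python
# Sketch: read one subject's HRIRs from a SOFA file with the `sofar` package.
# The filename is a placeholder; attribute names follow the SOFA convention.
import sofar

sofa = sofar.read_sofa("subject_001.sofa")    # hypothetical file
hrirs = sofa.Data_IR                          # (measurements, ears, samples)
fs = sofa.Data_SamplingRate                   # sampling rate in Hz
positions = sofa.SourcePosition               # azimuth, elevation, distance per measurement
print(hrirs.shape, fs, positions.shape)
```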
Chard I, van Zalk N, Picinali L, 2023, Virtual reality exposure therapy for reducing social anxiety in stuttering: a randomized controlled pilot trial, Frontiers in Digital Health, Vol: 5, Pages: 1-14, ISSN: 2673-253X
We report on findings from the first randomized controlled pilot trial of virtual reality exposure therapy (VRET) developed specifically for reducing social anxiety associated with stuttering. People who stutter with heightened social anxiety were recruited from online adverts and randomly allocated to receive VRET (n = 13) or be put on a waitlist (n = 12). Treatment was delivered remotely using a smartphone-based VR headset. It consisted of three weekly sessions, each comprising both performative and interactive exposure exercises, and was guided by a virtual therapist. Multilevel model analyses failed to demonstrate the effectiveness of VRET at reducing social anxiety between pre- and post-treatment. We found similar results for fear of negative evaluation, negative thoughts associated with stuttering, and stuttering characteristics. However, VRET was associated with reduced social anxiety between post-treatment and one-month follow-up. These pilot findings suggest that our current VRET protocol may not be effective at reducing social anxiety amongst people who stutter, though might be capable of supporting longer-term change. Future VRET protocols targeting stuttering-related social anxiety should be explored with larger samples. The results from this pilot trial provide a solid basis for further design improvements and for future research to explore appropriate techniques for widening access to social anxiety treatments in stuttering.
Picinali L, Katz BFG, Geronazzo M, et al., 2022, The SONICOM Project: artificial intelligence-driven immersive audio, from personalization to modeling, IEEE Signal Processing Magazine, Vol: 39, Pages: 85-88, ISSN: 1053-5888
Every individual perceives spatial audio differently, due in large part to the unique and complex shape of ears and head. Therefore, high-quality, headphone-based spatial audio should be uniquely tailored to each listener in an effective and efficient manner. Artificial intelligence (AI) is a powerful tool that can be used to drive forward research in spatial audio personalization. The SONICOM project aims to employ a data-driven approach that links physiological characteristics of the ear to the individual acoustic filters, which allow us to localize sound sources and perceive them as being located around us. A small amount of data acquired from users could allow personalized audio experiences, and AI could facilitate this by offering a new perspective on the matter. A Bayesian approach to computational neuroscience and binaural sound reproduction will be linked to create a metric for AI-based algorithms that will predict realistic spatial audio quality. Being able to consistently and repeatedly evaluate and quantify the improvements brought by technological advancements, as well as the impact these have on complex interactions in virtual environments, will be key for the development of new techniques and for unlocking new approaches to understanding the mechanisms of human spatial hearing and communication.
Picinali L, Katz BFG, 2022, System-to-user and user-to-system adaptations in Binaural audio, Sonic Interactions in Virtual Environments, Editors: Geronazzo, Serafin, Publisher: Springer, Pages: 115-143
This chapter concerns concepts of adaption in a binaural audio context (i.e. headphone-based three-dimensional audio rendering and associated spatial hearing aspects), considering first the adaptation of the rendering system to the acoustic and perceptual properties of the user, and second the adaptation of the user to the rendering quality of the system. We start with an overview of the basic mechanisms of human sound source localisation, introducing expressions such as localisation cues and interaural differences, and the concept of the Head-Related Transfer Function (HRTF), which is the basis of most 3D spatialisation systems in VR. The chapter then moves to more complex concepts and processes, such as HRTF selection (system-to-user adaptation) and HRTF accommodation (user-to-system adaptation). State-of-the-art HRTF modelling and selection methods are presented, looking at various approaches and at how these have been evaluated. Similarly, the process of HRTF accommodation is detailed, with a case study employed as an example. Finally, the potential of these two approaches are discussed, considering their combined use in a practical context, as well as introducing a few open challenges for future research.
Geronazzo M, Serafin S, 2022, Sonic interactions in virtual environments, Publisher: Springer, ISBN: 9783031040207
This book tackles the design of 3D spatial interactions in an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are:
- Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies
- Sonic interaction: the human-computer interplay through auditory feedback in VEs
- VR systems: natural support for multimodal integration, impacting different application domains
Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to increase awareness among VR researchers and practitioners of the importance of sonic elements when designing immersive environments.
Siripornpitak P, Engel I, Cooper S, et al., 2022, Spatial up-sampling of HRTF sets using generative adversarial networks: a pilot study, Frontiers in Signal Processing, Vol: 2, Pages: 1-10, ISSN: 2673-8198
Headphone-based spatial audio simulations rely on Head Related Transfer Functions (HRTFs) in order to reconstruct the sound field at the entrance of the listener’s ears. A HRTF is strongly dependent on the listener’s specific anatomical structures, and it has been shown that virtual sounds recreated with someone else’s HRTF result in worse localisation accuracy, as well as altering other subjective measures such as externalisation and realism. Acoustic measurements of the filtering effects generated by ears, head and torso have proven to be one of the most reliable ways to obtain a personalised HRTF. However, this requires a dedicated and expensive setup, and is time-intensive. In order to simplify the measurement setup, thereby improving the scalability of the process, we are exploring strategies to reduce the number of acoustic measurements without degrading the spatial resolution of the HRTF. Traditionally, spatial up-sampling of HRTF sets is achieved through barycentric interpolation or by employing the spherical harmonics framework. However, such methods often perform poorly when the provided HRTF data is spatially very sparse. This work investigates the use of generative adversarial networks (GANs) to tackle the up-sampling problem, offering an initial insight into the suitability of this technique. Numerical evaluations based on spectral magnitude error and perceptual model outputs are presented on single spatial dimensions, therefore considering sources positioned only in one of the three main planes: horizontal, median, and frontal. Results suggest that traditional HRTF interpolation methods perform better than the proposed GAN-based one when the distance between measurements is smaller than 90°, but for the sparsest conditions (i.e. one measurement every 120° to 180°), the proposed approach outperforms the others.
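For context, the simplest kind of baseline the GAN approach is compared against can be illustrated with a toy one-dimensional interpolation between two sparse horizontal-plane measurements; the data below are random placeholders, and the real barycentric and spherical-harmonics methods are considerably more involved.

```python
# Toy illustration of baseline HRTF up-sampling on the horizontal plane:
# linear interpolation of magnitude responses between two measured azimuths.
import numpy as np

n_bins = 129
measured = {0.0: np.random.rand(n_bins),     # placeholder |HRTF| at 0 degrees
            120.0: np.random.rand(n_bins)}   # placeholder |HRTF| at 120 degrees

def interpolate_magnitude(azimuth_deg: float) -> np.ndarray:
    """Linearly interpolate between the two measured azimuths."""
    lo, hi = 0.0, 120.0
    w = (azimuth_deg - lo) / (hi - lo)
    return (1.0 - w) * measured[lo] + w * measured[hi]

print(interpolate_magnitude(60.0).shape)
```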
Reyes-Lecuona A, Bouchara T, Picinali L, 2022, Immersive sound for XR, Roadmapping Extended Reality: Fundamentals and Applications, Pages: 75-102, ISBN: 9781119865148
Sound plays a very important role in everyday life as well as in XR applications, as will be explained in this chapter. Recent advances and challenges in immersive audio research are presented, discussing how, why, and to what extent there is potential for further development of these technologies applied to XR. The fundamentals of immersive audio rendering for XR are introduced before presenting the main technological challenges still open in the area. Finally, a series of future applications is presented, which the authors envision as examples of the potential of immersive audio in XR, and a research roadmap is outlined.
Comunita M, Gerino A, Picinali L, 2022, PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio, Eurasip Journal on Audio, Speech, and Music Processing, Vol: 18, ISSN: 1687-4714
PlugSonic is a series of web- and mobile-based applications designed to: edit samples and apply audio effects (PlugSonic Sample), create and experience dynamic and navigable soundscapes and sonic narratives (PlugSonic Soundscape). The audio processing within PlugSonic is based on the Web Audio API while the binaural rendering uses the 3D Tune-In Toolkit. Exploration of soundscapes in a physical space is made possible by adopting Apple’s ARKit. The present paper describes the implementation details, the signal processing chain and the necessary steps to curate and experience a soundscape. We also include some metrics and performance details. The main goal of PlugSonic is to give users a complete set of tools, without the need for specific devices, external software and/or hardware, specialised knowledge or custom development; with the idea that spatial audio has the potential to become a readily accessible and easy to understand technology, for anyone to adopt, whether for creative or research purposes.
Saksida A, Ghiselli S, Picinali L, et al., 2022, Attention to speech and music in young children with bilateral cochlear implants: a pupillometry study, The Journal of Clinical Sleep Medicine, Vol: 11, Pages: 1-14, ISSN: 1550-9389
Early bilateral cochlear implants (CIs) may enhance attention to speech, and reduce cognitive load in noisy environments. However, it is sometimes difficult to measure speech perception and listening effort, especially in very young children. Behavioral measures cannot always be obtained in young/uncooperative children, whereas objective measures are either difficult to assess or do not reliably correlate with behavioral measures. Recent studies have thus explored pupillometry as a possible objective measure. Here, pupillometry is introduced to assess attention to speech and music in noise in very young children with bilateral CIs (N = 14, age: 17–47 months), and in the age-matched group of normally-hearing (NH) children (N = 14, age: 22–48 months). The results show that the response to speech was affected by the presence of background noise only in children with CIs, but not NH children. Conversely, the presence of background noise altered pupil response to music only in NH children. We conclude that whereas speech and music may receive comparable attention in comparable listening conditions, in young children with CIs, controlling for background noise affects attention to speech and speech processing more than in NH children. Potential implementations of the results for rehabilitation procedures are discussed.
Engel Alonso Martinez J, Goodman D, Picinali L, 2022, Assessing HRTF preprocessing methods for Ambisonics rendering through perceptual models, Acta Acustica -Peking-, Vol: 6, ISSN: 0371-0025
Binaural rendering of Ambisonics signals is a common way to reproduce spatial audio content. Processing Ambisonics signals at low spatial orders is desirable in order to reduce complexity, although it may degrade the perceived quality, in part due to the mismatch that occurs when a low-order Ambisonics signal is paired with a spatially dense head-related transfer function (HRTF). In order to alleviate this issue, the HRTF may be preprocessed so its spatial order is reduced. Several preprocessing methods have been proposed, but they have not been thoroughly compared yet. In this study, nine HRTF preprocessing methods were used to render anechoic binaural signals from Ambisonics representations of orders 1 to 44, and these were compared through perceptual hearing models in terms of localisation performance, externalisation and speech reception. This assessment was supported by numerical analyses of HRTF interpolation errors, interaural differences, perceptually-relevant spectral differences, and loudness stability. Models predicted that the binaural renderings’ accuracy increased with spatial order, as expected. A notable effect of the preprocessing method was observed: whereas all methods performed similarly at the highest spatial orders, some were considerably better at lower orders. A newly proposed method, BiMagLS, displayed the best performance overall and is recommended for the rendering of bilateral Ambisonics signals. The results, which were in line with previous literature, indirectly validate the perceptual models’ ability to predict listeners’ responses in a consistent and explicable manner.
Salorio-Corbetto M, Williges B, Lamping W, et al., 2022, Evaluating spatial hearing using a dual-task approach in a virtual-acoustics environment, Frontiers in Neuroscience, Vol: 16, Pages: 1-17, ISSN: 1662-453X
Spatial hearing is critical for communication in everyday sound-rich environments. It is important to gain an understanding of how well users of bilateral hearing devices function in these conditions. The purpose of this work was to evaluate a Virtual Acoustics (VA) version of the Spatial Speech in Noise (SSiN) test, the SSiN-VA. This implementation uses relatively inexpensive equipment and can be performed outside the clinic, allowing for regular monitoring of spatial-hearing performance. The SSiN-VA simultaneously assesses speech discrimination and relative localization with changing source locations in the presence of noise. The use of simultaneous tasks increases the cognitive load to better represent the difficulties faced by listeners in noisy real-world environments. Current clinical assessments may require costly equipment which has a large footprint. Consequently, spatial-hearing assessments may not be conducted at all. Additionally, as patients take greater control of their healthcare outcomes and a greater number of clinical appointments are conducted remotely, outcome measures that allow patients to carry out assessments at home are becoming more relevant. The SSiN-VA was implemented using the 3D Tune-In Toolkit, simulating seven loudspeaker locations spaced at 30° intervals with azimuths between −90° and +90°, and rendered for headphone playback using the binaural spatialization technique. Twelve normal-hearing participants were assessed to evaluate if SSiN-VA produced patterns of responses for relative localization and speech discrimination as a function of azimuth similar to those previously obtained using loudspeaker arrays. Additionally, the effect of the signal-to-noise ratio (SNR), the direction of the shift from target to reference, and the target phonetic contrast on performance were investigated. SSiN-VA led to similar patterns of performance as a function of spatial location compared to loudspeaker setups for both relative lo
Bürgel M, Picinali L, Siedenburg K, 2021, Listening in the mix: lead vocals robustly attract auditory attention in popular music, Frontiers in Psychology, Vol: 12, Pages: 1-15, ISSN: 1664-1078
Listeners can attend to and track instruments or singing voices in complex musical mixtures, even though the acoustical energy of sounds from individual instruments may overlap in time and frequency. In popular music, lead vocals are often accompanied by sound mixtures from a variety of instruments, such as drums, bass, keyboards, and guitars. However, little is known about how the perceptual organization of such musical scenes is affected by selective attention, and which acoustic features play the most important role. To investigate these questions, we explored the role of auditory attention in a realistic musical scenario. We conducted three online experiments in which participants detected single cued instruments or voices in multi-track musical mixtures. Stimuli consisted of 2-s multi-track excerpts of popular music. In one condition, the target cue preceded the mixture, allowing listeners to selectively attend to the target. In another condition, the target was presented after the mixture, requiring a more “global” mode of listening. Performance differences between these two conditions were interpreted as effects of selective attention. In Experiment 1, results showed that detection performance was generally dependent on the target’s instrument category, but listeners were more accurate when the target was presented prior to the mixture rather than the opposite. Lead vocals appeared to be nearly unaffected by this change in presentation order and achieved the highest accuracy compared with the other instruments, which suggested a particular salience of vocal signals in musical mixtures. In Experiment 2, filtering was used to avoid potential spectral masking of target sounds. Although detection accuracy increased for all instruments, a similar pattern of results was observed regarding the instrument-specific differences between presentation orders. In Experiment 3, adjusting the sound level differences between the targets reduced the effect of
Sethi SS, Ewers RM, Jones NS, et al., 2021, Soundscapes predict species occurrence in tropical forests, Oikos, Vol: 2022, Pages: 1-9, ISSN: 0030-1299
Accurate occurrence data is necessary for the conservation of keystone or endangered species, but acquiring it is usually slow, laborious and costly. Automated acoustic monitoring offers a scalable alternative to manual surveys but identifying species vocalisations requires large manually annotated training datasets, and is not always possible (e.g. for lesser studied or silent species). A new approach is needed that rapidly predicts species occurrence using smaller and more coarsely labelled audio datasets. We investigated whether local soundscapes could be used to infer the presence of 32 avifaunal and seven herpetofaunal species in 20 min recordings across a tropical forest degradation gradient in Sabah, Malaysia. Using acoustic features derived from a convolutional neural network (CNN), we characterised species-indicative soundscapes by training our models on a temporally coarse labelled point-count dataset. Soundscapes successfully predicted the occurrence of 34 out of the 39 species across the two taxonomic groups, with area under the curve (AUC) metrics from 0.53 up to 0.87. The highest accuracies were achieved for species with strong temporal occurrence patterns. Soundscapes were a better predictor of species occurrence than above-ground carbon density – a metric often used to quantify habitat quality across forest degradation gradients. Our results demonstrate that soundscapes can be used to efficiently predict the occurrence of a wide variety of species and provide a new direction for data-driven large-scale assessments of habitat suitability.
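The per-species accuracies reported above are AUC values; a hedged sketch of how such scores can be computed with scikit-learn is given below, using random placeholder labels and predictions rather than the study's data.

```python
# Sketch: score per-species occurrence predictions with ROC AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n_recordings, n_species = 200, 39
y_true = rng.integers(0, 2, size=(n_recordings, n_species))  # presence/absence labels
y_score = rng.random(size=(n_recordings, n_species))         # predicted probabilities

auc = [roc_auc_score(y_true[:, s], y_score[:, s]) for s in range(n_species)]
print(f"AUC range: {min(auc):.2f} to {max(auc):.2f}")
```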
Setti W, Engel IA-M, Cuturi LF, et al., 2021, The Audio-Corsi: an acoustic virtual reality-based technological solution for evaluating audio-spatial memory abilities, Journal on Multimodal User Interfaces, Vol: 16, ISSN: 1783-7677
Spatial memory is a cognitive skill that allows the recall of information about the space, its layout, and items’ locations. We present a novel application built around 3D spatial audio technology to evaluate audio-spatial memory abilities. The sound sources have been spatially distributed employing the 3D Tune-In Toolkit, a virtual acoustic simulator. The participants are presented with sequences of sounds of increasing length emitted from virtual auditory sources around their heads. To identify stimuli positions and register the test responses, we designed a custom-made interface with buttons arranged according to sound locations. We took inspiration from the Corsi-Block test for the experimental procedure, a validated clinical approach for assessing visuo-spatial memory abilities. In two different experimental sessions, the participants were tested with the classical Corsi-Block and, blindfolded, with the proposed task, named Audio-Corsi for brevity. Our results show comparable performance across the two tests in terms of the estimated memory parameter precision. Furthermore, in the Audio-Corsi we observe a lower span compared to the Corsi-Block test. We discuss these results in the context of the theoretical relationship between the auditory and visual sensory modalities and potential applications of this system in multiple scientific and clinical contexts.
Vickers D, Salorio-Corbetto M, Driver S, et al., 2021, Involving children and teenagers with bilateral cochlear implants in the design of the BEARS (Both EARS) virtual reality training suite improves personalization, Frontiers in Digital Health, Vol: 3, ISSN: 2673-253X
Older children and teenagers with bilateral cochlear implants often have poor spatial hearing because they cannot fuse sounds from the two ears. This deficit jeopardizes speech and language development, education, and social well-being. The lack of protocols for fitting bilateral cochlear implants and resources for spatial-hearing training contribute to these difficulties. Spatial hearing develops with bilateral experience. A large body of research demonstrates that sound localisation can improve with training, underpinned by plasticity-driven changes in the auditory pathways. Generalizing training to non-trained auditory skills is best achieved by using a multi-modal (audio-visual) implementation and multi-domain training tasks (localisation, speech-in-noise, and spatial music). The goal of this work was to develop a package of virtual-reality games (BEARS, Both EARS) to train spatial hearing in young people (8–16 years) with bilateral cochlear implants using an action-research protocol. The action research protocol used formalized cycles for participants to trial aspects of the BEARS suite, reflect on their experiences, and in turn inform changes in the game implementations. This participatory design used the stakeholder participants as co-creators. The cycles for each of the three domains (localisation, spatial speech-in-noise, and spatial music) were customized to focus on the elements that the stakeholder participants considered important. The participants agreed that the final games were appropriate and ready to be used by patients. The main areas of modification were: the variety of immersive scenarios to cover age range and interests, the number of levels of complexity to ensure small improvements were measurable, feedback, and reward schemes to ensure positive reinforcement, and an additional implementation on an iPad for those who had difficulties with the headsets due to age or balance issues. The effectiveness of the BEARS training suite will be ev
Setti W, Cuturi LF, Engel I, et al., 2021, The Influence of Early Visual Deprivation on Audio-Spatial Working Memory, Neuropsychology, Vol: 36, Pages: 55-63, ISSN: 0894-4105
Engel Alonso Martinez J, Goodman DFM, Picinali L, 2021, Improving Binaural Rendering with Bilateral Ambisonics and MagLS, DAGA 2021
Heath R, Orme DS, Sethi CSL, et al., 2021, How index selection, compression, and recording schedule impact the description of ecological soundscapes, Ecology and Evolution, Vol: 11, Pages: 13206-13217, ISSN: 2045-7758
Acoustic indices derived from environmental soundscape recordings are being used to monitor ecosystem health and vocal animal biodiversity. Soundscape data can quickly become very expensive and difficult to manage, so data compression or temporal down-sampling are sometimes employed to reduce data storage and transmission costs. These parameters vary widely between experiments, with the consequences of this variation remaining mostly unknown. We analyse field recordings from North-Eastern Borneo across a gradient of historical land use. We quantify the impact of experimental parameters (MP3 compression, recording length and temporal subsetting) on soundscape descriptors (Analytical Indices and a convolutional neural net derived AudioSet Fingerprint). Both descriptor types were tested for their robustness to parameter alteration and their usability in a soundscape classification task. We find that compression and recording length both drive considerable variation in calculated index values. However, we find that the effects of this variation and temporal subsetting on the performance of classification models are minor: performance is much more strongly determined by acoustic index choice, with the AudioSet Fingerprint offering substantially greater (12%–16%) levels of classifier accuracy, precision and recall. We advise using the AudioSet Fingerprint in soundscape analysis, finding superior and consistent performance even on small pools of data. If data storage is a bottleneck to a study, we recommend Variable Bit Rate encoded compression (quality = 0), which reduces files to 23% of their original size without affecting most Analytical Index values. The AudioSet Fingerprint can be compressed further to a Constant Bit Rate encoding of 64 kb/s (8% of the original file size) without any detectable effect. These recommendations allow the efficient use of restricted data storage whilst permitting comparability of results between different studies.
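The two recommended encodings can be produced with standard ffmpeg/LAME options; the sketch below uses placeholder filenames and simply shells out to ffmpeg.

```python
# Sketch: create the two compression settings recommended above with ffmpeg.
import subprocess

src = "recording.wav"  # placeholder input file

# Variable bit rate, quality 0 (recommended when Analytical Indices are used)
subprocess.run(["ffmpeg", "-i", src, "-codec:a", "libmp3lame",
                "-q:a", "0", "recording_vbr_q0.mp3"], check=True)

# Constant bit rate at 64 kb/s (sufficient for the AudioSet Fingerprint)
subprocess.run(["ffmpeg", "-i", src, "-codec:a", "libmp3lame",
                "-b:a", "64k", "recording_cbr_64k.mp3"], check=True)
```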
Lim V, Khan S, Picinali L, 2021, Towards a more accessible cultural heritage: challenges and opportunities in contextualization using 3D sound narratives, Applied Sciences, ISSN: 2076-3417