Imperial College London


Faculty of Engineering, Dyson School of Design Engineering

Reader in Audio Experience Design







Level 1 staff office, Dyson Building, South Kensington Campus






112 results found

Hogg A, Jenkins M, Liu H, Squires I, Cooper S, Picinali L et al., 2024, HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection, IEEE Transactions on Audio, Speech and Language Processing, ISSN: 1558-7916

An individualised head-related transfer function (HRTF) is very important for creating realistic virtual reality (VR) and augmented reality (AR) environments. However, acoustically measuring high-quality HRTFs requires expensive equipment and an acoustic lab setting. To overcome these limitations and to make this measurement more efficient, HRTF upsampling has been exploited in the past, where a high-resolution HRTF is created from a low-resolution one. This paper demonstrates how a generative adversarial network (GAN) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN). This new approach is benchmarked against three baselines: barycentric upsampling, spherical harmonic (SH) upsampling and an HRTF selection approach. Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance using perceptual models when the input HRTF is sparse (less than 20 measured positions).

Journal article
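
As an illustration of the LSD metric named in the abstract above, the following is a minimal sketch, assuming the common definition of log-spectral distortion as the root-mean-square difference, in decibels, between two magnitude spectra. The function name and the toy spectra are hypothetical, and this is not necessarily the exact formulation used in the paper.

```python
import math


def log_spectral_distortion(ref_mag, est_mag):
    """Return the RMS dB difference between two magnitude spectra.

    Assumed definition: sqrt(mean_k (20*log10(|H_ref(k)| / |H_est(k)|))^2).
    Both inputs are sequences of strictly positive linear magnitudes.
    """
    if len(ref_mag) != len(est_mag) or len(ref_mag) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db_errors = [
        (20.0 * math.log10(r / e)) ** 2 for r, e in zip(ref_mag, est_mag)
    ]
    return math.sqrt(sum(squared_db_errors) / len(squared_db_errors))


# Hypothetical toy spectra: the estimate is exactly twice the reference,
# so every frequency bin differs by the same number of decibels.
reference = [1.0, 0.8, 0.5, 0.25]
estimate = [2.0 * m for m in reference]
lsd_db = log_spectral_distortion(reference, estimate)
print(f"LSD = {lsd_db:.2f} dB")
```

A lower value means the estimated spectrum is closer to the reference; identical spectra give 0 dB.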

Chiara V, Sara C, Kevin S, Livio F, Francesco P, Picinali L et al., 2024, Spatial hearing training in virtual reality with simulated asymmetric hearing loss, Scientific Reports, Vol: 14, ISSN: 2045-2322

Sound localization is essential to perceive the surrounding world and to interact with objects. This ability can be learned across time, and multisensory and motor cues play a crucial role in the learning process. A recent study demonstrated that when training localization skills, reaching to the sound source to determine its position reduced localization errors faster and to a greater extent as compared to just naming sources’ positions, despite the fact that in both tasks, participants received the same feedback about the correct position of sound sources in case of wrong response. However, it remains to establish which features have made reaching to sound more effective as compared to naming. In the present study, we introduced a further condition in which the hand is the effector providing the response, but without it reaching toward the space occupied by the target source: the pointing condition. We tested three groups of participants (naming, pointing, and reaching groups) each while performing a sound localization task in normal and altered listening situations (i.e. mild-moderate unilateral hearing loss) simulated through auditory virtual reality technology. The experiment comprised four blocks: during the first and the last block, participants were tested in normal listening condition, while during the second and the third in altered listening condition. We measured their performance, their subjective judgments (e.g. effort), and their head-related behavior (through kinematic tracking). First, people’s performance decreased when exposed to asymmetrical mild-moderate hearing impairment, more specifically on the ipsilateral side and for the pointing group. Second, we documented that all groups decreased their localization errors across altered listening blocks, but the extent of this reduction was higher for reaching and pointing as compared to the naming group. Crucially, the reaching group leads to a greater error reduction for the side where th

Journal article

González-Toledo D, Cuevas-Rodríguez M, Vicente T, Picinali L, Molina-Tanco L, Reyes-Lecuona A et al., 2024, Spatial release from masking in the median plane with non-native speakers using individual and mannequin head related transfer functions, J Acoust Soc Am, Vol: 155, Pages: 284-293

Spatial release from masking (SRM) in speech-on-speech tasks has been widely studied in the horizontal plane, where interaural cues play a fundamental role. Several studies have also observed SRM for sources located in the median plane, where (monaural) spectral cues are more important. However, a relatively unexplored research question concerns the impact of head-related transfer function (HRTF) personalisation on SRM, for example, whether using individually-measured HRTFs results in better performance if compared with the use of mannequin HRTFs. This study compares SRM in the median plane in a speech-on-speech virtual task rendered using both individual and mannequin HRTFs. SRM is obtained using English sentences with non-native English speakers. Our participants show lower SRM performances compared to those found by others using native English participants. Furthermore, SRM is significantly larger when the source is spatialised using the individual HRTF, and this effect is more marked for those with lower English proficiency. Further analyses using a spectral distortion metric and the estimation of the better-ear effect, show that the observed SRM can only partially be explained by HRTF-specific factors and that the effect of the familiarity with individual spatial cues is likely to be the most significant element driving these results.

Journal article

Hogg A, Liu H, Mads J, Picinali L et al., 2023, Exploring the Impact of Transfer Learning on GAN-Based HRTF Upsampling, EAA Forum Acusticum, European Congress on Acoustics

Conference paper

Meyer J, Picinali L, 2023, Comparison of simulated head-related transfer functions accuracy for different model complexities using the finite-difference time-domain method, Forum Acusticum 2023, Pages: 131-137, ISSN: 2221-3767

The use of finite-difference time-domain (FDTD) simulations is relevant for several applications in virtual acoustics. One of these is the numerical calculation of head-related transfer functions (HRTFs). This study investigates the effect of varying the geometrical complexity (shape, level of details) of a human head/torso model on the calculation of its HRTFs using an FDTD solver. In particular, the interest is on the accuracy of the obtained simulation results with respect to the human head/torso model complexity. For that aim, a solution verification process is undertaken, and a single sphere, a two-sphere and a human head and torso models are considered. The results indicate that relatively small 95% confidence intervals on the solution verification results are achieved, indicating relatively good accuracy for the prediction of HRTFs up to relatively high frequencies for the single and two-sphere models considered. However, for the simplified human head and torso model, a similar accuracy is achieved only up to a lower frequency.

Conference paper

Valzolgher C, Capra S, Pavani F, Picinali L et al., 2023, Training spatial hearing skills in virtual reality through a sound-reaching task, Forum Acusticum 2023

Sound localization is crucial for interacting with the surrounding world. This ability can be learned across time and improved by multisensory and motor cues. In the last decade, studying the contributions of multisensory and motor cues has been facilitated by the increased adoption of virtual reality (VR). In a recent study, sound localization had been trained through a task where the visual stimuli were rendered through a VR headset, and the auditory ones through a loudspeaker moved around by the experimenter. Physically reaching to sound sources reduced sound localization errors faster and to a greater extent if compared to naming sources’ positions. Interestingly, training efficacy extended also to hearing-impaired people. Yet, this approach is unfeasible for rehabilitation at home. Fully virtual approaches have been used to study spatial hearing learning processes, performing headphones-rendered acoustic simulations. In the present study, we investigate whether the effects of our reaching-based training can be observed when taking advantage of such simulations, showing that the improvement is comparable between the full-VR and blended VR conditions. This validates the use of training paradigms that are completely based on portable equipment and don’t require an external operator, opening new perspectives in the field of remote rehabilitation.

Conference paper

Picinali L, Grimm G, Hioka Y, Kearney G, Johnston D, Jin C, Simon LSR, Wuthrich H, Mihocic M, Majdak P, Vickers D et al., 2023, VR/AR and hearing research: current examples and future challenges, Forum Acusticum 2023

A well-known issue in clinical audiology and hearing research is the level of abstraction of traditional experimental assessments and methods, which lack ecological validity and differ significantly from real-life experiences, often resulting in unreliable outcomes. Attempts to deal with this matter by, for example, performing experiments in real-life contexts, can be problematic due to the difficulty of accurately identifying control-specific parameters and events. Virtual and augmented reality (VR/AR) have the potential to provide dynamic and immersive audiovisual experiences that are at the same time realistic and highly controllable. Several successful attempts have been made to create and validate VR-based implementations of standard audiological and linguistic tests, as well as to design procedures and technologies to assess meaningful and ecologically-valid data. Similarly, new viewpoints on auditory perception have been provided by looking at hearing training and auditory sensory augmentation, aiming at improving perceptual skills in tasks such as speech understanding and sound-source localisation. In this contribution, we bring together researchers active in this domain. We briefly describe experiments they have designed, and jointly identify challenges that are still open and common approaches to tackle them.

Conference paper

Giraud P, Sum K, Pontoppidan NH, Poole K, Picinali L et al., 2023, Adaptation to altered interaural time differences in a virtual reality environment, Forum Acusticum 2023

Interaural time differences (ITDs) are important cues for determining the azimuth location of a sound source and need to be accurately reproduced, in a virtual reality (VR) environment, to achieve a realistic sense of sound location for the listener. ITDs are usually included in head-related transfer functions (HRTFs) used for audio rendering, and can be individualised to match the user’s head size (e.g. longer ITDs are needed for larger head sizes). In recent years, studies have shown that it is possible to train subjects to adapt and improve their performance in sound localisation skills to non-individualized HRTFs. The analysis of such improvements has focused mainly on adaptation to monaural spectral cues rather than binaural cues such as ITDs. In this work listeners are placed in a VR environment and are asked to localise the source of a noise burst in the horizontal plane. Using a generic non-individualized HRTF with its ITD modified to match the head size of each participant, test and training phases are alternated, with the latter providing continuous auditory feedback. The experiment is then repeated with ITDs simulating larger (150%) and smaller (50%) head sizes. Comparing localisation accuracy before and after training, it is observed that while training seems to improve sound localisation performance, this varies according to the simulated head size and target location.

Conference paper
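
The relationship between head size and ITD mentioned in the abstract above can be sketched with the classic Woodworth spherical-head approximation, ITD = (a / c) * (theta + sin(theta)). This is a textbook model used here purely for illustration; the abstract does not state which ITD model the study used. The default head radius (0.0875 m), the speed of sound (343 m/s) and the function name are assumptions.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD in seconds for a rigid spherical head.

    Woodworth formula: ITD = (a / c) * (theta + sin(theta)),
    valid for a distant source at azimuth theta between 0 and 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


# ITD at 90 degrees for the assumed default head, and for heads scaled
# to 150% and 50% of that radius (the scalings quoted in the abstract).
base_itd = woodworth_itd(90.0)
large_itd = woodworth_itd(90.0, head_radius_m=0.0875 * 1.5)
small_itd = woodworth_itd(90.0, head_radius_m=0.0875 * 0.5)
for label, itd in (("100%", base_itd), ("150%", large_itd), ("50%", small_itd)):
    print(f"head size {label}: ITD = {itd * 1e6:.0f} microseconds")
```

Under this model the ITD scales linearly with head radius, so a 150% head produces ITDs 1.5 times longer than the reference head at every azimuth.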

Chard I, Van Zalk N, Picinali L, 2023, Virtual reality exposure therapy for reducing social anxiety associated with stuttering: the role of outcome expectancy, therapeutic alliance, presence and social presence, Frontiers in Virtual Reality, Vol: 4, Pages: 1-15, ISSN: 2673-4192

Introduction: Although several trials have demonstrated the effectiveness of Virtual Reality Exposure Therapy (VRET) for reducing social anxiety, there is little understanding about the factors that lead to symptom reduction across different treatment designs. Such factors may include outcome expectancy, therapeutic alliance, presence (perception of being in the virtual environment) and social presence (perception of interacting with others). We report on findings from a pilot trial of VRET targeting social anxiety in people who stutter, and examine the association of these four factors with treatment outcome. Methods: People who stutter reporting heightened social anxiety (n = 22) took part in the trial after being recruited via online adverts. Remotely delivered VRET was administered to participants in three sessions across three weeks. Each session targeted both performative and interactive anxiety. A virtual therapist helped participants to engage with treatment strategies, whilst also guiding them through exercises. Results: Findings showed that presence and social presence were both negatively related to changes in fear of negative evaluation between pre- and post-treatment. However, presence, outcome expectancy and therapeutic alliance were positively related to changes in social anxiety symptoms. Furthermore, outcome expectancy and therapeutic alliance were quadratically related to fear of negative evaluation change. Nevertheless, the effect of presence on social anxiety, and the effects of presence and therapeutic alliance on fear of negative evaluation must be interpreted with caution as these were not large enough to reach sufficient statistical power. Therapeutic alliance did not mediate the relationship between outcome expectancy and treatment outcome. Discussion: These findings suggest that the current VRET protocol affected social anxiety and fear of negative evaluation differently. We discuss how presence may underlie these mixed associations. We also s

Journal article

Reyes-Lecuona A, Cuevas-Rodríguez M, González-Toledo D, Molina-Tanco L, Poirier-Quinot D, Picinali L et al., 2023, Hearing loss and hearing aid simulations for accessible user experience

This paper presents an open-source real-time hearing loss and hearing aids simulator implemented within the 3D Tune-In Toolkit C++ library. These simulators provide a valuable tool for improving auditory accessibility, promoting inclusivity and fostering new research. The hearing loss simulator accurately simulates various types and levels of hearing loss, while the hearing aid simulator replicates different hearing aid technologies, allowing for the simulation of real-world hearing aid experiences. Both simulators are implemented to work in real-time, allowing for immediate feedback and adjustment during testing and development. As an open-source tool, the simulators can be customised and modified to meet specific needs, and the scientific community can collaborate and improve upon the algorithms. The technical details of the simulators and their implementation in the C++ library are presented, and the potential applications of the simulators are discussed, showing that they can be used as a valuable support software for UX designers to ensure the accessibility of their products to individuals with hearing impairment. Moreover, these simulators can be used to raise awareness about auditory accessibility issues. Overall, this paper also aims to provide some insight into the development and implementation of accessible technology for individuals with hearing impairments.

Conference paper

Martin V, Picinali L, 2023, Comparing online vs. lab-based experimental approaches for the perceptual evaluation of artificial reverberation, Forum Acusticum 2023

A common approach for reproducing room acoustics effects is geometrical acoustics. The accuracy of such an approach is tied, among other variables, to the geometrical accuracy of the simulated room, and to the information regarding the absorption coefficients of its materials. However, from a perceptual standpoint, a model that accounts for all of a room’s features would come at a high computational cost and could be redundant. As a result, a compromise can be reached between the perceived quality (e.g. authenticity, immersion, etc.) of the replicated room effect and the model’s complexity. The purpose of this study is to look into the perceptual impact of simplifying the room geometry and minimizing the number of materials’ absorption coefficients. Two separate experiments were conducted, both based on the MUSHRA methodology: one was run in a controlled lab environment through a Virtual Reality (VR) headset, while the other was run through a web-based interface. This paper focuses on the differences between the two protocols’ impact on the results. It appears that the online-based experiment, notwithstanding the lack of control of the playback system and environment, and the participants’ likely limited attention, produced minor but substantial differences with the results of the VR experiment.

Conference paper

Daugintis R, Barumerli R, Geronazzo M, Picinali L et al., 2023, Initial evaluation of an auditory-model-aided selection procedure for non-individual HRTFs, Forum Acusticum, Pages: 1-8, ISSN: 2221-3767

Binaural spatial audio reproduction systems use measured or simulated head-related transfer functions (HRTFs), which encode the effects of the outer ear and body on the incoming sound to recreate a realistic spatial auditory field around the listener. The sound localisation cues embedded in the HRTF are highly personal. Establishing perceptual similarity between different HRTFs in a reliable manner is challenging due to a combination of acoustic and non-acoustic aspects affecting our spatial auditory perception. To account for these factors, we propose an automated procedure to select the ‘best’ non-individual HRTF dataset from a pool of measured ones. For a group of human participants with their own acoustically measured HRTFs, a multi-feature Bayesian auditory sound localisation model is used to predict individual localisation performance with the other HRTFs from within the group. Then, the model selection of the ‘best’ and the ‘worst’ non-individual HRTFs is evaluated via an actual localisation test and a subjective audio quality assessment in comparison with individual HRTFs. A successful model-aided objective selection of the ‘best’ non-individual HRTF may provide relevant insights for effective and handy binaural spatial audio solutions in virtual/augmented reality (VR/AR) applications.

Conference paper

Setti W, Vitali H, Campus C, Picinali L, W MGS et al., 2023, Audio-Corsi: a novel system to evaluate audio-spatial memory skills, Annu Int Conf IEEE Eng Med Biol Soc, Vol: 2023, Pages: 1-4

Spatial memory (SM) is a multimodal representation of the external world, which different sensory inputs can mediate. It is essential in accomplishing everyday activities and strongly correlates with sleep processes. However, despite valuable knowledge of the spatial mechanisms in the visual modality, the multi-sensory aspects of SM have yet to be thoroughly investigated due to a lack of proper technologies. This work presents a novel acoustic system built around 3D audio spatial technology. Our goal was to examine if an afternoon nap can improve memory performance, measured through the acoustic version of the Corsi Block Tapping Task (CBTT), named Audio-Corsi. We tested five adults over two days. During one of the two days (Wake), participants performed the Audio-Corsi before (Pre) and after (Post) a wake resting period; while the other day (Sleep), participants performed the Audio-Corsi before (Pre) and after (Post) a nap. Day orders were randomized. We calculated the memory span for the Pre and Post session in both the Wake and Sleep days. Preliminary results show a significant difference in the memory span between the Wake and Sleep days. Specifically, memory span decreased between the pre- and post-test during the wake day. The opposite trend was found for the sleep day. Results indicate that SM can be improved by sleeping also in the acoustic modality other than the visual one. Clinical Relevance: The technology and procedure we designed and developed could be suitable in clinical and experimental settings to study high-level cognitive skills in the auditory sensory modality and their relationship with sleep, especially when vision is absent or distorted (i.e. blindness).

Journal article

Daugintis R, Barumerli R, Picinali L, Geronazzo M et al., 2023, Classifying non-individual head-related transfer functions with a computational auditory model: calibration and metrics, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1-5

This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, these are grouped into ‘good’ and ‘bad’, and the ‘best’/‘worst’ is selected from each category. Firstly, we present a greedy algorithm for automated individual calibration of the model based on the individual sound localisation data. We then discuss data analysis of predicted directional localisation errors and present an algorithm for categorising the HRTFs based on the localisation error distributions within a limited range of directions in front of the listener. Finally, we discuss the validity of the classification algorithm when using averaged instead of individual model parameters. This analysis of auditory modelling results aims to provide a perceptual foundation for automated HRTF personalisation techniques for an improved experience of binaural spatial audio technologies.

Conference paper

Pauwels J, Picinali L, 2023, On the relevance of the differences between HRTF measurement setups for machine learning, ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 1-5

As spatial audio is enjoying a surge in popularity, data-driven machine learning techniques that have been proven successful in other domains are increasingly used to process head-related transfer function measurements. However, these techniques require much data, whereas the existing datasets are ranging from tens to the low hundreds of datapoints. It therefore becomes attractive to combine multiple of these datasets, although they are measured under different conditions. In this paper, we first establish the common ground between a number of datasets, then we investigate potential pitfalls of mixing datasets. We perform a simple experiment to test the relevance of the remaining differences between datasets when applying machine learning techniques. Finally, we pinpoint the most relevant differences.

Conference paper

Pedersen RL, Picinali L, Kajs N, Patou F et al., 2023, Virtual-Reality-Based Research in Hearing Science: A Platforming Approach, AES: Journal of the Audio Engineering Society, Vol: 71, Pages: 374-389, ISSN: 1549-4950

The lack of ecological validity in clinical assessment, as well as the challenge of investigating multimodal sensory processing, remain key challenges in hearing science. Virtual Reality (VR) can support hearing research in these domains by combining experimental control with situational realism. However, the development of VR-based experiments is traditionally highly resource demanding, which places a significant entry barrier for basic and clinical researchers looking to embrace VR as the research tool of choice. The Oticon Medical Virtual Reality (OMVR) experiment platform fast-tracks the creation or adaptation of hearing research experiment templates to be used to explore areas such as binaural spatial hearing, multimodal sensory integration, cognitive hearing behavioral strategies, auditory-visual training, etc. In this paper, the OMVR’s functionalities, architecture, and key elements of implementation are presented, important performance indicators are characterized, and a use-case perceptual evaluation is presented.

Journal article

Engel I, Daugintis R, Vicente T, Hogg AOT, Pauwels J, Tournier AJ, Picinali L et al., 2023, The SONICOM HRTF dataset, Journal of the Audio Engineering Society, Vol: 71, Pages: 241-253, ISSN: 0004-7554

Immersive audio technologies, ranging from rendering spatialized sounds accurately to efficient room simulations, are vital to the success of augmented and virtual realities. To produce realistic sounds through headphones, the human body and head must both be taken into account. However, the measurement of the influence of the external human morphology on the sounds incoming to the ears, which is often referred to as head-related transfer function (HRTF), is expensive and time-consuming. Several datasets have been created over the years to help researchers work on immersive audio; nevertheless, the number of individuals involved and amount of data collected is often insufficient for modern machine-learning approaches. Here, the SONICOM HRTF dataset is introduced to facilitate reproducible research in immersive audio. This dataset contains the HRTF of 120 subjects, as well as headphone transfer functions; 3D scans of ears, heads, and torsos; and depth pictures at different angles around subjects' heads.

Journal article

Chard I, van Zalk N, Picinali L, 2023, Virtual reality exposure therapy for reducing social anxiety in stuttering: a randomized controlled pilot trial, Frontiers in Digital Health, Vol: 5, Pages: 1-14, ISSN: 2673-253X

We report on findings from the first randomized controlled pilot trial of virtual reality exposure therapy (VRET) developed specifically for reducing social anxiety associated with stuttering. People who stutter with heightened social anxiety were recruited from online adverts and randomly allocated to receive VRET (n = 13) or be put on a waitlist (n = 12). Treatment was delivered remotely using a smartphone-based VR headset. It consisted of three weekly sessions, each comprising both performative and interactive exposure exercises, and was guided by a virtual therapist. Multilevel model analyses failed to demonstrate the effectiveness of VRET at reducing social anxiety between pre- and post-treatment. We found similar results for fear of negative evaluation, negative thoughts associated with stuttering, and stuttering characteristics. However, VRET was associated with reduced social anxiety between post-treatment and one-month follow-up. These pilot findings suggest that our current VRET protocol may not be effective at reducing social anxiety amongst people who stutter, though might be capable of supporting longer-term change. Future VRET protocols targeting stuttering-related social anxiety should be explored with larger samples. The results from this pilot trial provide a solid basis for further design improvements and for future research to explore appropriate techniques for widening access to social anxiety treatments in stuttering.

Journal article

Picinali L, Katz BFG, Geronazzo M, Majdak P, Reyes-Lecuona A, Vinciarelli A et al., 2022, The SONICOM Project: artificial intelligence-driven immersive audio, from personalization to modeling, IEEE: Signal Processing Magazine, Vol: 39, Pages: 85-88, ISSN: 1053-5888

Every individual perceives spatial audio differently, due in large part to the unique and complex shape of ears and head. Therefore, high-quality, headphone-based spatial audio should be uniquely tailored to each listener in an effective and efficient manner. Artificial intelligence (AI) is a powerful tool that can be used to drive forward research in spatial audio personalization. The SONICOM project aims to employ a data-driven approach that links physiological characteristics of the ear to the individual acoustic filters, which allows us to localize sound sources and perceive them as being located around us. A small amount of data acquired from users could allow personalized audio experiences, and AI could facilitate this by offering a new perspective on the matter. A Bayesian approach to computational neuroscience and binaural sound reproduction will be linked to create a metric for AI-based algorithms that will predict realistic spatial audio quality. Being able to consistently and repeatedly evaluate and quantify the improvements brought by technological advancements, as well as the impact these have on complex interactions in virtual environments, will be key for the development of new techniques and for unlocking new approaches to understanding the mechanisms of human spatial hearing and communication.

Journal article

Picinali L, Katz BFG, 2022, System-to-user and user-to-system adaptations in Binaural audio, Sonic Interactions in Virtual Environments, Editors: Geronazzo, Serafin, Publisher: Springer, Pages: 115-143

This chapter concerns concepts of adaptation in a binaural audio context (i.e. headphone-based three-dimensional audio rendering and associated spatial hearing aspects), considering first the adaptation of the rendering system to the acoustic and perceptual properties of the user, and second the adaptation of the user to the rendering quality of the system. We start with an overview of the basic mechanisms of human sound source localisation, introducing expressions such as localisation cues and interaural differences, and the concept of the Head-Related Transfer Function (HRTF), which is the basis of most 3D spatialisation systems in VR. The chapter then moves to more complex concepts and processes, such as HRTF selection (system-to-user adaptation) and HRTF accommodation (user-to-system adaptation). State-of-the-art HRTF modelling and selection methods are presented, looking at various approaches and at how these have been evaluated. Similarly, the process of HRTF accommodation is detailed, with a case study employed as an example. Finally, the potential of these two approaches is discussed, considering their combined use in a practical context, as well as introducing a few open challenges for future research.

Book chapter

Geronazzo M, Serafin S, 2022, Sonic interactions in virtual environments, Publisher: Springer, ISBN: 9783031040207

This book tackles the design of 3D spatial interactions in an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are:
Immersive audio: the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies
Sonic interaction: the human-computer interplay through auditory feedback in VE
VR systems: naturally support multimodal integration, impacting different application domains
Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread in different audio communities and to increase among the VR communities, researchers, and practitioners, the awareness of the importance of sonic elements when designing immersive environments.


Siripornpitak P, Engel I, Cooper S, Squires I, Picinali L et al., 2022, Spatial up-sampling of HRTF sets using generative adversarial networks: a pilot study, Frontiers in Signal Processing, Vol: 2, Pages: 1-10, ISSN: 2673-8198

Headphone-based spatial audio simulations rely on Head Related Transfer Functions (HRTFs) in order to reconstruct the sound field at the entrance of the listener’s ears. A HRTF is strongly dependent on the listener’s specific anatomical structures, and it has been shown that virtual sounds recreated with someone else’s HRTF result in worse localisation accuracy, as well as altering other subjective measures such as externalisation and realism. Acoustic measurements of the filtering effects generated by ears, head and torso have proven to be one of the most reliable ways to obtain a personalised HRTF. However, this requires a dedicated and expensive setup, and is time-intensive. In order to simplify the measurement setup, thereby improving the scalability of the process, we are exploring strategies to reduce the number of acoustic measurements without degrading the spatial resolution of the HRTF. Traditionally, spatial up-sampling of HRTF sets is achieved through barycentric interpolation or by employing the spherical harmonics framework. However, such methods often perform poorly when the provided HRTF data is spatially very sparse. This work investigates the use of generative adversarial networks (GANs) to tackle the up-sampling problem, offering an initial insight about the suitability of this technique. Numerical evaluations based on spectral magnitude error and perceptual model outputs are presented on single spatial dimensions, therefore considering sources positioned only in one of the three main planes: horizontal, median, and frontal. Results suggest that traditional HRTF interpolation methods perform better than the proposed GAN-based one when the distance between measurements is smaller than 90°, but for the sparsest conditions (i.e. one measurement every 120° to 180°), the proposed approach outperforms the others.

Journal article

Reyes-Lecuona A, Bouchara T, Picinali L, 2022, Immersive sound for XR, Roadmapping Extended Reality: Fundamentals and Applications, Pages: 75-102, ISBN: 9781119865148

Sound plays a very important role in everyday life as well as in XR applications, as explained in this chapter. Recent advances and challenges in immersive audio research are presented, discussing how, why, and to what extent there is potential for further development of these technologies applied to XR. The fundamentals of immersive audio rendering for XR are introduced before presenting the main technological challenges still open in the area. Finally, a series of future applications is presented, which the authors envision as examples of the potential of immersive audio in XR, and a research roadmap is outlined.

Book chapter

Comunita M, Gerino A, Picinali L, 2022, PlugSonic: a web- and mobile-based platform for dynamic and navigable binaural audio, Eurasip Journal on Audio, Speech, and Music Processing, Vol: 18, ISSN: 1687-4714

PlugSonic is a series of web- and mobile-based applications designed to: edit samples and apply audio effects (PlugSonic Sample), create and experience dynamic and navigable soundscapes and sonic narratives (PlugSonic Soundscape). The audio processing within PlugSonic is based on the Web Audio API while the binaural rendering uses the 3D Tune-In Toolkit. Exploration of soundscapes in a physical space is made possible by adopting Apple’s ARKit. The present paper describes the implementation details, the signal processing chain and the necessary steps to curate and experience a soundscape. We also include some metrics and performance details. The main goal of PlugSonic is to give users a complete set of tools, without the need for specific devices, external software and/or hardware, specialised knowledge or custom development; with the idea that spatial audio has the potential to become a readily accessible and easy to understand technology, for anyone to adopt, whether for creative or research purposes.

Journal article

Saksida A, Ghiselli S, Picinali L, Pintonello S, Battelino S, Orzan E et al., 2022, Attention to speech and music in young children with bilateral cochlear implants: a pupillometry study, The Journal of Clinical Sleep Medicine, Vol: 11, Pages: 1-14, ISSN: 1550-9389

Early bilateral cochlear implants (CIs) may enhance attention to speech, and reduce cognitive load in noisy environments. However, it is sometimes difficult to measure speech perception and listening effort, especially in very young children. Behavioral measures cannot always be obtained in young/uncooperative children, whereas objective measures are either difficult to assess or do not reliably correlate with behavioral measures. Recent studies have thus explored pupillometry as a possible objective measure. Here, pupillometry is introduced to assess attention to speech and music in noise in very young children with bilateral CIs (N = 14, age: 17–47 months), and in an age-matched group of normally-hearing (NH) children (N = 14, age: 22–48 months). The results show that the response to speech was affected by the presence of background noise only in children with CIs, but not NH children. Conversely, the presence of background noise altered pupil response to music only in NH children. We conclude that whereas speech and music may receive comparable attention in comparable listening conditions, in young children with CIs, controlling for background noise affects attention to speech and speech processing more than in NH children. Potential implementations of the results for rehabilitation procedures are discussed.

Journal article

Engel Alonso Martinez J, Goodman D, Picinali L, 2022, Assessing HRTF preprocessing methods for Ambisonics rendering through perceptual models, Acta Acustica -Peking-, Vol: 6, ISSN: 0371-0025

Binaural rendering of Ambisonics signals is a common way to reproduce spatial audio content. Processing Ambisonics signals at low spatial orders is desirable in order to reduce complexity, although it may degrade the perceived quality, in part due to the mismatch that occurs when a low-order Ambisonics signal is paired with a spatially dense head-related transfer function (HRTF). In order to alleviate this issue, the HRTF may be preprocessed so its spatial order is reduced. Several preprocessing methods have been proposed, but they have not been thoroughly compared yet. In this study, nine HRTF preprocessing methods were used to render anechoic binaural signals from Ambisonics representations of orders 1 to 44, and these were compared through perceptual hearing models in terms of localisation performance, externalisation and speech reception. This assessment was supported by numerical analyses of HRTF interpolation errors, interaural differences, perceptually-relevant spectral differences, and loudness stability. Models predicted that the binaural renderings’ accuracy increased with spatial order, as expected. A notable effect of the preprocessing method was observed: whereas all methods performed similarly at the highest spatial orders, some were considerably better at lower orders. A newly proposed method, BiMagLS, displayed the best performance overall and is recommended for the rendering of bilateral Ambisonics signals. The results, which were in line with previous literature, indirectly validate the perceptual models’ ability to predict listeners’ responses in a consistent and explicable manner.

Journal article

Salorio-Corbetto M, Williges B, Lamping W, Picinali L, Vickers D et al., 2022, Evaluating spatial hearing using a dual-task approach in a virtual-acoustics environment, Frontiers in Neuroscience, Vol: 16, Pages: 1-17, ISSN: 1662-453X

Spatial hearing is critical for communication in everyday sound-rich environments. It is important to gain an understanding of how well users of bilateral hearing devices function in these conditions. The purpose of this work was to evaluate a Virtual Acoustics (VA) version of the Spatial Speech in Noise (SSiN) test, the SSiN-VA. This implementation uses relatively inexpensive equipment and can be performed outside the clinic, allowing for regular monitoring of spatial-hearing performance. The SSiN-VA simultaneously assesses speech discrimination and relative localization with changing source locations in the presence of noise. The use of simultaneous tasks increases the cognitive load to better represent the difficulties faced by listeners in noisy real-world environments. Current clinical assessments may require costly equipment which has a large footprint. Consequently, spatial-hearing assessments may not be conducted at all. Additionally, as patients take greater control of their healthcare outcomes and a greater number of clinical appointments are conducted remotely, outcome measures that allow patients to carry out assessments at home are becoming more relevant. The SSiN-VA was implemented using the 3D Tune-In Toolkit, simulating seven loudspeaker locations spaced at 30° intervals with azimuths between −90° and +90°, and rendered for headphone playback using the binaural spatialization technique. Twelve normal-hearing participants were assessed to evaluate if SSiN-VA produced patterns of responses for relative localization and speech discrimination as a function of azimuth similar to those previously obtained using loudspeaker arrays. Additionally, the effect of the signal-to-noise ratio (SNR), the direction of the shift from target to reference, and the target phonetic contrast on performance were investigated. SSiN-VA led to similar patterns of performance as a function of spatial location compared to loudspeaker setups for both relative lo

Journal article

Setti W, Cuturi LF, Engel I, Picinali L, Gori M et al., 2022, The Influence of Early Visual Deprivation on Audio-Spatial Working Memory, NEUROPSYCHOLOGY, Vol: 36, Pages: 55-63, ISSN: 0894-4105

Journal article

Bürgel M, Picinali L, Siedenburg K, 2021, Listening in the mix: lead vocals robustly attract auditory attention in popular music, Frontiers in Psychology, Vol: 12, Pages: 1-15, ISSN: 1664-1078

Listeners can attend to and track instruments or singing voices in complex musical mixtures, even though the acoustical energy of sounds from individual instruments may overlap in time and frequency. In popular music, lead vocals are often accompanied by sound mixtures from a variety of instruments, such as drums, bass, keyboards, and guitars. However, little is known about how the perceptual organization of such musical scenes is affected by selective attention, and which acoustic features play the most important role. To investigate these questions, we explored the role of auditory attention in a realistic musical scenario. We conducted three online experiments in which participants detected single cued instruments or voices in multi-track musical mixtures. Stimuli consisted of 2-s multi-track excerpts of popular music. In one condition, the target cue preceded the mixture, allowing listeners to selectively attend to the target. In another condition, the target was presented after the mixture, requiring a more “global” mode of listening. Performance differences between these two conditions were interpreted as effects of selective attention. In Experiment 1, results showed that detection performance was generally dependent on the target’s instrument category, but listeners were more accurate when the target was presented prior to the mixture rather than the opposite. Lead vocals appeared to be nearly unaffected by this change in presentation order and achieved the highest accuracy compared with the other instruments, which suggested a particular salience of vocal signals in musical mixtures. In Experiment 2, filtering was used to avoid potential spectral masking of target sounds. Although detection accuracy increased for all instruments, a similar pattern of results was observed regarding the instrument-specific differences between presentation orders. In Experiment 3, adjusting the sound level differences between the targets reduced the effect of

Journal article

Sethi SS, Ewers RM, Jones NS, Sleutel J, Shabrani A, Zulkifli N, Picinali L et al., 2021, Soundscapes predict species occurrence in tropical forests, OIKOS, Vol: 2022, Pages: 1-9, ISSN: 0030-1299

Accurate occurrence data is necessary for the conservation of keystone or endangered species, but acquiring it is usually slow, laborious and costly. Automated acoustic monitoring offers a scalable alternative to manual surveys, but identifying species vocalisations requires large manually annotated training datasets, and is not always possible (e.g. for lesser-studied or silent species). A new approach is needed that rapidly predicts species occurrence using smaller and more coarsely labelled audio datasets. We investigated whether local soundscapes could be used to infer the presence of 32 avifaunal and seven herpetofaunal species in 20-min recordings across a tropical forest degradation gradient in Sabah, Malaysia. Using acoustic features derived from a convolutional neural network (CNN), we characterised species-indicative soundscapes by training our models on a temporally coarse labelled point-count dataset. Soundscapes successfully predicted the occurrence of 34 out of the 39 species across the two taxonomic groups, with area under the curve (AUC) metrics from 0.53 up to 0.87. The highest accuracies were achieved for species with strong temporal occurrence patterns. Soundscapes were a better predictor of species occurrence than above-ground carbon density – a metric often used to quantify habitat quality across forest degradation gradients. Our results demonstrate that soundscapes can be used to efficiently predict the occurrence of a wide variety of species and provide a new direction for data-driven large-scale assessments of habitat suitability.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.