Imperial College London

Dr Patrick A. Naylor

Faculty of Engineering, Department of Electrical and Electronic Engineering

Professor of Speech & Acoustic Signal Processing



+44 (0)20 7594 6235 | p.naylor

803, Electrical Engineering, South Kensington Campus






BibTeX format

author = {Hogg, A and Naylor, P and Evers, C},
doi = {10.1109/ICASSP.2019.8682924},
publisher = {IEEE},
title = {Speaker change detection using fundamental frequency with application to multi-talker segmentation},
url = {},
year = {2019}

RIS format (EndNote, RefMan)

AB - This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual’s pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this pitch-prediction approach for speaker change detection. This system is then evaluated against a commonly used MFCC segmentation system. The proposed system is shown to increase the speaker change detection rate from 43.3% to 70.5% on meetings in the AMI corpus. There are therefore two equally weighted contributions in this paper: 1. We address the question of whether a change in pitch is a reliable estimator of a speaker change in multi-talker meeting audio. 2. We develop a method to extract such speaker changes and test it on a widely available meeting corpus.
AU - Hogg,A
AU - Naylor,P
AU - Evers,C
DO - 10.1109/ICASSP.2019.8682924
PY - 2019///
TI - Speaker change detection using fundamental frequency with application to multi-talker segmentation
UR -
ER -
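
The abstract's central idea, that a pitch track a Kalman filter cannot predict suggests a speaker change, can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their state model, noise settings, or decision rule. The random-walk model, the variances `q` and `r`, the threshold `gate`, and the function name `detect_pitch_jumps` are all assumptions made for illustration, and the input is assumed to be a pitch track (in Hz) that has already been extracted from voiced frames.

```python
import numpy as np


def detect_pitch_jumps(f0, q=4.0, r=25.0, gate=9.0):
    """Flag frames whose pitch a scalar Kalman filter fails to predict.

    f0   : 1-D sequence of pitch estimates in Hz (voiced frames only)
    q, r : assumed process and measurement noise variances (Hz^2)
    gate : assumed threshold on the normalised squared innovation
    """
    f0 = np.asarray(f0, dtype=float)
    if f0.size == 0:
        return []
    changes = []
    x, p = f0[0], r  # initial state estimate and its variance
    for t in range(1, len(f0)):
        # Predict: under a random-walk model the predicted pitch is the
        # previous estimate, with uncertainty grown by q.
        x_pred, p_pred = x, p + q
        # Innovation (prediction error) and its variance.
        nu = f0[t] - x_pred
        s = p_pred + r
        if nu * nu / s > gate:
            # The pitch was not predictable: flag the frame as a candidate
            # speaker change and restart the filter from this observation.
            changes.append(t)
            x, p = f0[t], r
            continue
        # Update the estimate using the Kalman gain.
        k = p_pred / s
        x = x_pred + k * nu
        p = (1.0 - k) * p_pred
    return changes


# Synthetic track: 20 frames near 120 Hz, then 20 frames near 210 Hz.
track = [120.0, 121.0, 119.5, 120.5] * 5 + [210.0, 211.0, 209.5, 210.5] * 5
print(detect_pitch_jumps(track))
```

On this synthetic track the only flagged frame is index 20, where the pitch jumps from about 120 Hz to about 210 Hz. Real meeting audio would need pitch extraction, handling of unvoiced frames, and tuned parameters, none of which are covered here.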