TY  - JOUR
AB  - This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
AU  - Hogg, A
AU  - Evers, C
AU  - Moore, A
AU  - Naylor, P
DO  - 10.1109/TASLP.2021.3067161
EP  - 1490
PY  - 2021///
SN  - 2329-9290
SP  - 1479
TI  - Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency
T2  - IEEE/ACM Transactions on Audio, Speech and Language Processing
UR  - http://dx.doi.org/10.1109/TASLP.2021.3067161
UR  - https://ieeexplore.ieee.org/document/9381673
UR  - http://hdl.handle.net/10044/1/88508
VL  - 29
ER  - 