TY  - JOUR
AB  - This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker's utterance in the presence of an additional active speaker. This system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as features for the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
AU  - Hogg,A
AU  - Evers,C
AU  - Moore,A
AU  - Naylor,P
DO  - 10.1109/TASLP.2021.3067161
EP  - 1490
PY  - 2021///
SN  - 2329-9290
SP  - 1479
TI  - Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency
T2  - IEEE/ACM Transactions on Audio, Speech and Language Processing
UR  - http://dx.doi.org/10.1109/TASLP.2021.3067161
UR  - https://ieeexplore.ieee.org/document/9381673
VL  - 29
ER  -