An Online Diarization Approach for Streaming Applications Based on Tree-Clustering and Bayesian Resegmentation
Authors: Juan Manuel Martín Doñas, Joaquin Arellano Goicoechea
Date: 04.09.2023
Abstract
This paper describes our proposed system for online speaker diarization suitable for streaming applications. Assuming the availability of an audio segment before the partial result is required, our method exploits this information by combining online clustering and resegmentation. First, the speaker embeddings extracted from an x-vector neural network are labeled using tree-based clustering. Then, when a complete batch of x-vectors is available, a Bayesian resegmentation is applied to refine the clusters further. Moreover, we exploit the fact that both methods share the same statistical framework, adapting the resegmentation step to use the history of the decision tree to avoid permutation label issues. Our approach is evaluated with broadcast TV content from the Albayzin Diarization Challenges. The results show that our system is able to outperform online tree-based clustering and obtain comparable performance with state-of-the-art offline approaches while allowing low-latency requirements for practical streaming services.
BIB_text
title = {An Online Diarization Approach for Streaming Applications Based on Tree-Clustering and Bayesian Resegmentation},
pages = {258--269},
keywds = {
Batch-online processing; Speaker Diarization; Tree-based clustering; Variational Bayes resegmentation; X-vector extractor
},
abstract = {
This paper describes our proposed system for online speaker diarization suitable for streaming applications. Assuming the availability of an audio segment before the partial result is required, our method exploits this information by combining online clustering and resegmentation. First, the speaker embeddings extracted from an x-vector neural network are labeled using tree-based clustering. Then, when a complete batch of x-vectors is available, a Bayesian resegmentation is applied to refine the clusters further. Moreover, we exploit the fact that both methods share the same statistical framework, adapting the resegmentation step to use the history of the decision tree to avoid permutation label issues. Our approach is evaluated with broadcast TV content from the Albayzin Diarization Challenges. The results show that our system is able to outperform online tree-based clustering and obtain comparable performance with state-of-the-art offline approaches while allowing low-latency requirements for practical streaming services.
},
isbn = {978-3-031-40497-9},
date = {2023-09-04},
}