An Online Diarization Approach for Streaming Applications Based on Tree-Clustering and Bayesian Resegmentation

Authors: Juan Manuel Martín Doñas, Haritz Arzelus Irazusta, Aitor Álvarez Muniain, Joaquin Arellano Goicoechea

Date: 04.09.2023


Abstract

This paper describes our proposed system for online speaker diarization in streaming applications. Assuming that an audio segment is available before each partial result is required, our method exploits this information by combining online clustering and resegmentation. First, the speaker embeddings extracted by an x-vector neural network are labeled using tree-based clustering. Then, once a complete batch of x-vectors is available, a Bayesian resegmentation is applied to refine the clusters further. Moreover, we exploit the fact that both methods share the same statistical framework, adapting the resegmentation step to use the history of the decision tree and so avoid label permutation issues. Our approach is evaluated on broadcast TV content from the Albayzin Diarization Challenges. The results show that our system outperforms online tree-based clustering and achieves performance comparable to state-of-the-art offline approaches while meeting the low-latency requirements of practical streaming services.
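The batch-online flow described in the abstract (a provisional label for every incoming embedding, then a refinement pass once a batch is complete) can be illustrated with a minimal sketch. This is not the authors' implementation: a greedy cosine-threshold clustering stands in for the tree-based clustering, a nearest-centroid reassignment stands in for the Bayesian resegmentation, and the names `BatchOnlineDiarizer`, `threshold` and `batch_size` are hypothetical. The one property it tries to mirror is that the refinement step reuses the speaker labels already issued, so labels are never renumbered between batches.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class BatchOnlineDiarizer:
    """Toy batch-online labeler (illustrative stand-in, not the paper's method)."""

    def __init__(self, threshold=0.7, batch_size=4):
        self.threshold = threshold    # assumed similarity needed to join a speaker
        self.batch_size = batch_size  # assumed number of embeddings per batch
        self.sums = []                # per-speaker sum of embeddings
        self.counts = []              # per-speaker number of embeddings
        self.batch = []               # (embedding, provisional label) pairs

    def _similarity(self, emb, k):
        # Empty speakers (possible after refinement) never match.
        if self.counts[k] == 0:
            return -1.0
        centroid = [s / self.counts[k] for s in self.sums[k]]
        return cosine(emb, centroid)

    def _update(self, k, emb, sign):
        # Add (sign=+1) or remove (sign=-1) an embedding from speaker k.
        self.sums[k] = [s + sign * x for s, x in zip(self.sums[k], emb)]
        self.counts[k] += sign

    def _assign(self, emb):
        # Online step: join the most similar speaker or open a new one.
        best, best_sim = None, self.threshold
        for k in range(len(self.sums)):
            sim = self._similarity(emb, k)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            self.sums.append([0.0] * len(emb))
            self.counts.append(0)
            best = len(self.sums) - 1
        self._update(best, emb, +1)
        return best

    def _refine(self):
        # Batch step: reassign each embedding to its closest existing speaker.
        # Speaker indices are reused as-is, so labels stay consistent over time.
        refined = []
        for emb, old in self.batch:
            sims = [self._similarity(emb, k) for k in range(len(self.sums))]
            new = max(range(len(sims)), key=sims.__getitem__)
            if new != old:
                self._update(old, emb, -1)
                self._update(new, emb, +1)
            refined.append(new)
        return refined

    def push(self, emb):
        """Return (provisional label, refined batch labels or None)."""
        label = self._assign(emb)
        self.batch.append((emb, label))
        if len(self.batch) < self.batch_size:
            return label, None
        refined = self._refine()
        self.batch = []
        return label, refined


diarizer = BatchOnlineDiarizer(threshold=0.7, batch_size=4)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
results = [diarizer.push(e) for e in stream]
```

Each call to `push` returns a low-latency provisional label immediately; the refined labels for the whole batch arrive only with the call that completes it, which is the latency trade-off the abstract refers to.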

BibTeX

@Article {
title = {An Online Diarization Approach for Streaming Applications Based on Tree-Clustering and Bayesian Resegmentation},
pages = {258-269},
keywds = {Batch-online processing; Speaker Diarization; Tree-based clustering; Variational Bayes resegmentation; X-vector extractor},
isbn = {978-303140497-9},
date = {2023-09-04},
}
