An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies
Authors: Eduardo Lleida, Luis Javier Rodríguez, Javier Tejedor, Alfonso Ortega, Antonio Miguel, Virginia Bazán, Carmen Pérez, Alberto de Prada, Mikel Peñagarikano, Amparo Varona, Germán Bordel, Doroteo Torre
Date: 01.08.2023
Applied Sciences (Switzerland)
Abstract
Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Different databases that cover different domains (e.g., broadcast news, conference talks, parliament sessions) were released for those challenges. The submitted systems also cover a wide range of speech processing methods, which include hidden Markov model-based approaches, end-to-end neural network-based methods, hybrid approaches, etc. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also provides the most relevant features of the submitted systems and briefly presents and discusses the obtained results. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement. This encourages us to carry on with the Albayzin evaluation campaigns in the coming years.
BIB_text
title = {An Overview of the IberSpeech-RTVE 2022 Challenges on Speech Technologies},
journal = {Applied Sciences (Switzerland)},
pages = {8577},
volume = {13},
keywds = {
Albayzin evaluations; IberSpeech Challenge; RTVE2022 database; search on speech; speaker diarization and identity assignment; speech-to-text transcription; text and speech alignment
},
abstract = {
Evaluation campaigns provide a common framework with which the progress of speech technologies can be effectively measured. The aim of this paper is to present a detailed overview of the IberSpeech-RTVE 2022 Challenges, which were organized as part of the IberSpeech 2022 conference under the ongoing series of Albayzin evaluation campaigns. In the 2022 edition, four challenges were launched: (1) speech-to-text transcription; (2) speaker diarization and identity assignment; (3) text and speech alignment; and (4) search on speech. Different databases that cover different domains (e.g., broadcast news, conference talks, parliament sessions) were released for those challenges. The submitted systems also cover a wide range of speech processing methods, which include hidden Markov model-based approaches, end-to-end neural network-based methods, hybrid approaches, etc. This paper describes the databases, the tasks and the performance metrics used in the four challenges. It also provides the most relevant features of the submitted systems and briefly presents and discusses the obtained results. Despite employing state-of-the-art technology, the relatively poor performance attained in some of the challenges reveals that there is still room for improvement. This encourages us to carry on with the Albayzin evaluation campaigns in the coming years.
},
doi = {10.3390/app13158577},
date = {2023-08-01},
}