Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning
Authors: Juan Manuel Martín Doñas
Date: 09.09.2024
Abstract
This work proposes a novel stream-based Active Learning (AL) approach applied to Speech Emotion Recognition (SER) in real-life scenarios where new data are generated from different domains. The goal is to address major challenges in this field, including the lack of large labeled datasets, the difficulty of annotation, and the retrieval of representative emotional data. AL aims to address these problems by querying a small, valuable subset of samples for annotation, optimizing labeling effort and minimizing resources. To this end, we consider a stream-based AL methodology that leverages MLOps principles and human-in-the-loop methods to continuously adapt previously trained deep learning models, selecting audio samples that are both challenging and diverse and reducing the performance gap related to data diversity, cross-domain contexts, and continuous data ingestion. The pipeline was tested across several domains within three distinct scenarios, including both non-stream and stream-based approaches, as well as a pocket stream alternative that updates the previously trained models only when significant improvements are obtained. The experimental results show that the proposed method achieves competitive performance with an AL pocket stream-based strategy using just 20% of the original training data. This ensures good performance with a low allocated budget and continuous adaptation for practical, real-world environments.
BIB_text
title = {Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning},
pages = {105--117},
keywds = {
Active Learning; Clustering; Cross-domain learning; Speech Emotion Recognition; Stream-based
},
abstract = {
This work proposes a novel stream-based Active Learning (AL) approach applied to Speech Emotion Recognition (SER) in real-life scenarios where new data are generated from different domains. The goal is to address major challenges in this field, including the lack of large labeled datasets, the difficulty of annotation, and the retrieval of representative emotional data. AL aims to address these problems by querying a small, valuable subset of samples for annotation, optimizing labeling effort and minimizing resources. To this end, we consider a stream-based AL methodology that leverages MLOps principles and human-in-the-loop methods to continuously adapt previously trained deep learning models, selecting audio samples that are both challenging and diverse and reducing the performance gap related to data diversity, cross-domain contexts, and continuous data ingestion. The pipeline was tested across several domains within three distinct scenarios, including both non-stream and stream-based approaches, as well as a pocket stream alternative that updates the previously trained models only when significant improvements are obtained. The experimental results show that the proposed method achieves competitive performance with an AL pocket stream-based strategy using just 20\% of the original training data. This ensures good performance with a low allocated budget and continuous adaptation for practical, real-world environments.
},
isbn = {978-3-031-70565-6},
doi = {10.1007/978-3-031-70566-3_10},
date = {2024-09-09},
}