Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning

Date: 09.09.2024


Abstract

This work proposes a novel stream-based Active Learning (AL) approach for Speech Emotion Recognition (SER) in real-life scenarios where new data are generated from different domains. The goal is to address major challenges in this field, including the lack of large labeled datasets, the difficulty of annotation, and the retrieval of representative emotional data. AL addresses these problems by querying a small, valuable subset of samples for annotation, keeping labeling effort and resource use low. To this end, we consider a stream-based AL methodology that leverages MLOps principles and human-in-the-loop methods to continuously adapt previously trained deep learning models, selecting audio samples that are both challenging and diverse and reducing the performance gap related to data diversity, cross-domain contexts, and continuous data ingestion. The pipeline was tested across several domains in three distinct scenarios, including non-stream and stream-based approaches as well as a pocket stream alternative that updates the previously trained models only when significant improvements are obtained. The experimental results show that the proposed method achieves competitive results with an AL pocket stream-based strategy using just 20% of the original training data. This yields good performance with a low allocated budget and continuous adaptation for practical, real-world environments.

BIB_text

@Article {
title = {Stream-based Active Learning for Speech Emotion Recognition via Hybrid Data Selection and Continuous Learning},
pages = {105--117},
keywds = {Active Learning; Clustering; Cross-domain learning; Speech Emotion Recognition; Stream-based},
abstract = {This work proposes a novel stream-based Active Learning (AL) approach for Speech Emotion Recognition (SER) in real-life scenarios where new data are generated from different domains. The goal is to address major challenges in this field, including the lack of large labeled datasets, the difficulty of annotation, and the retrieval of representative emotional data. AL addresses these problems by querying a small, valuable subset of samples for annotation, keeping labeling effort and resource use low. To this end, we consider a stream-based AL methodology that leverages MLOps principles and human-in-the-loop methods to continuously adapt previously trained deep learning models, selecting audio samples that are both challenging and diverse and reducing the performance gap related to data diversity, cross-domain contexts, and continuous data ingestion. The pipeline was tested across several domains in three distinct scenarios, including non-stream and stream-based approaches as well as a pocket stream alternative that updates the previously trained models only when significant improvements are obtained. The experimental results show that the proposed method achieves competitive results with an AL pocket stream-based strategy using just 20% of the original training data. This yields good performance with a low allocated budget and continuous adaptation for practical, real-world environments.},
isbn = {978-303170565-6},
doi = {10.1007/978-3-031-70566-3_10},
date = {2024-09-09},
}
