Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation

Authors: Abner Hernández, Paula Andrea Pérez, Tomás Arias, Juan Camilo Vasquez Correa, Seung Hee Yang, Juan Rafael Orozco, Andreas Maier

Date: 09.09.2024


Abstract

Acquiring speech data is a crucial step in the development of speech recognition systems and related speech-based machine learning models. However, protecting privacy is an increasing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We specifically focus on training a variety of VC models using self-supervised speech representations, such as Wav2Vec and its multi-lingual variant, Wav2Vec2.0 (XLSR). The converted voices maintain a word error rate that is within 1% with respect to the original recordings. The Equal Error Rate (EER) showed a significant increase, from 1.52% to 41.18% on the LibriSpeech test set, and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a substantial decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER varied from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech data to demonstrate that anonymized voices can still yield speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is investigated by covering aspects such as articulation, prosody, phonation, and phonology.

BIB_text

@Article {
title = {Anonymizing Dysarthric Speech: Investigating the Effects of Voice Conversion on Pathological Information Preservation},
pages = {149-160},
keywds = {
Dysarthria; Medical Data; Speech Representation; Voice Anonymization; Voice Conversion
},
abstract = {

Acquiring speech data is a crucial step in the development of speech recognition systems and related speech-based machine learning models. However, protecting privacy is an increasing concern that must be addressed. This study investigates voice conversion (VC) as a strategy for anonymizing the speech of individuals with dysarthria. We specifically focus on training a variety of VC models using self-supervised speech representations, such as Wav2Vec and its multi-lingual variant, Wav2Vec2.0 (XLSR). The converted voices maintain a word error rate that is within 1% with respect to the original recordings. The Equal Error Rate (EER) showed a significant increase, from 1.52% to 41.18% on the LibriSpeech test set, and from 3.75% to 42.19% on speakers from the VCTK corpus, indicating a substantial decrease in speaker verification performance. A similar trend is observed with dysarthric speech, where the EER varied from 16.45% to 43.46%. Additionally, our study includes classification experiments on dysarthric vs. healthy speech data to demonstrate that anonymized voices can still yield speech features essential for distinguishing between healthy and pathological speech. The impact of voice conversion is investigated by covering aspects such as articulation, prosody, phonation, and phonology.


},
isbn = {978-3-031-70565-6},
doi = {10.1007/978-3-031-70566-3_14},
date = {2024-09-09},
}