EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque

Egileak: Aitor García Pablos Naiara Pérez Montserrat Cuadros Oller Jaione Bengoetxea

Data: 01.09.2024

Procesamiento del Lenguaje Natural


Abstract

The widespread availability of Question Answering (QA) datasets in English has greatly facilitated the advancement of the Natural Language Processing (NLP) field. However, the scarcity of such resources for minority languages, such as Basque, poses a substantial challenge for these communities. In this context, the translation and alignment of existing QA datasets plays a crucial role in narrowing this technological gap. This work presents EuSQuAD, the first initiative dedicated to automatically translating and aligning SQuAD2.0 into Basque, resulting in more than 142k QA examples. We demonstrate EuSQuAD’s value through extensive qualitative analysis and QA experiments supported with EuSQuAD as training data. These experiments are evaluated with a new human-annotated dataset.

BIB_text

@Article {
title = {EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque},
journal = {Procesamiento del Lenguaje Natural},
pages = {125-137},
volume = {73},
keywds = {
Basque; question answering; reading comprehension; SQuAD
}
abstract = {

The widespread availability of Question Answering (QA) datasets in English has greatly facilitated the advancement of the Natural Language Processing (NLP) field. However, the scarcity of such resources for minority languages, such as Basque, poses a substantial challenge for these communities. In this context, the translation and alignment of existing QA datasets plays a crucial role in narrowing this technological gap. This work presents EuSQuAD, the first initiative dedicated to automatically translating and aligning SQuAD2.0 into Basque, resulting in more than 142k QA examples. We demonstrate EuSQuAD’s value through extensive qualitative analysis and QA experiments supported with EuSQuAD as training data. These experiments are evaluated with a new human-annotated dataset.


}
doi = {10.26342/2024-73-9},
date = {2024-09-01},
}
Vicomtech

Gipuzkoako Zientzia eta Teknologia Parkea,
Mikeletegi Pasealekua 57,
20009 Donostia / San Sebastián (Espainia)

+(34) 943 309 230

Zorrotzaurreko Erribera 2, Deusto,
48014 Bilbo (Espainia)

close overlay

Jokaeraren araberako publizitateko cookieak beharrezkoak dira eduki hau kargatzeko

Onartu jokaeraren araberako publizitateko cookieak