Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain
Authors:
Date: 15.11.2024
Abstract
We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.
BibTeX
title = {Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain},
pages = {934--942},
keywords = {
Computer aided language translation; Data assimilation; Machine translation; Modeling languages
},
abstract = {
We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.
},
isbn = {979-8-89176-179-7},
date = {2024-11-15},
}