Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain

Abstract

We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.

BIB_text

@Article {
title = {Vicomtech@WMT 2024: Shared Task on Translation into Low-Resource Languages of Spain},
pages = {934-942},
keywds = {
Computer aided language translation; Data assimilation; Machine translation; Modeling languages
}
abstract = {

We describe Vicomtech’s participation in the WMT 2024 Shared Task on translation into low-resource languages of Spain. We addressed all three languages of the task, namely Aragonese, Aranese and Asturian, in both constrained and open settings. Our work mainly centred on exploiting different types of corpora via data filtering, selection and combination methods, along with synthetic data generated with translation models based on rules, neural sequence-to-sequence or large language models. We improved or matched the best baselines in all three language pairs and present complementary results on additional test sets.


}
isbn = {979-889176179-7},
date = {2024-11-15},
}
Vicomtech

Parque Científico y Tecnológico de Gipuzkoa,
Paseo Mikeletegi 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Zorrotzaurreko Erribera 2, Deusto,
48014 Bilbao (Spain)

close overlay

Behavioral advertising cookies are necessary to load this content

Accept behavioral advertising cookies