Targeted Data Augmentation Improves Context-aware Neural Machine Translation

Date: 2023-09-04


Abstract

Progress in document-level Machine Translation is hindered by the lack of parallel training data that include context information. In this work, we evaluate the potential of data augmentation techniques to circumvent this limitation, showing that significant gains can be achieved via upsampling, similar-context sampling and back-translation, all targeted at context-relevant data. We apply these methods to standard document-level datasets in English-German and English-French and demonstrate their relevance for improving the translation of contextual phenomena. In particular, we show that relatively small volumes of targeted data augmentation lead to significant improvements over a strong context-concatenation baseline and over standard back-translation of document-level data. We also compare the accuracy of the selected methods depending on data volumes or distance to relevant context information, and explore their use in combination.

BibTeX

@Article {
title = {Targeted Data Augmentation Improves Context-aware Neural Machine Translation},
pages = {298--312},
keywds = {Computational linguistics; Computer aided language translation},
abstract = {Progress in document-level Machine Translation is hindered by the lack of parallel training data that include context information. In this work, we evaluate the potential of data augmentation techniques to circumvent this limitation, showing that significant gains can be achieved via upsampling, similar-context sampling and back-translation, all targeted at context-relevant data. We apply these methods to standard document-level datasets in English-German and English-French and demonstrate their relevance for improving the translation of contextual phenomena. In particular, we show that relatively small volumes of targeted data augmentation lead to significant improvements over a strong context-concatenation baseline and over standard back-translation of document-level data. We also compare the accuracy of the selected methods depending on data volumes or distance to relevant context information, and explore their use in combination.},
date = {2023-09-04},
}
