A Portable Method for Parallel and Comparable Document Alignment
Authors: Andoni Azpeitia Zaldua
Date: 02.06.2016
Abstract
We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.
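To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes documents are already tokenized, that a bilingual lexicon is available as a plain dictionary, and that each source document is paired with its single best-scoring target. The names `jaccard`, `expand`, `align`, and the toy lexicon are hypothetical and chosen for illustration only. The published method's actual expansion strategy and any pairing constraints are not described on this page.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expand(tokens, lexicon):
    """Replace source tokens by the union of their candidate translations.

    `lexicon` is a hypothetical dict mapping a source token to a set of
    target-language translations; tokens missing from it are dropped.
    """
    expanded = set()
    for tok in tokens:
        expanded |= lexicon.get(tok, set())
    return expanded


def align(source_docs, target_docs, lexicon):
    """Pair each source document with the target document of highest Jaccard score."""
    pairs = {}
    for sid, stoks in source_docs.items():
        translated = expand(stoks, lexicon)
        pairs[sid] = max(
            target_docs,
            key=lambda tid: jaccard(translated, set(target_docs[tid])),
        )
    return pairs


# Toy data, invented for this example.
lexicon = {"gato": {"cat"}, "perro": {"dog"}, "casa": {"house", "home"}}
source_docs = {"s1": ["gato", "casa"], "s2": ["perro"]}
target_docs = {"t1": ["dog", "barks"], "t2": ["cat", "house"]}
alignment = align(source_docs, target_docs, lexicon)
```

Because the score is computed over sets of word types, it needs no training data, only the seed lexicon, which is in the spirit of the portability claim above.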
BIB_text
title = {A Portable Method for Parallel and Comparable Document Alignment},
pages = {243--255},
number = {2},
volume = {4},
keywds = {
Document alignment, Comparable corpora, Parallel corpora
},
abstract = {
We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.
},
date = {2016-06-02},
year = {2016},
}