Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models
Abstract
We present a system for normalizing Spanish tweets that uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.
BIB_text
author = {Pablo Ruiz and Montse Cuadros and Thierry Etchegoyhen},
title = {Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models},
number = {9},
keywds = {
microtexto, español, castellano, normalización léxica, Twitter, distancia de edición, modelo de lengua, Spanish microtext, lexical normalization, Twitter, edit distance, language model
},
abstract = {
We present a system for normalizing Spanish tweets that uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.
},
isbn = {978-84-695-8349-4},
date = {2013-09-12},
year = {2013},
}