Vicomtech at ALexS 2020: Unsupervised complex word identification based on domain frequency
Authors: Naiara Perez Miguel
Date: 23.09.2020
Abstract
This paper introduces Vicomtech's systems for unsupervised complex word identification submitted to the ALexS "Análisis Léxico en la SEPLN 2020" task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. They are designed to identify complex words while taking into account the occurrence of each word in domain-specific texts, so that they can adapt to the domain. Our systems obtained good results, ranking in second position.
BIB_text
title = {Vicomtech at {ALexS} 2020: Unsupervised complex word identification based on domain frequency},
pages = {7--14},
keywds = {
Complex word identification, Lexical simplification, Unsupervised learning
},
abstract = {
This paper introduces Vicomtech's systems for unsupervised complex word identification submitted to the ALexS "Análisis Léxico en la SEPLN 2020" task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. They are designed to identify complex words while taking into account the occurrence of each word in domain-specific texts, so that they can adapt to the domain. Our systems obtained good results, ranking in second position.
},
date = {2020-09-23},
}