Vicomtech at ALexS 2020: Unsupervised complex word identification based on domain frequency

Date: 23.09.2020


Abstract

This paper introduces Vicomtech's systems for unsupervised complex word identification, submitted to the ALexS ("Análisis Léxico en la SEPLN 2020") task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. They are designed to identify complex words by taking into account how often a word occurs in domain-specific texts, which allows them to adapt to the domain. Our systems achieved good results, ranking second.
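
The clustering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not name the clustering algorithm, so a plain two-cluster k-means is assumed here, and only two of the listed features are used (a smoothed log-probability from a hypothetical domain frequency table, and word length). The WordNet synset count is left out, and all function names and the toy counts are illustrative assumptions.

```python
import math


def features(word, domain_counts, total):
    """Two illustrative features: smoothed log-probability in a domain corpus, and word length."""
    freq = domain_counts.get(word, 0)
    # Add-one smoothing so that unseen words still get a finite log-probability.
    log_prob = math.log((freq + 1) / (total + len(domain_counts)))
    return [log_prob, float(len(word))]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def two_means(points, iterations=20):
    """Plain 2-means, seeded with the points of lowest and highest first feature."""
    centroids = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def complex_words(words, domain_counts):
    """Return the words in the cluster with the lower mean log-probability (assumed 'complex')."""
    total = sum(domain_counts.values())
    points = standardize([features(w, domain_counts, total) for w in words])
    labels, centroids = two_means(points)
    complex_cluster = 0 if centroids[0][0] < centroids[1][0] else 1
    return [w for w, label in zip(words, labels) if label == complex_cluster]


# Toy frequency table standing in for counts from a domain-specific corpus.
toy_counts = {
    "the": 1000, "house": 300, "dog": 250, "water": 400,
    "otorhinolaryngology": 1, "electroencephalogram": 2,
}
print(complex_words(list(toy_counts), toy_counts))
```

Replacing the toy table with counts drawn from a different domain corpus changes the log-probability feature, which is one way a system of this kind could adapt its notion of complexity to the domain.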

BibTeX

@Article {
title = {Vicomtech at {ALexS} 2020: Unsupervised complex word identification based on domain frequency},
pages = {7-14},
keywds = {Complex word identification, Lexical simplification, Unsupervised learning},
abstract = {This paper introduces Vicomtech's systems for unsupervised complex word identification, submitted to the ALexS ("Análisis Léxico en la SEPLN 2020") task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. They are designed to identify complex words by taking into account how often a word occurs in domain-specific texts, which allows them to adapt to the domain. Our systems achieved good results, ranking second.},
date = {2020-09-23},
}