Vicomtech at CANTEMIST 2020

Authors: Aitor García Pablos, Naiara Perez Miguel, Montserrat Cuadros Oller

Date: 23.09.2020


Abstract

This paper describes the participation of the Vicomtech NLP team in the CANTEMIST shared task, which consists of automatically assigning ICD-O-3 tumour morphology codes to health-related documents in Spanish. The submitted systems are based on pre-trained BERT models. The contextual embeddings obtained for each token are used in a multitask sequence-labelling approach that takes advantage of the structure of ICD-O-3 codes. We have experimented with different pre-trained BERT models and combinations, as well as several ensemble structures. The three task tracks (tumour morphology mention recognition, normalisation and document coding) have been approached at the same time, based on the outputs of the proposed models and some post-processing steps. The reported results show that the systems are robust and perform well across different subsets of the data. The official results also indicate that the ensemble models outperform individual models.

BIB_text

@Article {
title = {Vicomtech at {CANTEMIST} 2020},
pages = {489-498},
keywds = {
Clinical text coding, ICD-O-3, Oncology
},
abstract = {
This paper describes the participation of the Vicomtech NLP team in the CANTEMIST shared task, which consists of automatically assigning ICD-O-3 tumour morphology codes to health-related documents in Spanish. The submitted systems are based on pre-trained BERT models. The contextual embeddings obtained for each token are used in a multitask sequence-labelling approach that takes advantage of the structure of ICD-O-3 codes. We have experimented with different pre-trained BERT models and combinations, as well as several ensemble structures. The three task tracks (tumour morphology mention recognition, normalisation and document coding) have been approached at the same time, based on the outputs of the proposed models and some post-processing steps. The reported results show that the systems are robust and perform well across different subsets of the data. The official results also indicate that the ensemble models outperform individual models.
},
date = {2020-09-23},
}
