Vicomtech at CANTEMIST 2020
Authors: Naiara Perez Miguel
Date: 23.09.2020
Abstract
This paper describes the participation of the Vicomtech NLP team in the CANTEMIST shared task, which consists of automatically assigning ICD-O-3 tumour morphology codes to health-related documents in Spanish. The submitted systems are based on pre-trained BERT models. The contextual embeddings obtained for each token are used in a multitask sequence-labelling approach that takes advantage of the structure of ICD-O-3 codes. We have experimented with different pre-trained BERT models and combinations, as well as several ensemble structures. The three task tracks (tumour morphology mention recognition, normalisation and document coding) have been approached at the same time, based on the outputs of the proposed models and some post-processing steps. The reported results are robust, and the systems perform well across different subsets of data. The official results also indicate that the ensemble models outperform individual models.
BIB_text
title = {Vicomtech at CANTEMIST 2020},
pages = {489--498},
keywds = {
Clinical text coding, ICD-O-3, Oncology
},
abstract = {
This paper describes the participation of the Vicomtech NLP team in the CANTEMIST shared task, which consists of automatically assigning ICD-O-3 tumour morphology codes to health-related documents in Spanish. The submitted systems are based on pre-trained BERT models. The contextual embeddings obtained for each token are used in a multitask sequence-labelling approach that takes advantage of the structure of ICD-O-3 codes. We have experimented with different pre-trained BERT models and combinations, as well as several ensemble structures. The three task tracks (tumour morphology mention recognition, normalisation and document coding) have been approached at the same time, based on the outputs of the proposed models and some post-processing steps. The reported results are robust, and the systems perform well across different subsets of data. The official results also indicate that the ensemble models outperform individual models.
},
date = {2020-09-23},
}