Synthetic Annotated Data for Named Entity Recognition in Computed Tomography Scan Reports
Authors:
Date: 24.09.2024
Abstract
It is widely acknowledged that clinical data, in general, is scarce, and this scarcity worsens when focusing on specific domains. Moreover, the challenge escalates when annotated data is required. In this paper, we propose an approach to create synthetic annotated datasets for Named Entity Recognition (NER) tasks in Computed Tomography Reports (CTR) by leveraging large language models (LLMs). We investigate the potential of LLMs to generate meaningful texts in the healthcare domain through a combination of text generation techniques and automatic annotation using LLMs. Additionally, we conduct a series of experiments to demonstrate the efficacy of synthetic data compared to real data for solving NER tasks.
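The abstract does not specify how the LLM's automatic annotations are represented. As a minimal sketch, assuming the annotator is prompted to wrap entities in inline tags such as `<FINDING>nodule</FINDING>` (a hypothetical format and label set, not taken from the paper), such output could be converted into token-level BIO labels suitable for training an NER model. The function name `inline_to_bio` and the whitespace tokenization are illustrative assumptions.

```python
import re
from typing import List, Tuple

# Hypothetical inline markup an LLM annotator might emit (assumption):
# "A <FINDING>nodule</FINDING> in the <ANATOMY>right upper lobe</ANATOMY>."
TAG_RE = re.compile(r"<(?P<label>[A-Z_]+)>(?P<text>.*?)</(?P=label)>")


def inline_to_bio(annotated: str) -> List[Tuple[str, str]]:
    """Convert inline-tagged text into (token, BIO-label) pairs.

    Tokens come from plain whitespace splitting (an assumption; a real
    pipeline would align labels with the NER model's own tokenizer).
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_RE.finditer(annotated):
        # Untagged text before this entity is labeled "O" (outside).
        for tok in annotated[pos:match.start()].split():
            pairs.append((tok, "O"))
        label = match.group("label")
        # First entity token gets "B-", the remaining ones get "I-".
        for i, tok in enumerate(match.group("text").split()):
            pairs.append((tok, ("B-" if i == 0 else "I-") + label))
        pos = match.end()
    # Any trailing untagged text is also labeled "O".
    for tok in annotated[pos:].split():
        pairs.append((tok, "O"))
    return pairs


if __name__ == "__main__":
    sentence = "A <FINDING>nodule</FINDING> in the <ANATOMY>right upper lobe</ANATOMY>."
    for token, tag in inline_to_bio(sentence):
        print(token, tag)
```

Inline tags are used here only because they are easy to validate with a regular expression; JSON span offsets would be an equally plausible output format for an LLM annotator.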
BibTeX:
title = {Synthetic Annotated Data for Named Entity Recognition in Computed Tomography Scan Reports},
pages = {69-78},
keywords = {
Biomedical NER; data synthesis; text generation
},
abstract = {
It is widely acknowledged that clinical data, in general, is scarce, and this scarcity worsens when focusing on specific domains. Moreover, the challenge escalates when annotated data is required. In this paper, we propose an approach to create synthetic annotated datasets for Named Entity Recognition (NER) tasks in Computed Tomography Reports (CTR) by leveraging large language models (LLMs). We investigate the potential of LLMs to generate meaningful texts in the healthcare domain through a combination of text generation techniques and automatic annotation using LLMs. Additionally, we conduct a series of experiments to demonstrate the efficacy of synthetic data compared to real data for solving NER tasks.
},
date = {2024-09-24},
}