Synthetic data generation for tabular health records: A systematic review

Authors: Gorka Epelde Unanue

Date: 17.04.2022

Journal: Neurocomputing


Abstract

Synthetic data generation (SDG) research has been ongoing for some time, with promising results in different application domains, including healthcare, biometrics and energy consumption. The need for robust SDG solutions that capitalise on advances in Big Data and AI technology, enabling access to useful data while ensuring reasonable privacy protections, has never been greater. This paper presents a systematic review covering the last five years (2016–2021) that analyses and reports on recent approaches to synthetic tabular data generation (STDG). It focuses on the healthcare application context, where the goal is to preserve patient privacy, and pays special attention to the contribution of Generative Adversarial Networks (GANs). In total, 34 publications were retrieved and analysed. A classification of approaches is proposed, and the performance of GAN-based approaches is analysed extensively. The review concludes that there is no universal method or metric for evaluating and benchmarking the performance of the various approaches, and that further research is needed to improve the generalisability of GANs and to find a model that works optimally across tabular healthcare data.

BIB_text

@Article{EpeldeUnanue2022,
  author = {Gorka Epelde Unanue},
  title = {Synthetic data generation for tabular health records: A systematic review},
  journal = {Neurocomputing},
  volume = {493},
  keywords = {Synthetic data generation; Generative adversarial networks; Privacy preserving data; Data sharing; Healthcare; Artificial intelligence},
  abstract = {Synthetic data generation (SDG) research has been ongoing for some time, with promising results in different application domains, including healthcare, biometrics and energy consumption. The need for robust SDG solutions that capitalise on advances in Big Data and AI technology, enabling access to useful data while ensuring reasonable privacy protections, has never been greater. This paper presents a systematic review covering the last five years (2016--2021) that analyses and reports on recent approaches to synthetic tabular data generation (STDG). It focuses on the healthcare application context, where the goal is to preserve patient privacy, and pays special attention to the contribution of Generative Adversarial Networks (GANs). In total, 34 publications were retrieved and analysed. A classification of approaches is proposed, and the performance of GAN-based approaches is analysed extensively. The review concludes that there is no universal method or metric for evaluating and benchmarking the performance of the various approaches, and that further research is needed to improve the generalisability of GANs and to find a model that works optimally across tabular healthcare data.},
  doi = {10.1016/j.neucom.2022.04.053},
  date = {2022-04-17},
}