Effect of incorporating metadata to the generation of synthetic time series in a healthcare context

Date: 01.06.2023


Abstract

Synthetic data is becoming the way forward for managing the legal and regulatory aspects of biomedical research involving personal and clinical data. As no matches are expected between artificial instances and real samples or subjects, external researchers performing secondary analyses could benefit significantly from unlimited access to uncompromised information. In this context, one of the main objectives of the H2020 VITALISE project is to develop a platform that provides those external researchers with synthetic data generated from real data collected in Living Labs. In addition, while some time-series-specific synthetic data generation models exist, only a few of them consider metadata (e.g., patient demographics) as part of the time series generation process itself. Therefore, the objective of this research is to perform a comparative assessment of two synthetic data generation models that use and process subject metadata differently: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and DoppelGANger (DGAN). To achieve this goal while keeping the analyses data-independent, we selected two healthcare-related longitudinal datasets: (1) Treadmill Maximal Effort Test (TMET) measurements from the University of Málaga; and (2) a hypotension subset derived from the MIMIC-III v1.4 database. After generating the synthetic data, we assessed three pivotal aspects: resemblance to the original data, utility, and level of privacy. The main conclusion is that both the use of metadata as context variables and the methodology used to take them into account proved significant and valuable, with the DGAN model offering better results overall. A more extensive time-series-specific evaluation is left as the main avenue for future research.

BIB_text

@Article {
title = {Effect of incorporating metadata to the generation of synthetic time series in a healthcare context},
pages = {910-916},
keywords = {
health data; shareable data; synthetic data; time series
},
abstract = {

Synthetic data is becoming the way forward for managing the legal and regulatory aspects of biomedical research involving personal and clinical data. As no matches are expected between artificial instances and real samples or subjects, external researchers performing secondary analyses could benefit significantly from unlimited access to uncompromised information. In this context, one of the main objectives of the H2020 VITALISE project is to develop a platform that provides those external researchers with synthetic data generated from real data collected in Living Labs. In addition, while some time-series-specific synthetic data generation models exist, only a few of them consider metadata (e.g., patient demographics) as part of the time series generation process itself. Therefore, the objective of this research is to perform a comparative assessment of two synthetic data generation models that use and process subject metadata differently: the Wasserstein GAN with Gradient Penalty (WGAN-GP) and DoppelGANger (DGAN). To achieve this goal while keeping the analyses data-independent, we selected two healthcare-related longitudinal datasets: (1) Treadmill Maximal Effort Test (TMET) measurements from the University of Málaga; and (2) a hypotension subset derived from the MIMIC-III v1.4 database. After generating the synthetic data, we assessed three pivotal aspects: resemblance to the original data, utility, and level of privacy. The main conclusion is that both the use of metadata as context variables and the methodology used to take them into account proved significant and valuable, with the DGAN model offering better results overall. A more extensive time-series-specific evaluation is left as the main avenue for future research.


},
isbn = {979-835031224-9},
date = {2023-06-01},
}
