Incorporation of Synthetic Data Generation Techniques within a Controlled Data Processing Workflow in the Health and Wellbeing Domain
Autores: Roberto Álvarez Cristina Molina Xabat Larrea Ane Alberdi Michalis Timoleon Panagiotis Bamidis Evdokimos Konstantinidis
Fecha: 01.03.2023
Electronics
Abstract
To date, the use of synthetic data generation techniques in the health and wellbeing domain has been mainly limited to research activities. Although several open source and commercial packages have been released, they have been oriented to generating synthetic data as a standalone data preparation process and not integrated into a broader analysis or experiment testing workflow. In this context, the VITALISE project is working to harmonize Living Lab research and data capture protocols and to provide controlled processing access to captured data to industrial and scientific communities. In this paper, we present the initial design and implementation of our synthetic data generation approach in the context of VITALISE Living Lab controlled data processing workflow, together with identified challenges and future developments. By uploading data captured from Living Labs, generating synthetic data from them, developing analysis locally with synthetic data, and then executing them remotely with real data, the utility of the proposed workflow has been validated. Results have shown that the presented workflow helps accelerate research on artificial intelligence, ensuring compliance with data protection laws. The presented approach has demonstrated how the adoption of state-of-the-art synthetic data generation techniques can be applied for real-world applications.
BIB_text
title = {Incorporation of Synthetic Data Generation Techniques within a Controlled Data Processing Workflow in the Health and Wellbeing Domain},
journal = {Electronics},
pages = {812},
volume = {11},
keywds = {
synthetic data generation; Living Lab; controlled data processing; machine learning
}
abstract = {
To date, the use of synthetic data generation techniques in the health and wellbeing domain has been mainly limited to research activities. Although several open source and commercial packages have been released, they have been oriented to generating synthetic data as a standalone data preparation process and not integrated into a broader analysis or experiment testing workflow. In this context, the VITALISE project is working to harmonize Living Lab research and data capture protocols and to provide controlled processing access to captured data to industrial and scientific communities. In this paper, we present the initial design and implementation of our synthetic data generation approach in the context of VITALISE Living Lab controlled data processing workflow, together with identified challenges and future developments. By uploading data captured from Living Labs, generating synthetic data from them, developing analysis locally with synthetic data, and then executing them remotely with real data, the utility of the proposed workflow has been validated. Results have shown that the presented workflow helps accelerate research on artificial intelligence, ensuring compliance with data protection laws. The presented approach has demonstrated how the adoption of state-of-the-art synthetic data generation techniques can be applied for real-world applications.
}
doi = {10.3390/electronics11050812},
date = {2023-03-01},
}