05 July-September 2023, Volume 38 Issue 4
Abstract
Synthetic data generation has become an essential way to address limited data availability, class imbalance, data privacy, and generalization in machine learning applications. This paper discusses the role of datasets in projects and research, covering data collection, quality, size, labelling, splitting, benchmark datasets, sharing, ethics, and privacy. It examines the reasons for using synthetic data, including limited data availability, class imbalance, data privacy, diversity, augmentation, and cost efficiency. Methods for generating synthetic data include data augmentation, variational autoencoders (VAEs), generative adversarial networks (GANs), synthetic data injection, and rule-based models. The focus then shifts to privacy-preserving data augmentation and the use of Privacy-Preserving GANs (PPGANs) to generate synthetic images while protecting sensitive information. Differential privacy techniques are incorporated into the data augmentation process to preserve privacy. Evaluation metrics for synthetic data quality and GAN performance are also presented. The proposed model highlights the potential of PPGANs to generate privacy-preserving synthetic data that retains the statistical properties and visual characteristics of the original dataset.