Journal of Data Acquisition and Processing

05 May 2023, Volume 38 Issue 3

Article

MEASURING INFLUENCE OF NOISE IN SUPERVISED LEARNING PERFORMANCE UNDER PRIVACY PRESERVING ENVIRONMENT

Mayur Rathi, Anand Rajavat

Journal of Data Acquisition and Processing, 2023, 38 (3): 6967-6978.

Abstract

Data mining combined with security and privacy is known as Privacy-Preserving Data Mining (PPDM). In this setting, multiple data owners aggregate their data with unknown parties in order to exploit the combined knowledge through data intelligence techniques. PPDM outcomes are sensitive to factors such as data dimensionality and the sanitization technique used. In this paper, we investigate how data dimensions and sanitization techniques influence the performance of PPDM classifiers. First, the public KDD CUP dataset is obtained. Dimensionality reduction techniques, namely PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis), and CRC (correlation coefficient), are then applied to observe the impact of dimensionality on PPDM systems. Next, a noise-based data sanitization technique is studied to assess the impact of noise on PPDM systems, with random noise used to sanitize the data. However, random noise cannot be applied to categorical data. Therefore, we extend the random noise algorithm into a controlled noise algorithm, which produces a new sanitized dataset without disturbing data utility. The newly generated datasets are used to train two supervised learning algorithms, C4.5 and CART. Experiments are performed on five public UCI datasets. The results show that classifier accuracy is strongly affected by classical random noise. By contrast, the proposed controlled noise-based technique has a low impact on classifier accuracy, because there is less statistical difference between the original data and the controlled-noise sanitized data. However, controlled noise may be expensive in terms of time and memory utilization due to the modification of data.
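
The contrast between classical random noise and a bounded, "controlled" variant can be illustrated with a minimal Python sketch. The abstract does not specify the controlled noise algorithm, so everything below is an assumption made for illustration only: the function names (`add_random_noise`, `add_controlled_noise`, `perturb_categorical`), the use of uniform noise, the bound expressed as a fraction of the column's standard deviation, and the label-swapping rule for categorical attributes are all hypothetical and are not taken from the paper.

```python
import random
import statistics


def add_random_noise(values, scale, rng):
    """Classical additive noise: shift each numeric value by a
    uniform draw from [-scale, scale], regardless of the data spread."""
    return [v + rng.uniform(-scale, scale) for v in values]


def add_controlled_noise(values, fraction, rng):
    """Hypothetical controlled variant (an assumption, not the paper's
    algorithm): the noise bound is a fraction of the column's population
    standard deviation, so the perturbation stays small relative to the data."""
    bound = fraction * statistics.pstdev(values)
    return [v + rng.uniform(-bound, bound) for v in values]


def perturb_categorical(values, flip_prob, rng):
    """Hypothetical handling of categorical attributes: with probability
    flip_prob, replace a label with a different label that already occurs
    in the same column, so no invalid category is ever produced."""
    domain = sorted(set(values))
    result = []
    for v in values:
        if len(domain) > 1 and rng.random() < flip_prob:
            result.append(rng.choice([d for d in domain if d != v]))
        else:
            result.append(v)
    return result


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs are comparable
    durations = [0.0, 2.0, 5.0, 1.0, 9.0, 3.0]            # toy numeric column
    protocols = ["tcp", "udp", "icmp", "tcp", "tcp", "udp"]  # toy categorical column

    print(add_random_noise(durations, scale=5.0, rng=rng))
    print(add_controlled_noise(durations, fraction=0.1, rng=rng))
    print(perturb_categorical(protocols, flip_prob=0.2, rng=rng))
```

In this sketch the controlled variant keeps every perturbed value within a known distance of the original, which is one plausible way to keep the statistical difference between original and sanitized data small. The extra pass needed to compute each column's spread and label domain also hints at why such a scheme could cost more time and memory than plain random noise.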

Keywords

PPDM, Effect of Data Dimension, Effect of Data Sanitization, Random Noise, Noise Inclusion Algorithm.
