Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      05 May 2023, Volume 38 Issue 3
    Article

    MEASURING INFLUENCE OF NOISE IN SUPERVISED LEARNING PERFORMANCE UNDER PRIVACY PRESERVING ENVIRONMENT
    Mayur Rathi, Anand Rajavat
    Journal of Data Acquisition and Processing, 2023, 38 (3): 6967-6978.

    Abstract

    Privacy-Preserving Data Mining (PPDM) combines data mining with security and privacy. In this setting, multiple data owners aggregate their data with unknown parties in order to exploit the combined knowledge through data intelligence techniques. PPDM outcomes are sensitive to factors such as data dimensionality and the sanitization technique used. In this paper, we investigate how data dimensions and sanitization techniques influence the performance of PPDM classifiers. First, the public KDD CUP dataset is obtained. Dimensionality reduction techniques, namely PCA (Principal Component Analysis), KPCA (Kernel Principal Component Analysis), and CRC (correlation coefficient), are then applied to observe the impact of dimensions on PPDM systems. Next, noise-based data sanitization is studied to assess the impact of noise on PPDM systems, with random noise used to sanitize the data. However, random noise cannot be applied to categorical data. We therefore extend the random noise algorithm into a controlled noise algorithm, which produces a new sanitized dataset without disturbing data utility. The newly generated datasets are used to train two supervised learning algorithms, C4.5 and CART. Experiments are performed on five public UCI datasets. The results show that classical random noise strongly affects classifier accuracy. In contrast, the proposed controlled noise-based technique has a low impact on classifier accuracy, because the statistical difference between the original data and the controlled noise-sanitized data is smaller. However, controlled noise may be expensive in terms of time and memory utilization due to the modification of the data.
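    The distinction between random noise on numeric attributes and a controlled perturbation of categorical attributes can be illustrated with a minimal sketch. The abstract does not specify the controlled noise algorithm, so the version below is an assumption made for illustration only: the function names, the flip probability, the noise scale, and the sample field names (duration, protocol, label) are all hypothetical and are not taken from the paper.

```python
import random


def add_random_noise(values, scale, rng):
    """Classical additive noise: only meaningful for numeric values."""
    return [v + rng.gauss(0.0, scale) for v in values]


def add_controlled_noise(values, flip_prob, rng):
    """Hypothetical controlled noise for a categorical column (an assumption,
    not the paper's algorithm): with probability flip_prob, replace a value
    with a different value drawn from the column's own domain."""
    domain = sorted(set(values))
    if len(domain) < 2:
        return list(values)  # nothing to swap with
    noisy = []
    for v in values:
        if rng.random() < flip_prob:
            noisy.append(rng.choice([d for d in domain if d != v]))
        else:
            noisy.append(v)
    return noisy


def sanitize(rows, numeric_cols, categorical_cols,
             scale=0.1, flip_prob=0.05, seed=0):
    """Return a sanitized copy of rows; columns not listed (for example the
    class label) are left untouched."""
    rng = random.Random(seed)
    columns = {name: [row[name] for row in rows] for name in rows[0]}
    for name in numeric_cols:
        columns[name] = add_random_noise(columns[name], scale, rng)
    for name in categorical_cols:
        columns[name] = add_controlled_noise(columns[name], flip_prob, rng)
    return [{name: columns[name][i] for name in columns}
            for i in range(len(rows))]


# Hypothetical records loosely shaped like network-connection data.
records = [
    {"duration": 0.0, "protocol": "tcp", "label": "normal"},
    {"duration": 2.5, "protocol": "udp", "label": "normal"},
    {"duration": 9.1, "protocol": "icmp", "label": "attack"},
    {"duration": 1.2, "protocol": "tcp", "label": "attack"},
]
sanitized = sanitize(records, ["duration"], ["protocol"])
```

    Because categorical values are only ever replaced by values already present in the same column, the sanitized data stays within the original domain. This is one plausible way to keep the statistical difference from the original data small, at the cost of an extra pass over each column.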

    Keywords

    PPDM, Effect of Data Dimension, Effect of Data Sanitization, Random Noise, Noise Inclusion Algorithm.

