Journal of Data Acquisition and Processing

08 December 2018, Volume 34 Issue 6

Article

1.	IDENTIFICATION AND PROCESSING OF PII DATA, APPLYING DEEP LEARNING MODELS WITH IMPROVED ACCURACY AND EFFICIENCY
Mainak Mitra, Soumit Roy
Journal of Data Acquisition and Processing, 2018, 34 (6): 1452-1461.

Abstract

Data privacy is an important aspect of data governance: enterprises must comply with data privacy standards such as GDPR, because their data usually includes sensitive personally identifiable information (PII). When data is collected and distributed across regions, de-identification and anonymization of PII are mandatory for security and privacy. This paper explores candidate machine learning and deep learning models for developing a natural language processing (NLP) based large language model (LLM) that automatically detects PII and masks it to enforce data privacy. Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Long Short-Term Memory (LSTM), and Multi-Layer Perceptron (MLP) models are trained on features extracted with the Term Frequency-Inverse Document Frequency (TF-IDF) approach, and their performance in text classification of PII data is evaluated. An implementation that detects and masks PII in the presentation layer of the data is proposed for improved data anonymization.
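As an illustration of the two steps named in the abstract, the following minimal sketch computes TF-IDF weights and masks flagged tokens at display time. It is not the authors' implementation: the function names (`tokenize`, `tfidf`, `mask_for_display`, `looks_like_email`) are hypothetical, the TF-IDF variant (term frequency divided by document length, unsmoothed logarithmic inverse document frequency) is an assumption, and a simple e-mail pattern stands in for a trained classifier.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    Library implementations often add smoothing and normalization.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Stand-in detector (assumption): a regular expression for e-mail addresses,
# used here in place of a trained SVM, RF, LR, LSTM, or MLP classifier.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def looks_like_email(token):
    """Return True when the whole token matches the e-mail pattern."""
    return EMAIL.fullmatch(token) is not None


def mask_for_display(text, is_pii):
    """Mask each whitespace-separated token flagged by is_pii; stored data is untouched."""
    return " ".join("[MASKED]" if is_pii(tok) else tok for tok in text.split())


vectors = tfidf(["email alice", "email report"])
print(vectors[0])  # "email" occurs in every document, so its weight is 0.0
print(mask_for_display("Contact alice@example.com today", looks_like_email))
# prints: Contact [MASKED] today
```

Masking only when the text is rendered leaves the underlying records unchanged, which is one way to read the presentation-layer proposal. In such a design, the detector passed as `is_pii` would be replaced by whichever trained model performs best.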

