Bimonthly Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
Abstract
Data privacy is an important aspect of data governance: enterprises must comply with data privacy regulations such as the GDPR, and their data usually contains sensitive personally identifiable information (PII). When data is collected and distributed across regions, de-identifying and anonymizing PII is mandatory for security and privacy. In this paper, candidate machine learning and deep learning models are explored for developing a natural language processing (NLP) based large language model (LLM) that automatically detects and masks PII to enforce data privacy. Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Long Short-Term Memory (LSTM), and Multi-Layer Perceptron (MLP) models are trained on features extracted with the Term Frequency-Inverse Document Frequency (TF-IDF) approach, and their performance in classifying text as PII is evaluated. Implementing PII detection and masking in the presentation layer of the data is proposed to improve data anonymization.
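The TF-IDF features and presentation-layer masking described above can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the tokenizer, the helper names (`tfidf`, `mask_value`, `render_record`), and the rule of keeping only the last two characters visible are hypothetical assumptions. The IDF smoothing, ln((1 + N) / (1 + df)) + 1, and the L2 normalization follow the defaults of scikit-learn's `TfidfVectorizer`. The classifier (SVM, RF, LR, LSTM, or MLP) that would decide which fields hold PII is not shown; the `pii_fields` argument stands in for its output.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; the paper does not specify its tokenizer.
    return re.findall(r"\w+", text.lower())


def tfidf(docs):
    """Return one sparse {term: weight} vector per document plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    # Smoothed IDF, matching scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        # Raw term counts weighted by IDF.
        vec = {t: c * idf[t] for t, c in Counter(tokens).items()}
        # L2-normalize so that document length does not dominate the features.
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors, idf


def mask_value(value, keep=2):
    # Hypothetical masking rule: hide all but the last `keep` characters.
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[len(value) - keep:]


def render_record(record, pii_fields):
    # Presentation-layer masking: the stored record is left unchanged;
    # only the returned view is masked. `pii_fields` is a hypothetical
    # stand-in for the fields a trained PII classifier has flagged.
    return {k: mask_value(str(v)) if k in pii_fields else v
            for k, v in record.items()}


if __name__ == "__main__":
    docs = ["call me at 555 1234", "email me at a@b.com", "the weather is nice"]
    vectors, idf = tfidf(docs)
    print(round(idf["me"], 4), round(idf["nice"], 4))
    print(render_record({"name": "Alice", "city": "Paris"}, {"name"}))
```

Masking at render time rather than rewriting the stored data keeps the original values available to authorized back-end processes, which is one reading of the presentation-layer approach the abstract proposes.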