Bimonthly Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
July-September 2023, Volume 38, Issue 4
Abstract
Text classification is a fundamental task in natural language processing, with applications ranging from sentiment analysis to content categorization. This research explores the use of the Robustly Optimized BERT Pretraining Approach (RoBERTa), a powerful pretrained transformer model, for text document classification on the well-known 20 Newsgroups dataset. RoBERTa, a variant of the BERT model, is leveraged for its strong language-understanding capabilities and its adaptability to specific text classification tasks. The study examines the fine-tuning process and evaluates the model's performance against existing text classification algorithms, following a comprehensive methodology that covers data preprocessing, model training, and evaluation. The results show the superiority of the RoBERTa-based model, which achieves higher accuracy, precision, recall, and F1-score than traditional algorithms. The advantages of RoBERTa, including its language understanding, adaptability, interpretable features, and generalization capability, are discussed. The research contributes to the field of natural language processing by demonstrating the potential of state-of-the-art language models for text classification, and it highlights the practicality of RoBERTa for real-world applications where accurate and robust document categorization is crucial.
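
The following is a minimal sketch of the kind of fine-tuning pipeline the abstract describes, assuming the Hugging Face transformers library and scikit-learn's 20 Newsgroups loader. The model checkpoint, hyperparameters, and preprocessing choices shown here are illustrative assumptions, not the paper's exact configuration.

    # Sketch of RoBERTa fine-tuning on 20 Newsgroups.
    # Assumptions: roberta-base checkpoint, illustrative hyperparameters.
    import torch
    from torch.utils.data import Dataset
    from sklearn.datasets import fetch_20newsgroups
    from transformers import (RobertaTokenizerFast,
                              RobertaForSequenceClassification,
                              Trainer, TrainingArguments)

    class NewsgroupsDataset(Dataset):
        """Tokenized 20 Newsgroups split for sequence classification."""
        def __init__(self, texts, labels, tokenizer, max_length=256):
            self.encodings = tokenizer(texts, truncation=True,
                                       padding="max_length",
                                       max_length=max_length)
            self.labels = labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx])
                    for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

    # Light preprocessing: strip headers, footers, and quoted replies,
    # a common cleanup step for this corpus.
    train = fetch_20newsgroups(subset="train",
                               remove=("headers", "footers", "quotes"))
    test = fetch_20newsgroups(subset="test",
                              remove=("headers", "footers", "quotes"))

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=20)  # one label per newsgroup

    train_ds = NewsgroupsDataset(train.data, list(train.target), tokenizer)
    test_ds = NewsgroupsDataset(test.data, list(test.target), tokenizer)

    args = TrainingArguments(output_dir="roberta-20ng",
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             learning_rate=2e-5)

    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=test_ds)
    trainer.train()

Evaluation on the held-out test split can then report the accuracy, precision, recall, and F1-score discussed above, for example by applying sklearn.metrics.classification_report to the predictions returned by trainer.predict(test_ds).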