Bimonthly Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
July-September 2023, Volume 38, Issue 4
Abstract
Text classification is a fundamental task in natural language processing, with applications ranging from sentiment analysis to content categorization. This research explores the use of the Robustly Optimized BERT Pretraining Approach (RoBERTa), a powerful pretrained transformer model, for text document classification on the well-known 20 Newsgroups dataset. RoBERTa, a variant of the BERT model, is leveraged for its strong language-understanding capabilities and its adaptability to specific text classification tasks. The study examines the fine-tuning process and evaluates the model's performance against existing text classification algorithms, following a comprehensive methodology that covers data preprocessing, model training, and evaluation. The results show the superiority of the RoBERTa-based model, which achieves higher accuracy, precision, recall, and F1-score than traditional algorithms. The advantages of RoBERTa, including its language understanding, adaptability, interpretable features, and generalization capability, are discussed. The research contributes to the field of natural language processing by demonstrating the potential of state-of-the-art language models for text classification, and it highlights the practicality of RoBERTa for real-world applications where accurate and robust document categorization is crucial.
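
The following is a minimal sketch of the kind of fine-tuning pipeline the abstract describes, assuming the Hugging Face transformers library and scikit-learn's 20 Newsgroups loader. The model checkpoint, hyperparameters, and preprocessing choices shown here are illustrative assumptions, not the paper's exact configuration.

    # Sketch of RoBERTa fine-tuning on 20 Newsgroups.
    # Assumptions: roberta-base checkpoint, illustrative hyperparameters.
    import torch
    from torch.utils.data import Dataset
    from sklearn.datasets import fetch_20newsgroups
    from transformers import (RobertaTokenizerFast,
                              RobertaForSequenceClassification,
                              Trainer, TrainingArguments)

    class NewsgroupsDataset(Dataset):
        """Tokenized 20 Newsgroups split for sequence classification."""
        def __init__(self, texts, labels, tokenizer, max_length=256):
            self.encodings = tokenizer(texts, truncation=True,
                                       padding="max_length",
                                       max_length=max_length)
            self.labels = labels

        def __len__(self):
            return len(self.labels)

        def __getitem__(self, idx):
            item = {k: torch.tensor(v[idx])
                    for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[idx])
            return item

    # Light preprocessing: strip headers, footers, and quoted replies,
    # a common cleanup step for this corpus.
    train = fetch_20newsgroups(subset="train",
                               remove=("headers", "footers", "quotes"))
    test = fetch_20newsgroups(subset="test",
                              remove=("headers", "footers", "quotes"))

    tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
    model = RobertaForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=20)  # one label per newsgroup

    train_ds = NewsgroupsDataset(train.data, list(train.target), tokenizer)
    test_ds = NewsgroupsDataset(test.data, list(test.target), tokenizer)

    args = TrainingArguments(output_dir="roberta-20ng",
                             num_train_epochs=3,
                             per_device_train_batch_size=16,
                             learning_rate=2e-5)

    trainer = Trainer(model=model, args=args,
                      train_dataset=train_ds, eval_dataset=test_ds)
    trainer.train()

Evaluation on the held-out test split can then report the accuracy, precision, recall, and F1-score discussed above, for example by applying sklearn.metrics.classification_report to the predictions returned by trainer.predict(test_ds).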