Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      05 July-September 2023, Volume 38 Issue 4
    Article

    ROBUSTLY OPTIMIZED BERT PRETRAINING APPROACH BASED TEXT DOCUMENT CLASSIFICATION
    Raja R, Dr.G.Jagatheeshkumar
    Journal of Data Acquisition and Processing, 2023, 38 (4): 2187-2204.

    Abstract

    Text classification is a fundamental task in natural language processing, with applications ranging from sentiment analysis to content categorization. This research applies the Robustly Optimized BERT Pretraining Approach (RoBERTa), a pretrained transformer model and a variant of BERT, to text document classification on the well-known 20 Newsgroups dataset. RoBERTa is chosen for its strong language understanding and its adaptability to specific text classification tasks. The study presents a methodology covering data preprocessing, model fine-tuning, and evaluation, and compares the model's performance against existing text classification algorithms. The results show that the RoBERTa-based model achieves higher accuracy, precision, recall, and F1-score than traditional algorithms. The advantages of RoBERTa, including its language understanding, adaptability, interpretable features, and generalization capabilities, are discussed. The research contributes to the field of natural language processing by demonstrating the potential of state-of-the-art language models for text classification, and it highlights the practicality of using RoBERTa for real-world applications where accurate and robust document categorization is crucial.
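
    The four evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the per-class scores are averaged across the 20 newsgroups, so macro-averaging is an assumption here; the function name `classification_metrics` and the example labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (unweighted mean over classes) is an assumption;
    the abstract does not say which averaging the authors used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy predictions over two 20 Newsgroups category names.
y_true = ["sci.space", "sci.space", "rec.autos", "rec.autos"]
y_pred = ["sci.space", "rec.autos", "rec.autos", "rec.autos"]
metrics = classification_metrics(y_true, y_pred)
```

    With macro-averaging, each newsgroup contributes equally to the reported score regardless of how many documents it contains; a weighted average would be an equally plausible reading of the abstract.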


Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: info@sjcjycl.cn
 