Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      30 Dec 2022, Volume 37 Issue 5   
    Article

    DES: DOMAIN EXPERT SUMMARIZATION USING LDA
    B. Lavanya, U. Vageeswari
    Journal of Data Acquisition and Processing, 2022, 37(5): 1797-1815.

    Abstract

    Text data is unstructured, and the amount of textual data available is already excessive and continues to increase daily. Text summarization is the technique of shortening long documents into brief paragraphs or phrases while keeping their meaning intact and extracting the crucial information. The two main goals of text summarization are optimal topic coverage and excellent readability. Extractive summarization methods focus on identifying the significant sentences of a document, and a sentence's importance is determined by its sentence score. Most extractive summarization methods fall into one of the following categories: graph-based methods, TF-IDF-based methods, fuzzy-logic-based methods, and machine learning methods. The key limitation of all these methods is that they examine only local knowledge, that is, information found within the particular document being summarized; knowledge of the document's domain is not taken into consideration. In this research, the Domain Expert Summarization (DES) method is developed to summarize a document the way a subject-matter expert (SME) would, and its effectiveness is assessed against other state-of-the-art methods. For a machine to summarize as dexterously as an expert, it must first acquire domain knowledge, so that the key points or keywords of the domain can be identified easily and a summary containing all of them can be produced. The LDA topic modelling method is used to obtain this domain knowledge. The experiments use the BBC NEWS and NEWS Aggregator datasets, and evaluation is done with the ROUGE-1, ROUGE-2, and ROUGE-L measures. In terms of ROUGE score, the proposed DES method outperformed the current state-of-the-art methods. A statistical t-test was performed at the 5% significance level. The p-values for ROUGE-1, ROUGE-2, and ROUGE-L are 0.002174, 0.002174, and 0.001905 on the BBC NEWS dataset, and 0.032835, 0.014387, and 0.025124, respectively, on the NEWS Aggregator dataset. Since all six p-values are below 0.05, the differences between DES and the compared methods are statistically significant.
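The abstract reports results in ROUGE-1, ROUGE-2, and ROUGE-L but does not state which variant (recall, precision, or F1), tokenizer, or stemming was used. As a rough illustration of what these measures compute, here is a minimal sketch that reports F1 using lowercase whitespace tokenization and no stemming. The function names are hypothetical, and the scores will generally differ from those of the official ROUGE-1.5.5 toolkit or the `rouge-score` package.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """F1 from an overlap count and the candidate / reference totals."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1 with clipped n-gram overlap (Counter intersection takes the min count)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the LCS of the candidate and reference token sequences."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))
```

For example, the candidate "the cat lay on the mat" against the reference "the cat sat on the mat" shares 5 of 6 unigrams, 3 of 5 bigrams, and a 5-token LCS, giving ROUGE-1 ≈ 0.833, ROUGE-2 = 0.6, and ROUGE-L ≈ 0.833.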

    Keywords

    Text Summarization, Luhn, LexRank, TextRank, LSA, LDA
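LDA is the component DES uses to obtain domain knowledge, but the abstract does not describe how topics are turned into sentence scores. The sketch below is a hypothetical illustration, not the authors' algorithm: it fits a small LDA model to a domain corpus with collapsed Gibbs sampling (standard library only), takes the top words of each topic as "domain keywords", and ranks a new document's sentences by the fraction of their content words that are domain keywords. The names `lda_gibbs`, `domain_keywords`, and `summarize`, the stopword list, the hyperparameters, and the toy corpus are all assumptions.

```python
import random
import re
from collections import Counter

# Tiny hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is",
             "was", "as", "by", "at", "after", "said", "it", "its"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, k, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to token lists by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    v = len({w for doc in docs for w in doc})  # vocabulary size
    ndk = [[0] * k for _ in docs]              # document-topic counts
    nkw = [Counter() for _ in range(k)]        # topic-word counts
    nk = [0] * k                               # total tokens per topic
    z = []                                     # topic assignment of every token
    for d, doc in enumerate(docs):             # random initial assignment
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                    # remove the token's current assignment
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                # p(z = j | rest) is proportional to
                # (n_dj + alpha) * (n_jw + beta) / (n_j + V * beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return nkw


def domain_keywords(topic_word_counts, top_n=5):
    """Union of the top_n most frequent words of every topic (hypothetical rule)."""
    return {w for counts in topic_word_counts
            for w, c in counts.most_common(top_n) if c > 0}


def summarize(document, keywords, num_sentences=2):
    """Rank sentences by the share of their content words that are domain keywords."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(t in keywords for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]  # keep original order


# Hypothetical toy domain corpus and input article.
corpus = [
    "The striker scored a late goal and the team won the league match.",
    "The coach praised the defence after the cup match.",
    "Shares fell as the bank reported lower profit and weak market growth.",
    "Investors watched the market as the central bank raised interest rates.",
]
docs = [tokenize(text) for text in corpus]
keywords = domain_keywords(lda_gibbs(docs, k=2, iters=100), top_n=4)
article = ("The match was delayed by rain. The striker scored twice and the team won. "
           "Fans bought tickets online. The coach said the league title is close.")
print(summarize(article, keywords, num_sentences=2))
```

Because the keywords are learned from a domain corpus rather than from the input article alone, sentences are ranked by domain relevance instead of purely local statistics, which is the distinction the abstract draws. A faithful reproduction of DES would need the scoring details from the full paper.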



E-mail: info@sjcjycl.cn