Abstract
Text data is largely unstructured, and the volume of available text is vast and grows daily. Text summarization is the technique of condensing long documents into brief paragraphs or phrases: it extracts the crucial information while preserving the meaning of the original text. The two main goals of text summarization are optimal topic coverage and high readability. Extractive summarization methods identify the significant sentences in a document on the basis of a score assigned to each sentence. Most extractive methods fall into one of four categories: graph-based methods, TF-IDF-based methods, fuzzy-logic-based methods, and machine learning methods. The key limitation of all these methods is that they consider only local knowledge, that is, information found within the particular document being summarized; knowledge of the document's domain is not taken into account. This research develops the Domain Expert Summarization (DES) method, which summarizes a document as a subject-matter expert (SME) would, and assesses its effectiveness against other state-of-the-art methods. For a machine to summarize with this kind of expertise, it must first acquire domain knowledge, so that the key points or keywords of the domain can be readily identified and a summary covering all of them can be produced. Latent Dirichlet Allocation (LDA) topic modelling is used to obtain this domain knowledge. The experiments use the BBC NEWS and NEWS Aggregator datasets, and evaluation is done with the ROUGE-1, ROUGE-2, and ROUGE-L measures. The results show that the proposed DES method outperforms current state-of-the-art methods in terms of ROUGE scores. A statistical t-test at the 5% significance level gives p-values of 0.002174, 0.002174, and 0.001905 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively, on the BBC NEWS dataset, and 0.032835, 0.014387, and 0.025124 on the NEWS Aggregator dataset. Since every p-value is below 0.05, the improvement achieved by the proposed DES method is statistically significant.
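The abstract does not specify how DES scores sentences, nor whether the reported ROUGE figures are recall, precision or F-measure. The Python sketch below only illustrates the general shape of keyword-driven extractive summarization followed by ROUGE evaluation, under stated assumptions. The domain keyword set is hard-coded, whereas DES obtains its domain knowledge through LDA topic modelling. The scoring rule (fraction of a sentence's tokens that are domain keywords) and all function names are hypothetical. ROUGE-N is computed as recall, following Lin's original definition, and ROUGE-L as LCS-based recall, using a naive tokenizer, so the numbers will not match official ROUGE toolkits exactly.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sentences(document):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def score_sentence(sentence, domain_keywords):
    """Hypothetical score: fraction of the sentence's tokens that are domain keywords."""
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    return sum(t in domain_keywords for t in tokens) / len(tokens)


def summarize(document, domain_keywords, k=2):
    """Keep the k highest-scoring sentences, restored to document order."""
    sentences = split_sentences(document)
    # sorted() is stable, so ties keep their original order even with reverse=True.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], domain_keywords),
                    reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


def ngram_counts(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count."""
    cand = ngram_counts(tokenize(candidate), n)
    ref = ngram_counts(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    return sum(min(c, cand[g]) for g, c in ref.items()) / total


def rouge_l_recall(candidate, reference):
    """ROUGE-L recall: longest common subsequence length / reference length."""
    c, r = tokenize(candidate), tokenize(reference)
    if not r:
        return 0.0
    # dp[i][j] = LCS length of r[:i] and c[:j]
    dp = [[0] * (len(c) + 1) for _ in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(c) + 1):
            if r[i - 1] == c[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1] / len(r)


# Toy usage: the keyword set stands in for LDA-derived domain terms.
doc = ("The central bank raised interest rates again. "
       "Fans queued outside the stadium. "
       "Inflation and interest rates dominate the economy debate.")
keywords = {"bank", "interest", "rates", "inflation", "economy"}
reference = "The bank raised interest rates as inflation worries the economy."

summary = summarize(doc, keywords, k=2)
print(summary)
print(rouge_n_recall(summary, reference, n=1))  # 0.8
print(rouge_l_recall(summary, reference))
```

The off-topic sentence about the stadium scores zero and is dropped, while the two sentences dense in domain keywords are kept in their original order, which is a common choice for readability. In a setup closer to DES, the fixed keyword set would be replaced by the top terms of LDA topics fitted on a domain corpus.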
Keywords
Text Summarization, Luhn, LexRank, TextRank, LSA, LDA