Abstract
In many countries, people use offensive language in everyday communication, both online and offline. However, whether every abusive exchange between two parties constitutes hate speech is a matter of investigation. The key area of study in this paper is therefore the differentiation between hate speech and offensive language. The work has three parts: first, a study of recent developments in classifying hate speech on social media; second, a proposed algorithm for classifying hate speech text as distinct from normal and offensive language text; and third, an algorithm to identify the source of a hate spreader. First, a review of recent literature is carried out, divided into reviews and surveys, hate speech classification as a binary classification problem, and hate speech detection as a multi-class classification problem. Then a model for hate speech classification is proposed, which includes data pre-processing, natural language processing (NLP), and Term Frequency-Inverse Document Frequency (TF-IDF) based feature extraction. The features are used to train a 2D Convolutional Neural Network (CNN) and a Support Vector Machine (SVM) model. Finally, an algorithm is proposed to identify the source of a hate spreader. A dataset available on Kaggle containing hate speech, offensive language, and normal text is used for experimental analysis. According to the findings, NLP features alone do not provide good accuracy on social media text. TF-IDF-based features alone demonstrate higher accuracy than NLP-based features, and a combination of both feature sets gives more accurate results than either technique individually.
Keywords
Hate speech detection, Offensive language, Text mining, Natural language processing, Deep Learning.
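As an illustration of the TF-IDF feature extraction step named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the tokenizer, the smoothing term in the IDF formula, the function names, and the toy sentences are all assumptions chosen for illustration, and the CNN and SVM classifiers that would consume these features are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters; a hypothetical stand-in for
    the paper's pre-processing step."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), weighting each term by
    tf * (ln(N / df) + 1). The '+ 1' is an assumed smoothing choice."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Toy, made-up sentences; the paper itself uses a Kaggle dataset.
docs = [
    "you are great",
    "you are awful awful",
    "have a great day",
]
vocab, vectors = tfidf_vectors(docs)
```

Each row of `vectors` is one document's feature vector over `vocab`. Terms that are frequent within a document but rare across the collection receive the largest weights, which is the property that makes TF-IDF useful as classifier input.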