Journal of Data Acquisition and Processing

1 Jan 2023, Volume 38 Issue 1

Article

1.	HAZARD IDENTIFICATION AND DETECTION USING MACHINE LEARNING. B S Panda1, Dabbiru Chaturya2, Injeti Gautham Sahil3, Chalapaka Dinesh4, Annam Bhanu Prakash5. Journal of Data Acquisition and Processing, 2023, 38 (1): 4418-4427.

Abstract

Web browsing has become an important part of daily life. This convenience, however, carries the risk of encountering malicious websites that can infect our devices with malware and steal our personal data. Existing cybersecurity techniques, such as firewalls and antivirus software, often fail to protect users from these constantly evolving threats. A more sophisticated and practical model is therefore needed that can accurately distinguish between safe and harmful web pages. This motivates us to develop a new classification system that analyzes URL-based features and applies a range of machine learning classification algorithms: AdaBoost, XGBoost, Random Forest, Support Vector Machine, Naive Bayes, Logistic Regression, Decision Tree, K-Nearest Neighbors, Artificial Neural Network (ANN), and Gradient Boosting. Researchers have identified several of these classifiers, including AdaBoost, XGBoost, and Support Vector Machine, as effective in detecting malicious websites. To improve users' online security, our aim is to build a system that can accurately identify whether a web page is malicious. To achieve this, we extract relevant attributes from web pages and train the classifiers using bagging and boosting methods. We evaluate our approach on a large dataset of web pages and compare its efficacy with that of other approaches. Our results show that the classification system identifies potentially harmful websites effectively. By providing a rapid and accurate way to discover and analyze hazardous websites, the proposed methodology can significantly improve web security and protect users from harm. The findings of this study have wide-ranging implications for web security.
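The abstract does not list the exact URL features or the training configuration, so the following is only a minimal sketch of the general idea under stated assumptions: lexical features are extracted from a URL and fed to a boosted ensemble. It pairs a hypothetical feature extractor (`url_features`, whose seven features and `SUSPICIOUS_TOKENS` list are assumptions, not the authors' feature set) with discrete AdaBoost over decision stumps, written in pure Python so it runs without external packages. All function names and the toy URLs are illustrative inventions; a real experiment would more likely use a library implementation such as scikit-learn's `AdaBoostClassifier` on a large labelled dataset.

```python
import math
from urllib.parse import urlparse

# Assumed lexical features; the paper's actual feature list is not given in the abstract.
SUSPICIOUS_TOKENS = ("login", "verify", "update", "secure", "account")


def url_features(url: str) -> list[float]:
    """Map a URL string to a small, hypothetical vector of lexical features."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                                   # overall URL length
        float(host.count(".")),                            # dots in the host name
        float(sum(ch.isdigit() for ch in url)),            # digit count
        1.0 if "@" in url else 0.0,                        # '@' can hide the real host
        1.0 if host.replace(".", "").isdigit() else 0.0,   # raw IPv4 address as host
        1.0 if parsed.scheme == "https" else 0.0,          # uses HTTPS
        float(sum(tok in url.lower() for tok in SUSPICIOUS_TOKENS)),  # bait words
    ]


def stump_predict(row, j, thr, polarity):
    """Decision stump: predict `polarity` if feature j >= thr, else the opposite."""
    return polarity if row[j] >= thr else -polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, polarity) != yi)
                if err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost over decision stumps; y holds +1 (malicious) or -1 (benign)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        raw_err, j, thr, pol = fit_stump(X, y, w)
        if raw_err >= 0.5:
            break  # weak learner is no better than chance
        err = max(raw_err, 1e-10)  # guard against log(1/0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, pol))
        if raw_err == 0:
            break  # training data already separated perfectly
        # Re-weight: misclassified samples gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, thr, pol))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, j, thr, pol) for a, j, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Made-up toy URLs for illustration only (not data from the paper).
TOY_DATA = [
    ("https://www.example.com/index.html", -1),
    ("https://docs.example.org/guide", -1),
    ("https://example.com/about", -1),
    ("http://192.168.0.10/secure/login.php", 1),
    ("http://account-verify.example.net@10.0.0.5/update", 1),
    ("http://login.secure.example.net/verify", 1),
]

if __name__ == "__main__":
    X = [url_features(u) for u, _ in TOY_DATA]
    y = [label for _, label in TOY_DATA]
    model = adaboost_fit(X, y)
    for url, label in TOY_DATA:
        print(url, "true:", label, "predicted:", adaboost_predict(model, url_features(url)))
```

AdaBoost is used here because it is one of the classifiers the abstract names; a bagging variant would instead train each weak learner on a bootstrap resample of the data and combine them by majority vote. On this toy set the HTTPS feature alone separates the classes, which real-world data would not allow.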

Keyword

Malicious websites, Adaboost, XGBoost, Random Forest, SVM, Logistic Regression, ANN, Gradient Boosting.
