Abstract
The primary goal of this work is to develop a heterogeneous ensemble model, built with machine learning techniques, for diagnosing diabetes in patients while addressing the problem of class imbalance. To handle the imbalance, several sampling methods are applied: up-sampling, down-sampling, and the synthetic minority oversampling technique (SMOTE). Once the preprocessed data is obtained, different feature selection techniques are used to identify the relevant features: Ranksum, Principal Component Analysis (PCA), Univariate Logistic Regression (ULOGR), Cross-Correlation Analysis (CRA), Gini score, and Information Gain (IGFR). On the PIMA dataset, a variety of classification methods, namely logistic regression (LR), support vector machine (SVM), Naive Bayes, Bagging, AdaBoost, and PNN, are used to predict whether a sample is diabetic. The results show that the MVE ensemble learning method combined with SMOTE-sampled data yields the best performance, with 95.81% accuracy and an AUC of 0.94.
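The SMOTE oversampling step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function name `smote`, its parameters `k` and `seed`, and the two-feature toy data are hypothetical, and in practice a library class such as `imblearn.over_sampling.SMOTE` would typically be used. Each synthetic sample is placed at a random point on the segment between a randomly chosen minority sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base, keep the k nearest.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Interpolate: base + gap * (neighbour - base), with gap drawn from [0, 1).
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy usage (hypothetical data): 3 minority samples are topped up with 3 synthetic ones.
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.5]]
new_samples = smote(minority_samples, n_synthetic=3, k=2)
print(len(minority_samples) + len(new_samples))
```

Because every synthetic point lies between two real minority samples, SMOTE adds new minority examples rather than duplicating existing ones, which is how it differs from plain up-sampling.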
Keywords
Machine Learning, SVM, AdaBoost, NB, SMOTE