09 May 2023, Volume 38 Issue 3
Abstract
Stress is a common response to environmental and psychological factors, and it negatively impacts mental and physical health. Analyzing stress data with multiple features can reveal contributing factors and aid in developing effective stress management strategies. However, the high dimensionality of such data, with its many features, can lead to overfitting. Feature selection is crucial in mitigating this issue and improving machine learning model performance on stress data. This paper proposes a high-level ensemble feature selection (HLE-FS) algorithm for stress data. The algorithm aims to identify the features most informative for stress classification, which can lead to a better understanding of the underlying factors contributing to stress and to more accurate stress prediction. The proposed algorithm first preprocesses the input stress data and then applies several feature selection techniques. In preprocessing, missing values are imputed using hybrid imputation, and categorical variables are converted to numerical values using categorical feature target encoding. The data is then normalized to ensure compatibility with machine learning algorithms. The algorithm applies three feature selection techniques in an ensemble approach: filter-based, wrapper-based, and embedding-based methods. The filter-based technique uses information gain with Ranker search to rank the features. The wrapper-based technique employs a Naïve Bayes classifier and Greedy Stepwise search, parallelized with ThreadPoolExecutor, to search for the best feature subsets. Finally, the embedding-based technique uses Principal Component Analysis (PCA) to reduce the dimensionality of the data, and Ranker search to rank the PCA-derived features. The results of the three techniques are combined using a majority voting mechanism, and the top-k features are extracted from the combined results. The algorithm then evaluates classification performance on the dataset with and without feature selection using a Random Forest classifier. Experimental results on stress data demonstrate that the proposed algorithm outperforms the existing system in terms of accuracy and computational efficiency. The algorithm effectively selects the most informative features from the input stress data, improving stress classification performance.
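The majority-voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `majority_vote_top_k`, the `min_votes` threshold, the tie-breaking by mean rank position, and the feature names are all hypothetical. It also assumes each of the three selectors has already produced a best-first ranking over the original feature names.

```python
from collections import Counter


def majority_vote_top_k(rankings, k, min_votes=2):
    """Combine best-first feature rankings by majority vote (illustrative only).

    rankings  -- list of ranked feature-name lists, one per selector
    k         -- number of features each selector votes for, and the
                 maximum number of features returned
    min_votes -- votes needed for a feature to be kept (assumed: 2 of 3)
    """
    votes = Counter()     # how many selectors placed the feature in their top k
    rank_sum = Counter()  # sum of the feature's positions, used to break ties
    for ranking in rankings:
        for position, feature in enumerate(ranking[:k]):
            votes[feature] += 1
            rank_sum[feature] += position

    # Keep only features backed by at least min_votes selectors.
    kept = [f for f in votes if votes[f] >= min_votes]
    # More votes first, then better mean position, then name for determinism.
    kept.sort(key=lambda f: (-votes[f], rank_sum[f] / votes[f], f))
    return kept[:k]


# Hypothetical rankings from the filter, wrapper, and PCA-based selectors.
filter_rank = ["hr", "sleep", "bp", "age", "steps"]
wrapper_rank = ["sleep", "hr", "steps", "bp", "age"]
pca_rank = ["bp", "hr", "age", "sleep", "steps"]

selected = majority_vote_top_k([filter_rank, wrapper_rank, pca_rank], k=3)
print(selected)
```

In this sketch a feature survives only if at least two of the three selectors place it in their own top k, so the returned list may contain fewer than k features when the selectors disagree strongly.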
Keywords
HLE-FS, Naïve Bayes, ThreadPoolExecutor, PCA.