Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      09 May 2023, Volume 38 Issue 3
    Article

    A HIGH-LEVEL ENSEMBLE FEATURE SELECTION ALGORITHM FOR MITIGATING THE DIMENSIONALITY IN STRESS DATA
    Suryavanshee Prashant Maharudra and Dr. Sharanbasappa Gandage
    Journal of Data Acquisition and Processing, 2023, 38 (3): 1064-1085.

    Abstract

    Stress is a common response to environmental and psychological factors and can negatively affect mental and physical health. Analyzing stress data with multiple features can reveal contributing factors and aid in developing effective stress management strategies. However, the high dimensionality of such data, with its many features, makes models prone to overfitting. Feature selection is crucial in mitigating this issue and improving machine learning model performance on stress data. This paper proposes a high-level ensemble feature selection (HLE-FS) algorithm for stress data. The algorithm aims to identify the most informative features relevant to stress classification, which can lead to a better understanding of the underlying factors contributing to stress and more accurate stress prediction. The proposed algorithm first preprocesses the input stress data and then applies several feature selection techniques. Missing values in the data are imputed using hybrid imputation, and categorical variables are converted to numerical values using target encoding. The data is then normalized to ensure compatibility with machine learning algorithms. The algorithm then applies three feature selection techniques in an ensemble: filter-based, wrapper-based, and embedding-based methods. The filter-based technique uses information gain and Ranker search to rank the features. The wrapper-based technique employs a Naïve Bayes classifier and Greedy Stepwise search, run with ThreadPoolExecutor, to find the best feature subsets. Finally, the embedding-based technique uses Principal Component Analysis (PCA) to reduce the dimensionality of the data, and Ranker search to rank the PCA-derived features. The results of the three feature selection techniques are combined using a majority voting mechanism, and the top-k features are extracted from the combined results.
The algorithm then evaluates classification performance on the dataset with and without feature selection using a Random Forest classifier. Experimental results on stress data demonstrate that the proposed algorithm outperforms the existing system in both accuracy and computational efficiency. The algorithm effectively selects the most informative features from the input stress data, improving stress classification performance.
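
The majority voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes each of the three selectors has already produced a list of candidate feature names, that each selector casts one vote per feature, and that ties are broken alphabetically. The function name `majority_vote_top_k` and the feature names are hypothetical.

```python
from collections import Counter


def majority_vote_top_k(selections, k):
    """Combine per-method feature selections by vote count and return the top k.

    `selections` is a list of iterables of feature names, one per
    selection method (assumed here: filter, wrapper, embedding-based).
    """
    votes = Counter()
    for selected in selections:
        # A method votes at most once for any given feature.
        votes.update(set(selected))
    # Most votes first; alphabetical order breaks ties (an assumption).
    ranked = sorted(votes, key=lambda name: (-votes[name], name))
    return ranked[:k]


# Hypothetical outputs of the three selectors (illustrative names only).
filter_selected = ["heart_rate", "sleep_hours", "workload", "caffeine"]
wrapper_selected = ["heart_rate", "workload", "age"]
embedding_selected = ["sleep_hours", "heart_rate", "commute_time"]

top_features = majority_vote_top_k(
    [filter_selected, wrapper_selected, embedding_selected], k=3
)
print(top_features)  # ['heart_rate', 'sleep_hours', 'workload']
```

In this example `heart_rate` receives three votes and `sleep_hours` and `workload` two each, so those three features form the top-3 set. How the paper maps PCA-derived components back to original features before voting is not stated in the abstract, so the sketch leaves that step out.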

    Keywords

    HLE-FS, Naïve Bayes, ThreadPoolExecutor, PCA.

