Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      07 April 2023, Volume 38 Issue 2   
    Article

    SPECIFY UNDERLINING DISTRIBUTION FOR CLUSTERING LINEARLY SEPARABLE DATA: NORMAL AND UNIFORM DISTRIBUTION CASE
    Farag Hamad1*, Najiah Younus2, Mohamad M.A. Muftah2, and Mohamad Jaber3
    Journal of Data Acquisition and Processing, 2023, 38 (2): 4675-4684.

    Abstract

    Clustering is a widely used approach for grouping data, and many statistical methods can be used to divide data into groups. Cluster analysis aims to identify individuals within a large population that share common features: observations in the same group are similar to one another and differ from observations in other groups, so each individual can be assigned to the group it belongs to. Over the past decades, clustering has been used increasingly in data analysis and data mining, and several clustering methods have been developed for grouping data that share common features. The k-means family (k-means, k-means++, and kernel k-means) is among the most important statistical tools for clustering data, and these methods are applied to linearly separable data. Data can also be clustered by assigning an underlying distribution to it. In this paper, we assign two different underlying distributions, normal and uniform, to cluster linearly separable simulated data and to assess the improvement each one offers. The results were compared with those of the k-means method and with the ground truth of the data. Clustering improved when a uniform density function was used, and the overlap percentage was also lower in that case. A significance test found no significant difference between the estimated cluster means and the underlying cluster means. In addition, the proposed methods perform well at larger sample sizes.

    Keywords

    Probability density, k-means algorithm, linearly separable data, normal and uniform distributions.
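
    The abstract does not spell out the algorithm, so the following is only a minimal sketch of one possible reading of it, not the authors' method. It assumes two well-separated one-dimensional clusters, a plain k-means pass to obtain initial groups, and a reassignment step that gives each point to the cluster whose fitted normal or uniform density is highest. All function names (simulate, kmeans_1d, assign_by_density, accuracy) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate(n_per_cluster, seed=0):
    """Draw two well-separated 1-D normal clusters (assumed means 0 and 10)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_per_cluster)]
    xs += [rng.gauss(10.0, 1.0) for _ in range(n_per_cluster)]
    truth = [0] * n_per_cluster + [1] * n_per_cluster
    return xs, truth


def kmeans_1d(xs, iters=50):
    """Lloyd iterations for k = 2, started from the smallest and largest point.

    Starting from min and max keeps cluster 0 on the left, so labels line up
    with the simulated ground truth without a permutation step.
    """
    centers = [min(xs), max(xs)]
    labels = []
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in xs]
        new_centers = [
            statistics.fmean([x for x, lab in zip(xs, labels) if lab == k])
            for k in (0, 1)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def uniform_pdf(x, lo, hi):
    return 1.0 / (hi - lo) if lo <= x <= hi else 0.0


def assign_by_density(xs, labels, family):
    """Reassign each point to the cluster with the highest fitted density.

    Parameters are estimated from the current groups: mean and standard
    deviation for "normal", observed minimum and maximum for "uniform".
    """
    groups = [[x for x, lab in zip(xs, labels) if lab == k] for k in (0, 1)]
    if family == "normal":
        params = [(statistics.fmean(g), statistics.stdev(g)) for g in groups]
        pdf = normal_pdf
    elif family == "uniform":
        params = [(min(g), max(g)) for g in groups]
        pdf = uniform_pdf
    else:
        raise ValueError("family must be 'normal' or 'uniform'")
    return [0 if pdf(x, *params[0]) >= pdf(x, *params[1]) else 1 for x in xs]


def accuracy(labels, truth):
    """Fraction of points whose label matches the simulated ground truth."""
    return sum(a == b for a, b in zip(labels, truth)) / len(truth)


if __name__ == "__main__":
    xs, truth = simulate(200, seed=1)
    km_labels, centers = kmeans_1d(xs)
    print("k-means centers:", centers)
    print("k-means accuracy:", accuracy(km_labels, truth))
    for family in ("normal", "uniform"):
        labels = assign_by_density(xs, km_labels, family)
        print(family, "accuracy:", accuracy(labels, truth))
```

    On data this cleanly separated, all three assignments are expected to agree closely with the ground truth. Any differences of the kind the abstract reports would only show up when the clusters are closer together or the samples are small.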


Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: info@sjcjycl.cn