Bimonthly    Since 1986
ISSN 1004-9037
Publication Details
Edited by: Editorial Board of Journal of Data Acquisition and Processing
P.O. Box 2704, Beijing 100190, P.R. China
Sponsored by: Institute of Computing Technology, CAS & China Computer Federation
Undertaken by: Institute of Computing Technology, CAS
Published by: SCIENCE PRESS, BEIJING, CHINA
Distributed by:
China: All Local Post Offices
 
   
      07 April 2023, Volume 38 Issue 2   
    Article

    A NOVEL STUDY OF SILHOUETTE METHOD TO SOLVE THE ISSUES OF OUTLIER AND IMPROVE THE QUALITY OF CLUSTER
    Abdulnassar A. A., Latha R. Nair
    Journal of Data Acquisition and Processing, 2023, 38 (2): 3099-3118.

    Abstract

    The silhouette method is a widely used statistical method for choosing the number of clusters and for handling outliers in the sample space. An outlier is a data object that deviates significantly from the rest of the objects. The silhouette coefficient of a sample clearly indicates whether that sample is an outlier in the data set, and the same value can be used to assess the compactness of the formed clusters. In partitioning methods, the clustering results are very sensitive to the chosen cluster count. This study aims to improve cluster quality by detecting and removing outliers under different clustering methods. The performance of the silhouette method is analysed on several data sets from the UCI data repository. We propose two methods to detect and remove outliers. The first uses the silhouette value of each sample; the second measures the distance from each sample to all cluster centroids and marks the sample as an outlier based on a threshold distance. We implemented both methods in Python and checked the results on data sets from UCI and on large public data sets. Cluster quality is evaluated with the Silhouette, Dunn, Davies-Bouldin (DB) and C cluster evaluation indexes. Removing outliers improves the quality and compactness of the newly formed clusters. We also analyse the performance and clustering efficiency obtained by removing outliers from the sample space.
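
The two detection ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, the silhouette cutoff of 0.0 and the distance threshold of 3.0 are all assumptions made for illustration, and the second rule is read here as "a sample is an outlier when even its nearest centroid is farther away than the threshold". Cluster labels are taken as given rather than produced by K-means.

```python
import math


def silhouette_values(points, labels):
    """Silhouette coefficient s(i) = (b - a) / max(a, b) for every sample.

    a = mean distance from sample i to the other members of its own cluster,
    b = smallest mean distance from sample i to the members of another cluster.
    Assumes at least two clusters are present.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a singleton cluster
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def centroids(points, labels):
    """Mean point of each cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        acc = sums.setdefault(lab, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc) for lab, acc in sums.items()}


def outliers_by_silhouette(points, labels, cutoff=0.0):
    """Hypothetical rule: flag samples whose silhouette value is below cutoff."""
    scores = silhouette_values(points, labels)
    return [i for i, s in enumerate(scores) if s < cutoff]


def outliers_by_centroid_distance(points, labels, threshold):
    """Hypothetical rule: flag samples farther than threshold from every centroid."""
    cents = list(centroids(points, labels).values())
    return [
        i for i, p in enumerate(points)
        if min(math.dist(p, c) for c in cents) > threshold
    ]


# Toy data (assumed): two tight clusters plus one stray sample at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (6, 6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 0]

flagged_by_silhouette = outliers_by_silhouette(points, labels, cutoff=0.0)
flagged_by_distance = outliers_by_centroid_distance(points, labels, threshold=3.0)

# Mean silhouette before and after dropping the flagged samples
# (labels are reused as-is; no re-clustering is done in this sketch).
before = silhouette_values(points, labels)
mean_before = sum(before) / len(before)
kept = [i for i in range(len(points)) if i not in flagged_by_silhouette]
after = silhouette_values([points[i] for i in kept], [labels[i] for i in kept])
mean_after = sum(after) / len(after)

print(flagged_by_silhouette, flagged_by_distance)
print(mean_before, mean_after)
```

On this toy data both rules flag only the stray sample at index 8, and the mean silhouette value rises once it is removed. Suitable cutoff and threshold values depend on the data set and are not specified in the abstract.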

    Keywords

    Cluster Compactness, Data Mining, Dunn Index, K-means, Outlier, Partition Algorithm, Silhouette.


Journal of Data Acquisition and Processing
Institute of Computing Technology, Chinese Academy of Sciences
P.O. Box 2704, Beijing 100190 P.R. China
E-mail: info@sjcjycl.cn
 