Cluster Analysis and Realization of Mixed Data

Abstract


Mixed data, which combine numerical and categorical variables, are common in everyday production and life. To make data mining on such data more efficient, a cluster analysis method for mixed data is needed. We describe in detail how to compute a comprehensive distance for mixed data, how to choose the number of clusters, and how to choose the clustering method. In the empirical analysis, we take a mixed data set, first compute the pairwise distances with the "gower" distance function, then choose the number of clusters according to the average silhouette width, and finally cluster the data with the PAM and CLARA algorithms. The PAM and CLARA algorithms produce different clusterings, and the CLARA results perform better.
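The workflow described in the abstract (Gower distance, choice of the number of clusters by average silhouette width, then PAM and CLARA) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy records, the column layout, and every function name below are hypothetical; `pam` is a simplified swap-only k-medoids search rather than the full BUILD/SWAP procedure; and `clara` reuses the full distance matrix for brevity, whereas a real CLARA run would compute distances only on each subsample.

```python
import random
from itertools import combinations

def gower_matrix(rows, is_numeric):
    """Pairwise Gower dissimilarities: range-scaled |x - y| for numeric
    columns, 0/1 mismatch for categorical ones, averaged over columns."""
    ranges = [max(r[j] for r in rows) - min(r[j] for r in rows) if num else None
              for j, num in enumerate(is_numeric)]
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        total = 0.0
        for x, y, rng in zip(rows[i], rows[j], ranges):
            if rng is None:
                total += 0.0 if x == y else 1.0
            elif rng > 0:
                total += abs(x - y) / rng
        d[i][j] = d[j][i] = total / len(is_numeric)
    return d

def total_cost(d, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))

def pam(d, k):
    """Simplified PAM: start from the first k points and keep swapping a
    medoid with a non-medoid while the total cost decreases."""
    medoids, improved = list(range(k)), True
    best = total_cost(d, medoids)
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(d)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(d, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids

def assign_labels(d, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]

def clara(d, k, sample_size, n_samples=5, seed=0):
    """CLARA idea: run PAM on random subsamples and keep the medoid set
    with the lowest cost on the full data."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        idx = rng.sample(range(len(d)), sample_size)
        sub = [[d[i][j] for j in idx] for i in idx]
        medoids = [idx[m] for m in pam(sub, k)]
        cost = total_cost(d, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids

def average_silhouette_width(d, labels):
    """Mean of (b - a) / max(a, b); singleton clusters contribute 0."""
    n, clusters, widths = len(d), sorted(set(labels)), []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

# Hypothetical mixed records: (age, income, gender, area type).
rows = [(25, 30.0, "F", "urban"), (27, 32.0, "F", "urban"), (24, 29.0, "M", "urban"),
        (55, 80.0, "M", "rural"), (58, 85.0, "M", "rural"), (60, 82.0, "F", "rural")]
d = gower_matrix(rows, is_numeric=[True, True, False, False])

# Pick the number of clusters with the largest average silhouette width.
scores = {k: average_silhouette_width(d, assign_labels(d, pam(d, k))) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
labels = assign_labels(d, pam(d, best_k))
clara_medoids = clara(d, best_k, sample_size=4)
print(best_k, labels, clara_medoids)
```

On data sets too large for a full pairwise distance matrix, the subsampling step in `clara` is what keeps the cost manageable, which is the usual motivation for comparing CLARA against PAM.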
