Cluster Analysis and Realization of Mixed Data

Mixed data arise frequently in everyday production and life. To improve the efficiency of data mining, a cluster analysis method for mixed data is needed. We describe how to calculate a comprehensive distance for mixed data, how to choose the number of clusters, and how to choose the clustering method. In the empirical analysis, we select a mixed data set, first compute the pairwise distances with the "gower" distance function, then choose the number of clusters according to the average silhouette width, and finally apply the PAM and CLARA algorithms to cluster the mixed data. We find that the PAM and CLARA algorithms produce different clustering results, and that the CLARA results are better.
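The workflow summarized above (Gower distance, then average silhouette width, then k-medoids) can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract indicates was carried out in R. The code below is plain Python under stated assumptions: the six-row data set is invented for illustration, `pam` is a simplified swap-based k-medoids search rather than the full PAM build-and-swap procedure, and all function names are hypothetical. CLARA is omitted; it would apply the same medoid search to random subsamples of a larger data set.

```python
from itertools import combinations

def gower_matrix(rows, numeric_cols):
    """Pairwise Gower dissimilarities: range-scaled absolute difference for
    numeric columns, simple mismatch (0/1) for all other columns."""
    ranges = {j: max(r[j] for r in rows) - min(r[j] for r in rows)
              for j in numeric_cols}
    n, p = len(rows), len(rows[0])
    d = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        total = 0.0
        for j in range(p):
            x, y = rows[i][j], rows[k][j]
            if j in ranges:
                total += abs(x - y) / ranges[j] if ranges[j] > 0 else 0.0
            else:
                total += 0.0 if x == y else 1.0
        d[i][k] = d[k][i] = total / p
    return d

def total_cost(d, medoids):
    """Sum of each point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in d)

def pam(d, k):
    """Simplified k-medoids: start from the first k points and accept any
    medoid/non-medoid swap that lowers the total cost (assumption: this
    replaces PAM's BUILD phase with a naive initialization)."""
    n = len(d)
    medoids = list(range(k))
    best = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:mi] + [cand] + medoids[mi + 1:]
                cost = total_cost(d, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: d[i][m]) for i in range(n)]
    return medoids, labels

def avg_silhouette(d, labels):
    """Average silhouette width; singleton clusters contribute s(i) = 0.
    Assumes at least two clusters."""
    n = len(d)
    total = 0.0
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Invented mixed data: (age, income, occupation, owns_car).
rows = [
    (25, 30.0, "student", "no"),
    (27, 32.0, "student", "no"),
    (24, 28.0, "student", "yes"),
    (52, 90.0, "manager", "yes"),
    (55, 95.0, "manager", "yes"),
    (49, 88.0, "manager", "no"),
]

d = gower_matrix(rows, numeric_cols=(0, 1))
medoids, labels = pam(d, 2)
sil = avg_silhouette(d, labels)
print(medoids, labels, round(sil, 3))
```

To choose the number of clusters as the abstract describes, one would repeat the last three lines for several values of k and keep the k with the largest average silhouette width.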

Keywords

Mixed Data; Cluster Analysis; PAM algorithm; CLARA algorithm; R
