Comparison of Distance Measures in Cluster Analysis with Dichotomous Data

This study examines the performance of cluster analysis with dichotomous data using distance measures based on response pattern similarity. In many contexts, such as educational and psychological testing, cluster analysis is a useful means of exploring datasets and identifying underlying groups among individuals. However, standard approaches to cluster analysis assume that the variables used to group observations are continuous. This paper focuses on four methods for calculating the distance between individuals from dichotomous data, with the resulting distances then passed to a clustering algorithm such as Ward's. The four methods are potentially useful for practitioners because they are relatively easy to carry out in standard statistical software such as SAS and SPSS, and because they have shown potential for correctly grouping observations based on dichotomous data. Results of both a simulation study and an application to a set of binary survey responses show that three of the four measures behave similarly and can yield correct cluster recovery rates of between 60% and 90%. Furthermore, in nearly all cases these methods worked better than applying Ward's clustering algorithm to the raw data.
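As a rough illustration of what a distance based on response pattern similarity can look like, the sketch below computes pairwise distances between respondents from 0/1 item responses. The abstract does not name the four measures studied, so the two used here (simple matching and Jaccard) are assumptions chosen only because they are common for binary data; the function names and the small `responses` dataset are likewise hypothetical.

```python
from itertools import combinations


def simple_matching_distance(a, b):
    """1 minus the share of items on which two respondents agree (0-0 or 1-1)."""
    if len(a) != len(b):
        raise ValueError("response vectors must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / len(a)


def jaccard_distance(a, b):
    """1 minus (items both endorse) / (items at least one endorses)."""
    if len(a) != len(b):
        raise ValueError("response vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        # Neither respondent endorsed any item; treat them as identical.
        return 0.0
    return 1.0 - both / either


def distance_matrix(rows, metric):
    """Symmetric matrix of pairwise distances between respondents."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(rows[i], rows[j])
    return d


# Hypothetical data: four respondents, five dichotomous items.
responses = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 1],
]

matching = distance_matrix(responses, simple_matching_distance)
jaccard = distance_matrix(responses, jaccard_distance)
```

A matrix like `matching` or `jaccard` would then be handed to a hierarchical clustering routine in place of the raw 0/1 data; that clustering step is not shown here.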

Keywords

Cluster analysis; dichotomous data; distance measures

