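A Study on Cluster Analysis: Improving Stability Measurement
Many techniques and algorithms already exist for clustering, each attempting to uncover the latent structure of unlabeled data. However, validating clustering results remains a major difficulty and gives rise to many related questions. For example: Which clustering method should be chosen? Which distance or similarity measure should be used to compare data? How many clusters should there be? This study focuses on cluster stability, which it examines by computing the expected clustering distance after unstable (perturbing) data points are added to the original data. This method is often used to find the optimal number of clusters. However, when the dataset is very large, the method becomes hard to carry out and the efficiency of the measurement suffers. To solve this problem, this study does not add perturbing points at random as previous approaches did. Instead, it proposes the Critical Stability algorithm, which focuses on the critical data points most likely to disrupt the algorithm's clustering and so improves the efficiency of stability measurement. To validate the Critical Stability algorithm, this study uses real datasets and well-known artificial test datasets, comparing the results and running times of the traditional algorithm and the Critical Stability algorithm.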
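The expected-distance criterion described in the abstract can be written as a formula. The notation is an assumption introduced for illustration and is not taken from the thesis:
- $\mathcal{A}_k$ is a clustering algorithm run with $k$ clusters.
- $X$ is the original data.
- $\tilde X_b$ is the $b$-th of $B$ perturbed copies, that is, the original data plus added perturbing points.
- $d$ is a distance between two partitions.
- $\big|_X$ restricts a clustering of a perturbed copy to the original points.

```latex
S(k) = \mathbb{E}_{\tilde X}\!\left[ d\big(\mathcal{A}_k(X),\, \mathcal{A}_k(\tilde X)\big|_X\big) \right]
\approx \frac{1}{B} \sum_{b=1}^{B} d\big(\mathcal{A}_k(X),\, \mathcal{A}_k(\tilde X_b)\big|_X\big),
\qquad
\hat k = \arg\min_{k \ge 2} S(k)
```

Every term of the sum requires a full clustering run, so the cost grows with both $B$ and the size of $X$. That cost is what choosing the perturbing points deliberately, rather than at random, is meant to reduce. The case $k = 1$ is excluded because a single cluster is trivially stable ($S(1) = 0$).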
A Study on Cluster Analysis: Improving Performance on Stability Measurement
Many techniques and algorithms have been designed for clustering. Because these techniques aim to describe the hidden structure of unlabeled data, validating their output and the choice of method is difficult. This raises several problems, some of which remain unsolved. For example: Which clustering method should we use? Which distance or similarity measure should we choose to compare data? How should we assess the significance of a cluster? How can we determine the number of clusters? Several approaches to these problems have been proposed. In this research, we focus on the stability score. It is measured by calculating the expected distance between the clustering of the original data and the clusterings of perturbed versions of it. This score is commonly used to estimate the number of clusters in a dataset. However, it becomes difficult to compute as the data size increases, and the performance of the measurement degrades on large data. Depending on the nature of the data, obtaining a precise result can take several weeks. To address this issue, we present a variation of the stability algorithm named “Critical Stability”. It replaces the randomly generated perturbations with the critical perturbations most likely to destroy cluster patterns, thereby improving the performance of the measurement. To validate this new algorithm, we tested it on real and artificial datasets with known patterns and compared the results and running times of the original stability algorithm and Critical Stability.
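The stability measurement described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation:
- k-means with farthest-first initialisation stands in for the clustering algorithm.
- A perturbation means appending `m` points drawn uniformly from the data's bounding box.
- The distance between two clusterings is the fraction of point pairs on which they disagree, computed over the original points only.

The abstract does not say how Critical Stability selects its critical points. `boundary_noise` is a purely hypothetical stand-in, used only to show where a targeted perturbation would replace the random one. It places perturbing points near the midpoints between cluster centers, where assignments are most likely to flip. All function names here are illustrative.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed, iters=50):
    """Lloyd's k-means with farthest-first initialisation; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centers


def partition_distance(a, b):
    """Fraction of point pairs on which two labelings disagree about
    'same cluster' vs 'different cluster' (invariant to label renaming)."""
    n = len(a)
    disagree = sum((a[i] == a[j]) != (b[i] == b[j])
                   for i in range(n) for j in range(i + 1, n))
    return disagree / (n * (n - 1) / 2)


def random_noise(points, centers, m, rng):
    """Baseline perturbation: m points uniform in the data's bounding box.
    `centers` is unused; it keeps the signature shared with boundary_noise."""
    lo = [min(xs) for xs in zip(*points)]
    hi = [max(xs) for xs in zip(*points)]
    return [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(m)]


def boundary_noise(points, centers, m, rng):
    """HYPOTHETICAL 'critical' perturbation, not the thesis's method: jittered
    points near midpoints between pairs of centers (needs k >= 2)."""
    k = len(centers)
    mids = [[(a + b) / 2 for a, b in zip(centers[i], centers[j])]
            for i in range(k) for j in range(i + 1, k)]
    # Fixed jitter of 0.1 assumes roughly unit-scale data.
    return [tuple(x + rng.gauss(0, 0.1) for x in rng.choice(mids)) for _ in range(m)]


def stability_score(points, k, perturb=random_noise, trials=10, m=5, seed=0):
    """Mean partition distance between the clustering of the original data
    and clusterings of perturbed copies (lower = more stable)."""
    rng = random.Random(seed)
    base, centers = kmeans(points, k, seed)
    total = 0.0
    for t in range(trials):
        noisy = points + perturb(points, centers, m, rng)
        labels, _ = kmeans(noisy, k, seed + t + 1)
        # Compare only the original points; the added points are discarded.
        total += partition_distance(base, labels[:len(points)])
    return total / trials


if __name__ == "__main__":
    gen = random.Random(42)
    # Three synthetic, well-separated 2-D blobs (illustrative data only).
    data = [(cx + gen.gauss(0, 0.5), cy + gen.gauss(0, 0.5))
            for cx, cy in [(0, 0), (6, 0), (3, 5)] for _ in range(20)]
    for k in range(2, 6):
        print(k, round(stability_score(data, k), 3),
              round(stability_score(data, k, perturb=boundary_noise), 3))
```

Lower scores indicate more stable clusterings. The candidate `k` with the lowest score is often taken as the estimate, excluding `k = 1`, which is trivially stable. The pairwise distance costs O(n²) per trial, on top of a full clustering run. This illustrates why repeating many random perturbations becomes expensive on large datasets.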