A Two-Stage Clustering Algorithm for Multi-Scale Data

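Cluster analysis is a data mining technique whose purpose is to uncover patterns that may be hidden in data. It supports corporate marketing and management and, in turn, business decision-making. However, most clustering algorithms treat all clustering fields as equally important and ignore their relative importance. Most algorithms also use "distance" to represent similarity between groups while ignoring each field's measurement scale, so the results are often oversimplified or hard to interpret. This study introduces expert involvement and splits the clustering algorithm into two stages: the first-stage clustering finds suitable parameters for the second-stage clustering, which improves clustering quality. The study also improves how the traditional k-means algorithm handles non-numeric data, proposing a modification that preserves the original scale of the data during clustering. Finally, tests on four data sets (Wisconsin Breast Cancer Data, Contraceptive Method Choice Data, Iris Education Data and Balance Scale Weight & Distance Data) show that multi-scale clustering not only computes data of every scale in a reasonable way but also improves clustering quality. Whether the data are numeric, non-numeric, or mixed, the multi-scale computation separates groups that differ more strongly and raises the similarity of objects within each group, yielding better clustering quality. The use of expert weights further shows that multi-scale clustering can improve the predictive power for objects and widen the distance between cluster centers, increasing the dissimilarity between groups.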
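The abstract names three ingredients without giving formulas: per-field expert weights, a dissimilarity that respects each field's scale (nominal, ordinal, interval, ratio), and a k-means variant that keeps non-numeric fields on their own scale. The sketch below is one possible reading under stated assumptions, not the thesis's method. It assumes a Gower-style weighted coefficient for dissimilarity (simple matching for nominal fields, range-normalized difference otherwise, with interval and ratio treated alike), takes the mode, lower median, and mean as cluster centers (similar in spirit to k-prototypes), and assumes the "parameters" passed from stage one to stage two are the stage-one cluster centers. The function names `dissimilarity`, `center`, `kmeans`, and `two_stage`, the naive first-k seeding, and the toy data and weights are all hypothetical.

```python
from collections import Counter
from statistics import median_low

# Scale labels (Stevens' four measurement scales).
NOMINAL, ORDINAL, INTERVAL, RATIO = "nominal", "ordinal", "interval", "ratio"


def dissimilarity(a, b, scales, ranges, weights):
    """Weighted Gower-style dissimilarity between two records, in [0, 1].

    ranges[j]: max - min for interval/ratio fields, (levels - 1) for ordinal
    fields coded 0..levels-1; unused (e.g. None) for nominal fields.
    weights[j]: non-negative expert weight for field j (an assumed input).
    """
    total = 0.0
    for x, y, scale, r, w in zip(a, b, scales, ranges, weights):
        if scale == NOMINAL:
            d = 0.0 if x == y else 1.0        # simple matching
        else:
            d = abs(x - y) / r if r else 0.0  # range-normalized difference
        total += w * d
    return total / sum(weights)


def center(rows, scales):
    """Scale-preserving center: mode (nominal), lower median (ordinal), mean (interval/ratio)."""
    result = []
    for j, scale in enumerate(scales):
        column = [row[j] for row in rows]
        if scale == NOMINAL:
            result.append(Counter(column).most_common(1)[0][0])
        elif scale == ORDINAL:
            result.append(median_low(column))  # stays on an existing level
        else:
            result.append(sum(column) / len(column))
    return result


def kmeans(rows, k, scales, ranges, weights, init, max_iter=100):
    """k-means-style loop using the multi-scale dissimilarity and centers above."""
    centers = [list(c) for c in init]
    labels = []
    for _ in range(max_iter):
        # Assign each record to its nearest center.
        labels = [
            min(range(k), key=lambda c: dissimilarity(row, centers[c], scales, ranges, weights))
            for row in rows
        ]
        new_centers = []
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            # Keep the old center if a cluster empties out.
            new_centers.append(center(members, scales) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def two_stage(train, data, k, scales, ranges, weights):
    """Stage 1 learns centers on a training sample; stage 2 clusters all data from them."""
    _, seeds = kmeans(train, k, scales, ranges, weights, init=train[:k])  # naive seeding
    return kmeans(data, k, scales, ranges, weights, init=seeds)


# Toy example with hypothetical data: color (nominal), size 0..2 (ordinal), weight (ratio).
rows = [["red", 0, 1.0], ["red", 0, 1.2], ["blue", 2, 9.0], ["blue", 2, 8.8]]
scales = [NOMINAL, ORDINAL, RATIO]
ranges = [None, 2, 8.0]
weights = [1.0, 1.0, 2.0]  # hypothetical expert weights
labels, centers = two_stage(rows, rows, 2, scales, ranges, weights)
print(labels)  # [0, 0, 1, 1]
```

For brevity the toy example reuses the same rows as training and full data. Because the mode and lower median always return an existing category or level, no center is ever an average of category codes, which is one way to read "preserving the original scale".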

Keywords

Data clustering; multi-scale analysis; expert weights

English Abstract

Cluster analysis is a data mining technique whose goal is to find hidden patterns in data. In related studies, most researchers weight all fields equally when clustering and use only metric calculations for all four kinds of scales. We believe traditional clustering algorithms can incorporate experts' subjective judgment, and that the different scales (nominal, ordinal, interval, and ratio) call for different methods of calculating similarity. We therefore combine expert weights and multi-scale calculation in the clustering process. Our purpose is to address two problems: clustering results are hard to explain, and they often fail to meet decision makers' needs. To this end, we propose a two-stage clustering algorithm. In the first stage, we use training data to find parameters that can improve cluster quality; in the second stage, we cluster all data using these parameters. Our algorithm uses multiple scales and unequal weights to handle all kinds of data, and we test it on four standard data sets (Wisconsin Breast Cancer Data, Contraceptive Method Choice Data, Iris Education Data and Balance Scale Weight & Distance Data). The experiments yield two conclusions. First, clustering with multi-scale calculation improves both within-group similarity and between-group dissimilarity. Second, clustering with expert weights gives better prediction than clustering with equal weights. We therefore believe that a multi-scale, expert-weighted clustering algorithm can not only improve clustering quality but also meet decision makers' requirements.
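Both conclusions are stated in terms of within-group similarity and between-group dissimilarity. The abstract does not name its quality index, so the sketch below assumes a simple pair of summaries: the mean distance from each object to its own cluster center, and the mean distance between pairs of cluster centers. The function name `cluster_quality`, the helper `gap`, and the one-dimensional data are hypothetical; `dist` can be any dissimilarity, including a weighted multi-scale one.

```python
from itertools import combinations


def cluster_quality(rows, labels, centers, dist):
    """Return (mean within-cluster distance, mean between-center distance).

    A better partition has a smaller first value (objects close to their own
    center) and a larger second value (centers far apart).
    """
    within = sum(dist(row, centers[label]) for row, label in zip(rows, labels)) / len(rows)
    pairs = list(combinations(centers, 2))
    between = sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return within, between


# Hypothetical one-dimensional example with absolute difference as the distance.
def gap(a, b):
    return abs(a[0] - b[0])


points = [[0.0], [2.0], [10.0], [12.0]]
print(cluster_quality(points, [0, 0, 1, 1], [[1.0], [11.0]], gap))  # (1.0, 10.0)
print(cluster_quality(points, [0, 1, 0, 1], [[5.0], [7.0]], gap))   # (5.0, 2.0)
```

In both calls the centers are the means of their members; the first partition scores better on both summaries. The two numbers are only comparable across runs that use the same `dist`.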

English Keywords

Clustering Algorithm; Multi-Scale Analysis; Expert's Weight

Cited By

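楊宗明 (2005). The application of data mining in the securities investment trust industry: A case study of mutual funds [Master's thesis, Yuan Ze University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0009-0112200611310564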
