  • Thesis

A New Method for Estimating the Number of Clusters

Advisor: Ching-Kang Ing (銀慶剛)

Abstract

Estimating the number of clusters is a major problem in cluster analysis. In this thesis, we use the Bayesian information criterion (BIC), the most widely used criterion in model selection, as a criterion for choosing the number of clusters. However, we find that in the one-dimensional case BIC overestimates the true number of clusters; even after trying a variety of penalty terms, we could not find an effective consistent information criterion. We therefore propose a new method for estimating the number of clusters and demonstrate its accuracy through a simulation study.

