A Study of k-Means Clustering of Functional Data Using Functional Principal Component Scores and Mean Curves

Study effectiveness of k-means clustering of functional data: functional principal component scores feature and mean curve feature

Advisor: 陳宏 (Hung Chen)

Abstract


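Cluster analysis aims to partition data into groups that differ markedly from one another while members of the same group are highly similar. It is an important data-mining tool for analyzing high-dimensional data and large databases, and once the data have been clustered, the relationship between group members and variables of interest becomes easier to explore. Before cluster analysis is applied to high-dimensional data, the dimension of the data is usually reduced first, and reducing the dimension from different perspectives can lead to different conclusions. This thesis studies the problem of clustering the observed subjects of functional data. Abraham et al. (2003) reduce the dimension of the data from the perspective of the mean function and then cluster the reduced data with the k-means method. Peng and Müller (2008), assuming that all curves share the same mean function, use the distribution of finite-dimensional functional principal component (FPC) scores to explore the clusters in the data. However, whether the dimension reduction starts from the mean function or from the covariance function, the resulting clusters reflect characteristics of the mean function. This phenomenon motivated us to develop a theoretical analysis of the effectiveness of the two approaches. We show that, under certain conditions, starting from the covariance function reduces the quality of the resulting clustering. Chiou and Li (2007) proposed a clustering algorithm based on iterative reclassification; its initial clustering stage mainly uses the distribution of finite-dimensional FPC scores to find a preliminary partition of the data with respect to the mean structure. Based on our analysis, we recommend that the initial clustering be carried out from the perspective of the mean function.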
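The two feature-extraction routes compared above can be sketched in Python with NumPy. This is a minimal illustration under assumptions of our own, not the thesis's implementation: the simulated cluster means, the shared within-cluster mode `phi`, the noise levels, and the orthonormalized Legendre basis used for the mean-curve features are hypothetical choices. The FPC route centres every curve with the pooled mean (mirroring the common-mean assumption) and keeps the leading eigenvectors of the pooled sample covariance. Both feature sets are then clustered with the same k-means (Lloyd) routine.

```python
import numpy as np


def simulate_two_clusters(n_per_cluster=50, n_grid=101, seed=0):
    """Hypothetical densely observed curves from two clusters on a common grid."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    phi = np.sqrt(2.0) * np.cos(np.pi * t)        # shared within-cluster mode (assumed)
    means = [np.sin(2 * np.pi * t),               # cluster 0 mean (assumed)
             np.sin(2 * np.pi * t) + 3.0 * t]     # cluster 1 mean (assumed)
    curves, labels = [], []
    for k, mu in enumerate(means):
        xi = rng.normal(0.0, 0.2, size=(n_per_cluster, 1))        # random score on phi
        eps = rng.normal(0.0, 0.2, size=(n_per_cluster, n_grid))  # measurement error
        curves.append(mu + xi * phi + eps)
        labels.append(np.full(n_per_cluster, k))
    return t, np.vstack(curves), np.concatenate(labels)


def mean_curve_features(t, Y, degree=5):
    """Smooth each curve by least squares on an orthonormalized polynomial basis."""
    B = np.polynomial.legendre.legvander(2.0 * t - 1.0, degree)  # (n_grid, degree + 1)
    Q, _ = np.linalg.qr(B)   # orthonormal columns on the grid
    return Y @ Q             # Euclidean distance between rows = distance of smoothed curves


def fpc_scores(Y, n_components=2):
    """Scores on the leading eigenvectors of the pooled sample covariance."""
    Yc = Y - Y.mean(axis=0)                  # centre with the pooled (common) mean
    cov = Yc.T @ Yc / (Y.shape[0] - 1)
    _, vecs = np.linalg.eigh(cov)            # eigenvalues come back in ascending order
    V = vecs[:, ::-1][:, :n_components]      # leading eigenvectors
    return Yc @ V                            # discretized scores, up to a grid-spacing constant


def kmeans(X, k=2, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def two_cluster_accuracy(pred, truth):
    """Agreement for two clusters, allowing the two labels to be swapped."""
    agree = np.mean(pred == truth)
    return max(agree, 1.0 - agree)


t, Y, truth = simulate_two_clusters()
acc_mean = two_cluster_accuracy(kmeans(mean_curve_features(t, Y)), truth)
acc_fpc = two_cluster_accuracy(kmeans(fpc_scores(Y)), truth)
print(f"mean-curve features: {acc_mean:.2f}, FPC scores: {acc_fpc:.2f}")
```

With the well-separated means assumed here both routes should recover the two groups; the covariance route succeeds because the between-cluster mean difference dominates the pooled covariance, which is the phenomenon the thesis analyzes. Shrinking the mean difference, enlarging the spread along `phi`, or setting `n_components=1` is an easy way to explore settings where the two routes may diverge.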

English Abstract


Organizing functional data into sensible groupings is one of the most fundamental ways of understanding and learning about the underlying mechanism that generates functional data. Cluster analysis is often employed to search for homogeneous subgroups of individuals in a data set. Abraham et al. (2003, Scandinavian Journal of Statistics) start with feature extraction on the mean function and use the k-means clustering procedure to determine the clusters. Peng and Müller (2008, Annals of Applied Statistics) assume a common mean function for all units and start with feature extraction on the covariance function. However, the clusters found by the k-means procedure can be explained through the characteristics of the mean function of each unit. This motivates a theoretical comparison of the utilities of these two approaches for densely observed functional data. We consider only the case of two clusters, and we analyze the loss of efficiency incurred by feature extraction on the covariance function. Chiou and Li (2007, Journal of the Royal Statistical Society, Series B) proposed an iterative functional clustering algorithm that applies the method of Peng and Müller in its initial clustering stage. We advocate using the mean function in the initial stage instead, and we provide an analysis supporting this recommendation.
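Why covariance-based scores still reflect the mean function, and how they can lose efficiency, can be illustrated with a standard decomposition. Assume, for illustration only, a two-cluster mixture with cluster label C, probabilities π_1 and π_2 = 1 − π_1, cluster means μ_1 and μ_2, pooled mean μ̄ = π_1μ_1 + π_2μ_2, and a common within-cluster covariance G_w with orthonormal eigenpairs (λ_k, φ_k); write ξ_k = ⟨X − μ̄, φ_k⟩. These symbols are our own and need not match the thesis's notation. By the law of total covariance, the pooled covariance G is:

```latex
\begin{aligned}
G(s,t) &= \mathbb{E}\{\operatorname{Cov}(X(s),X(t)\mid C)\}
          + \operatorname{Cov}\{\mu_C(s),\mu_C(t)\} \\
       &= G_w(s,t) + \pi_1\pi_2\,\delta(s)\,\delta(t),
          \qquad \delta = \mu_1-\mu_2 . \\[4pt]
\text{If } \delta = \|\delta\|\,\phi_j \ (j\ge 2):\quad
G\phi_k &= \lambda_k\phi_k \ \ (k\neq j), \qquad
G\phi_j = \bigl(\lambda_j + \pi_1\pi_2\|\delta\|^2\bigr)\phi_j, \\
\mathbb{E}(\xi_k \mid C=c) &= \langle \mu_c-\bar\mu,\ \phi_k\rangle = 0
  \quad (k\neq j).
\end{aligned}
```

The between-cluster signal thus enters the pooled covariance only through the rank-one term π_1π_2 δ⊗δ. When π_1π_2‖δ‖² dominates, the leading FPC score is essentially a mean-function feature, consistent with the observation that covariance-based clusters reflect the mean function. In the hypothetical special case above, however, if λ_1 > λ_j + π_1π_2‖δ‖², the leading eigenfunction of G is φ_1 and its score has the same mean in both clusters, so keeping only the first component discards the cluster signal, whereas features built on the mean function retain δ directly. This special case is an illustration, not the thesis's general result.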

References


[2] Ball, G.H. and Hall, D.J. (1967). A clustering technique for summarizing multivariate data. Behavioral Science, 12, 153-155.
[3] Bunea, F., Ivanescu, A.E. and Wegkamp, M. (2011). Adaptive inference for the mean of a Gaussian process in functional data. Journal of the Royal Statistical Society, Ser. B, 73, part 3.
[4] Chiou, J.-M. and Li, P.-L. (2007). Functional clustering and identifying substructures of longitudinal data. Journal of the Royal Statistical Society, Ser. B, 69, 679-699.
… via subspace projection. Journal of the American Statistical Association, 2006, 223-235.
