A Study on Clustering Nonparametric Time Series Models Using PPR

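In recent years, clustering of time series data has been widely discussed and applied. Under parametric models, earlier work assumes that series in the same group are generated by the same time series model. These parametric assumptions, however, may be incorrect or overly restrictive, which lowers the accuracy of the resulting clustering. Later work on time series clustering therefore turned to nonparametric model-based clustering to address the problems that nonlinearity and nonstationarity pose for parametric assumptions. This study uses projection pursuit regression (PPR) to estimate a nonparametric autoregressive model, to predict with it, and then to use the predictions for cluster analysis. Besides relaxing the parametric assumptions, the method also handles interaction models that other nonparametric clustering methods and nonparametric additive models cannot cover. Earlier nonparametric time series models, applied to biological gene data, took time itself as the covariate; for typical economic time series, however, information from previous periods is a more reasonable and more directly relevant basis for clustering than the time variable. For each pair of time series, the basic idea is to use the PPR model fitted to each series to predict the observations of the other, and to define a dissimilarity value between the two prediction models and the two series; intuitively, a smaller value suggests that the two series belong to the same cluster. In simulations, PPR achieves good clustering similarity: about 0.67 to 0.98 for linear time series, about 0.70 to 1 for nonlinear time series, and about 0.98 to 1 for time series with interaction effects. PPR thus performs best on series with interaction effects. Compared with traditional methods, the clustering results do not differ when the data are linear, but PPR clustering is clearly better when the data are nonlinear or contain interaction terms. In an empirical application to per capita personal income in 25 U.S. states, a two-cluster partition gives a similarity of about 0.81, which still exceeds the results of several existing methods.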
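As an illustration only, the nonparametric autoregressive model estimated by PPR can be written in the standard projection pursuit form shown here. The lag order $p$, the number of ridge terms $M$, the directions $\alpha_m$, and the ridge functions $g_m$ are notation assumed for this sketch and are not taken from the abstract. The second line shows why such a model can represent an interaction between lags, which a purely additive model in the individual lags cannot.

```latex
% Nonparametric autoregression in PPR form (assumed notation)
x_t = \sum_{m=1}^{M} g_m\!\left(\alpha_m^{\top}\mathbf{x}_{t-1}\right) + \varepsilon_t,
\qquad \mathbf{x}_{t-1} = (x_{t-1}, \dots, x_{t-p})^{\top}

% An interaction of two lags as a sum of two ridge functions,
% with directions (1,1) and (1,-1) and quadratic g_1, g_2
x_{t-1}\,x_{t-2}
  = \tfrac{1}{4}\,(x_{t-1} + x_{t-2})^{2} - \tfrac{1}{4}\,(x_{t-1} - x_{t-2})^{2}
```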

Keywords

Projection pursuit regression; model-based clustering; nonparametric autoregressive model; clustering similarity measure

Parallel Abstract

In recent years, time series clustering has produced promising results in a variety of applications. The two main classes of time series clustering are model-based and distance-based methods: the former assume that series in the same group are generated by the same time series model, while the latter cluster on a suitably defined distance. More specifically, model-based clustering methods are derived under assumptions such as constant variance, stationarity, linearity, normality, and additivity. In practice, some of these assumptions may be invalid and lead to incorrect clustering results. To overcome this problem, this study uses projection pursuit regression (PPR) to estimate the nonparametric functional form. Because this method makes few assumptions, it can be applied even to non-stationary or non-additive time series. In the last decade, some nonparametric time series models were used in biological settings by treating the mean function as a function of time. It is more reasonable, however, to feed information from previous lags, rather than the time index, into the unknown function. The basic ideas of this paper are: (1) fit a PPR model to each series using information from its previous lags; (2) intuitively, two similar fitted models should have similar predictive ability, so this property can be used in the clustering procedure. In the simulations, PPR gives high clustering similarity across several types of time series data. For data from linear autoregressive models, the similarity values of the PPR clustering method range from 0.67 to 0.98. For nonlinear data, they range from 0.70 to 1. For time series with interaction effects, they range from 0.98 to 1.
The proposed method is also used to cluster a personal income data set, a collection of 25 time series giving per capita personal income during 1929-1999 in 25 states of the USA. The case analysis shows that the PPR clustering method compares favorably with methods previously proposed for similar time series clustering tasks.
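The cross-prediction idea in points (1) and (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: a k-nearest-neighbour regressor on lagged values stands in for PPR, the dissimilarity is taken to be the average of the two cross-prediction mean squared errors, and the function names, lag order `p`, neighbour count `k`, and simulated AR(1) series are all hypothetical choices made for the example.

```python
import random


def simulate_ar1(phi, n, rng, sigma=0.5):
    """Simulate x_t = phi * x_{t-1} + e_t with Gaussian noise (toy data)."""
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def lag_pairs(series, p):
    """Return (lag vector, next value) pairs built from the previous p values."""
    return [(series[t - p:t], series[t]) for t in range(p, len(series))]


def knn_predict(train, query, k):
    """Stand-in nonparametric regressor (NOT PPR): mean of the k nearest targets."""
    nearest = sorted(
        train,
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cross_error(source, target, p=1, k=10):
    """MSE when a model fitted on `source` predicts the values of `target`."""
    train = lag_pairs(source, p)
    test = lag_pairs(target, p)
    return sum((y - knn_predict(train, lags, k)) ** 2 for lags, y in test) / len(test)


def dissimilarity(a, b, p=1, k=10):
    """Assumed symmetric dissimilarity: average of both cross-prediction errors."""
    return 0.5 * (cross_error(a, b, p, k) + cross_error(b, a, p, k))


rng = random.Random(0)
# Two series from each of two different AR(1) models.
series = [
    simulate_ar1(0.8, 200, rng),
    simulate_ar1(0.8, 200, rng),
    simulate_ar1(-0.8, 200, rng),
    simulate_ar1(-0.8, 200, rng),
]
dist = [[dissimilarity(a, b) for b in series] for a in series]
for row in dist:
    print(" ".join(f"{d:6.3f}" for d in row))
```

Series generated by the same model should show small pairwise values and series from different models larger ones, so any standard clustering routine that accepts a dissimilarity matrix could be applied to `dist`. Replacing `knn_predict` with a fitted PPR model would bring the sketch closer to the approach described above.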

Parallel Keywords

PPR; model-based clustering; nonparametric autoregressive models; similarity measures


