The Analysis and Application of Nonstationary Time Series Clustering

Advisor: 林財川


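Clustering methods for time series data fall broadly into model-based and non-model-based approaches. Non-model-based methods mostly use a "distance" to determine similarity, but they are vulnerable to problems such as series of unequal length and differing observation start times. Model-based methods divide further into probability models and statistical models. Probability models specify a prior distribution for the model parameters and then apply Bayes' theorem to find the clustering with the highest posterior probability, as in hidden Markov methods and Bayesian probabilistic clustering. Statistical models include the autoregressive model and the autoregressive moving average model: a suitable model is fitted to the data, and the similarity between fitted models is then measured, for example with the distance between model parameters, the discrete Fourier transform, the discrete wavelet transform, or the linear predictive coding cepstrum. At present, however, these clustering methods require the two assumptions of stationarity and linearity, which creates many obstacles in application. To remove the restrictions these assumptions impose, nonparametric models have developed rapidly in recent years: the data may be generated by a nonlinear or nonstationary stochastic process, so fitting a parametric model already introduces serious error, and clustering on top of that fit is bound to give unsatisfactory results. This thesis focuses on clustering nonstationary time series that contain trend or cyclical components; for example, human growth hormone levels and other physiological changes follow a "circadian rhythm". The basic approach uses a nonparametric function model to examine whether the series share a common periodicity, and then fits the model and clusters the data. The central idea is to estimate each series under a common curve, in the hope that the model parameters become important features of that series. After clustering with three statistical-model-based methods (Euclidean distance, disparity measure, and F-test), the quality of the clustering is used to check whether the model proposed here outperforms the parametric model.
The thesis is organized as follows. Chapter 2 lists the methods that will be compared with the proposed method and briefly describes their background; Chapter 3 introduces and builds the autoregressive model and the nonparametric model; Chapter 4 introduces the three clustering methods, namely Euclidean distance, disparity measure, and F-test; Chapter 5 presents simulation experiments; Chapter 6 uses stock prices from the Dow Jones Industrial Average as an empirical example and tries to determine whether different sectors differ, that is, how each sector trends; Chapter 7 concludes.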


Cluster analysis of time series data can be divided into two groups: model-based and non-model-based. Non-model-based cluster analysis usually uses a "distance" to define the similarity between any two series, but it suffers from problems such as series of different lengths and time shifts between series. Model-based cluster analysis can in turn be divided into two groups: statistical models and probability models. A probability model specifies a prior distribution for the parameters and applies Bayes' theorem to calculate the posterior probability; examples are hidden Markov models and Bayesian clustering by dynamics. For statistical models, the autoregressive model and the autoregressive moving average model are commonly used. The data are assumed to follow such a model, the model is fitted to the data, and the similarity between the fitted models is then measured, usually with the Euclidean distance between model parameters, the discrete Fourier transform (DFT), the discrete wavelet transform (DWT), or linear predictive coding (LPC). These approaches, however, require the data to be linear and stationary. To avoid these restrictions, nonparametric models are becoming popular in cluster analysis. In general, the data may be generated by a nonlinear or nonstationary process, and serious clustering errors occur if such data are fitted with a parametric model. Our research focuses on clustering periodic data such as human circadian rhythms. We assume that the series share a common curve and estimate, for each series, the mean, the amplification, and the time shift. We take these three parameters as the characteristics of the series, use the Euclidean distance, the disparity measure, and the F-test to cluster the series, and compare the clustering quality with that of the parametric model.
The thesis is structured as follows. Chapter 2 introduces the clustering methods from past research; Chapter 3 builds the parametric model and the nonparametric function model and explains how to estimate the model parameters; Chapter 4 introduces the Euclidean distance, the disparity measure, and the F-test in detail; Chapter 5 presents a simulation study; Chapter 6 clusters stock data from the Dow Jones Industrial Average to examine whether different industrial groups follow different trends; Chapter 7 concludes.
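
The common-curve idea can be illustrated with a minimal sketch. It is not the estimation procedure of this thesis: the common curve is assumed here to be a cosine with a known period, standing in for the nonparametric function, so that each series is modeled as mean + amplification × curve(t − time shift). The function names `fit_shape_params` and `pairwise_euclidean`, the 24-hour period, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_shape_params(y, t, period):
    """Least-squares fit of y(t) ~ mu + a * cos(2*pi*(t - tau) / period).

    Assumption: the common curve is a cosine of known period. This is a
    stand-in for the nonparametric common curve, not the thesis's estimator.
    Returns the feature vector (mu, a, tau).
    """
    omega = 2.0 * np.pi / period
    # a*cos(omega*(t - tau)) = bc*cos(omega*t) + bs*sin(omega*t),
    # with bc = a*cos(omega*tau) and bs = a*sin(omega*tau).
    X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
    mu, bc, bs = np.linalg.lstsq(X, y, rcond=None)[0]
    a = np.hypot(bc, bs)              # amplification
    tau = np.arctan2(bs, bc) / omega  # time shift in (-period/2, period/2]
    return np.array([mu, a, tau])

def pairwise_euclidean(features):
    """Euclidean distance between every pair of feature vectors.

    Note: the time shift is circular, and plain Euclidean distance ignores
    wrap-around at +/- period/2.
    """
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy data (hypothetical): four hourly series over four days, two groups.
rng = np.random.default_rng(0)
t = np.arange(96, dtype=float)
true_params = [(10.0, 2.0, 3.0), (10.2, 2.1, 3.2),
               (5.0, 4.0, -6.0), (5.1, 3.9, -5.8)]
series = [mu + a * np.cos(2.0 * np.pi * (t - tau) / 24.0)
          + rng.normal(0.0, 0.1, t.size)
          for mu, a, tau in true_params]

features = np.array([fit_shape_params(y, t, 24.0) for y in series])
D = pairwise_euclidean(features)
print(np.round(features, 2))
print(np.round(D, 2))
```

The distance matrix `D` could then be passed to any distance-based clustering routine. In practice the three features are on different scales, so some standardization before computing distances may be advisable.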


