Mining Multi-Resolution Frequent Patterns in Time-Series Databases

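In recent years, time-series data have been adopted rapidly and widely across many fields, such as financial data analysis, network traffic analysis, and scientific data processing. Finding frequent patterns at different resolutions in a time-series database can help scientists and financial analysts identify trends and obtain valuable information. In this thesis, we therefore propose an efficient mining algorithm, MFP-Miner, that finds frequent patterns at different resolutions in time-series databases. The proposed algorithm consists of three phases. First, we transform the database from high resolution to low resolution. Second, we find all frequent patterns of length 1 in the transformed database and build their projected databases. Finally, we use a frequent pattern tree to recursively generate all frequent patterns in a depth-first manner, and enumerate all frequent patterns in the high-resolution database. During mining, MFP-Miner uses the projected databases to count support and applies effective pruning strategies to discard unnecessary candidate patterns, so it can efficiently find all frequent patterns at different resolutions in time-series databases. Experimental results show that, on both synthetic and real data, the proposed method is more efficient and more scalable than a modified Apriori algorithm.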
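The first phase, converting a high-resolution series to a low-resolution one, can be illustrated with a minimal sketch. The abstract does not say how the transformation is performed, so everything here is an assumption made for illustration: the hypothetical helper `to_low_resolution` averages non-overlapping windows, and the hypothetical helper `discretize` maps each average to a symbol using caller-supplied breakpoints. Neither name comes from the thesis.

```python
from bisect import bisect_right
from typing import List, Sequence


def to_low_resolution(series: Sequence[float], window: int) -> List[float]:
    """Average non-overlapping windows of `window` points (assumed scheme).

    A trailing partial window is averaged over the points it contains.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    reduced = []
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def discretize(values: Sequence[float], breakpoints: Sequence[float],
               alphabet: str = "abc") -> str:
    """Map each value to a symbol; `breakpoints` must be sorted ascending.

    A value below the first breakpoint gets alphabet[0], and so on, so the
    alphabet needs exactly one more symbol than there are breakpoints.
    """
    if len(alphabet) != len(breakpoints) + 1:
        raise ValueError("alphabet must have len(breakpoints) + 1 symbols")
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in values)


low = to_low_resolution([1, 3, 5, 7, 9], window=2)
symbols = discretize(low, breakpoints=[3, 7])
```

Under these assumptions, the five-point series shrinks to three window averages, which are then turned into a short symbol string that a pattern miner can consume.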

Keywords

data mining; time-series database; frequent patterns

Parallel Abstract

Over the last decade, time-series data have been generated at an unprecedented rate in almost every application domain, e.g., financial data analysis, network traffic analysis, and scientific data processing. Mining multi-resolution frequent patterns in time-series databases can help scientists and financial analysts analyze trends in the data and obtain valuable information. Therefore, in this thesis, we propose an efficient algorithm, MFP-Miner (Mining Frequent Patterns Miner), to mine multi-resolution frequent patterns in time-series databases. The proposed method consists of three phases. First, we transform the original database into a low-resolution database. Second, we find the frequent 1-patterns in the transformed database and construct a projected database for each frequent 1-pattern found. Third, we recursively generate frequent patterns with a frequent pattern tree in a depth-first manner and enumerate all frequent patterns in the original database. Because MFP-Miner employs projected databases to localize support counting and pattern mining, and applies effective pruning strategies to remove unnecessary candidates during mining, it can efficiently mine all multi-resolution frequent patterns in time-series databases. Experimental results show that the proposed method is more efficient and more scalable than a modified Apriori algorithm.
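The second and third phases follow the general shape of projection-based pattern growth, which can be sketched as follows. This is not the MFP-Miner implementation: the abstract does not define the pattern type, the support measure, the frequent pattern tree, or the pruning strategies. The sketch assumes that a pattern is a contiguous run of symbols, that support is the number of sequences containing the pattern at least once, and that a projected database is a list of `(sequence id, end index)` occurrences. The function name `mine_frequent_patterns` is hypothetical, and the only pruning shown is the standard rule that an infrequent pattern cannot be extended into a frequent one.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (sequence id, index of the pattern's last symbol in that sequence)
Occurrence = Tuple[int, int]


def mine_frequent_patterns(db: List[str], min_support: int) -> Dict[str, int]:
    """Depth-first pattern growth over occurrence lists (assumed design)."""
    results: Dict[str, int] = {}

    def support(occs: List[Occurrence]) -> int:
        # Count distinct sequences, not individual occurrences.
        return len({sid for sid, _ in occs})

    def grow(pattern: str, occs: List[Occurrence]) -> None:
        results[pattern] = support(occs)
        # Only the symbol right after each occurrence is examined, so
        # counting stays local to the projected database.
        extensions: Dict[str, List[Occurrence]] = defaultdict(list)
        for sid, end in occs:
            if end + 1 < len(db[sid]):
                extensions[db[sid][end + 1]].append((sid, end + 1))
        for symbol in sorted(extensions):
            # Prune: an infrequent extension has no frequent descendants.
            if support(extensions[symbol]) >= min_support:
                grow(pattern + symbol, extensions[symbol])

    # Frequent 1-patterns and their projected databases.
    first: Dict[str, List[Occurrence]] = defaultdict(list)
    for sid, seq in enumerate(db):
        for i, symbol in enumerate(seq):
            first[symbol].append((sid, i))
    for symbol in sorted(first):
        if support(first[symbol]) >= min_support:
            grow(symbol, first[symbol])
    return results


patterns = mine_frequent_patterns(["abcab", "abd", "cab"], min_support=2)
```

With the three toy sequences above and a minimum support of 2, the symbol `d` is discarded at the first step and never extended. Mapping the low-resolution patterns back to the original high-resolution database, as the abstract describes, is not covered by this sketch.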

Parallel Keywords

data mining; time-series database; frequent patterns
