透過您的圖書館登入
IP:18.188.61.223
  • 學位論文

時間序列資料庫之封閉性樣式探勘

Mining Closed Patterns in Time-Series Databases

指導教授 : 李瑞庭
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


近年來,探勘封閉性樣式已成為知識探索領域中重要的研究議題,其主要目的在於找出隱藏在大量資料中具有代表性的樣式。本論文針對如何從時間序列資料庫中探勘封閉性樣式,提出了三個有效率的演算法,分別為CMP (Closed Multi-sequence Patterns mining)、 CFP (Closed Flexible Patterns mining)與CNP (multi-resolution Closed Numerical Patterns mining)。 CMP演算法主要著重於多時間序列資料庫中樣式的分析,而CFP演算法可從時間序列資料庫中探勘具「彈性間隔」的封閉性樣式。CMP與CFP演算法先將時間序列轉換成符號序列,然後再探勘封閉性樣式。將時間序列轉換成符號序列可減少雜訊並簡化探勘程序,然而可能導致樣式遺失或差異很大的序列卻支持相同樣式的問題。 為避免將時間序列轉換到符號序列所造成的問題,CNP演算法可從多時間序列資料庫中直接找出具有代表性的樣式。此外,CNP演算法加入多層解析度 (multi-resolution)的概念以找出封閉性樣式,可讓使用者以不同的觀點來檢視資料。 上述三個演算法皆以深度優先探勘的方式,並配合投影資料庫以減少搜尋空間,除了加速探勘的過程外,透過有效的修剪策略及檢查封閉性的機制,可避免產生不必要的候選樣式。 本研究結果顯示CMP演算法比改良式Apriori以及BIDE演算法快了數十倍; CFP演算法比改良式Apriori演算法更有效率; CNP演算法執行速度亦較改良式A-Close演算法快。

並列摘要


Closed pattern mining is a critical research issue in the area of knowledge discovery and data mining with the aim of discovering interesting patterns hidden in a large amount of data. In this dissertation, we propose three algorithms, called CMP (Closed Multi-sequence Patterns mining), CFP (Closed Flexible Patterns mining), and CNP (multi-resolution Closed Numerical Patterns mining) to solve various issues extended from the problem of mining closed patterns. The CMP algorithm is designed to find closed patterns in a multi-sequence time-series database. The CFP algorithm is developed to solve the problem of mining closed flexible patterns in a time-series database. Both the CMP and CFP algorithms involve a transformation of time-series sequences into symbolic sequences in the first phase. Although analyzing on symbolic sequences is ideal to reduce the effect of noises and ease the mining process, these approaches may lead to pattern lost and the sequences supporting the same pattern may look quite different. To overcome the problem raised in symbolic sequence analysis, the CNP algorithm is proposed to mine closed patterns without any transformation from time-series sequences to symbolic sequences. The method also employs the Haar wavelet transform to discover patterns in the multiple resolutions in order to provide different perspectives on datasets. All the proposed algorithms have employed the concept of projected databases to localize the pattern extension that leads to a significant runtime improvement. Moreover, effective closure checking schemes and pruning strategies are devised respectively in each of the proposed algorithms to avoid generating redundant candidates. The experimental results show that the CMP algorithm significantly outperforms the modified Apriori and BIDE algorithms. The CFP algorithm achieves better performance than the modified Apriori algorithm in all cases. And, the CNP algorithm has demonstrated a significant runtime improvement in comparison to the modified A-Close algorithm.

參考文獻


[1] R. Agrawal, K. Lin, H. S. Sawhney, K. Shim, Fast similarity search in the presence of noise, scaling, and translation in time-series databases, in: Proceedings of the 21th International Conference on Very Large Data Bases, 1995, pp. 490-501.
[2] C. D. Ahrens, Meteorology today: an introduction to weather, climate, and the environment (8th ed.), Thomson Brooks/Cole, Belmont, 2007.
[8] L. Chang, T. Wang, D. Yang, H. Luan, SeqStream: mining closed sequential patterns over stream sliding windows, in: Proceedings of the 8th IEEE International Conference on Data Mining, 2008, pp. 83-92.
[9] H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, M. Chau, Crime data mining: a general framework and some examples, IEEE Computer 37 (4) (2004) 50-56.
[10] T. S. Chen, S. C. Hsu, Mining frequent tree-like patterns in large datasets, Data and Knowledge Engineering 62 (1) (2007) 65-83.

延伸閱讀