透過您的圖書館登入
IP:3.144.25.74
  • 學位論文

時間區間事件之有效率的循序樣式探勘

An efficient interval-based sequential pattern mining

指導教授 : 李素瑛

摘要


現在已經發展的循序樣式探勘演算法皆假設事件的發生是在時間點上。然而,在現實生活上發生的事件通常是持續一段時間的,稱之為“以時間區間為基礎的事件”。但是由於時間區間事件間複雜的關係,造成了在設計有效率以時間區間事件為基礎之循序樣式探勘演算法上的困難。因此,我們提出了“同時發生的事件片段”的概念來解決時間區間事件間複雜關係的問題。首先根據時間區間的事件間“同時發生”的部份將時間區間事件切割成互斥的更小事件片段,即“同時發生”的一段時間區間內可能有許多事件片段,而原本的事件序列可表示成我們所提出新的事件序列表示方式:以“同時發生”時間排列的有序序列,稱之為“同時發生事件片段序列表示法”。因此,我們考慮事件片段間的相互關係變地相當簡單,即前後、同時。我們提出一個演算法 CTMiner 基於“同時發生事件片段序列表示法”來表示事件序列並利用知名的循序樣式探勘演算法 PrefixSpan 的概念來找出頻繁的時間區間事件循序樣式,並能完全避免產生候選樣式。最後,為了能理解頻繁的“同時發生事件片段序列”樣式的意義,我們利用關係序列來呈現此頻繁樣式中時間區間事件間所有的關係。並且,我們還根據“同時發生的事件片段”的特性,設計了一些策略來提升CTMiner演算法的效率。在實際的圖書館借閱資料和合成資料的實驗結果皆表現出此演算法的效率和適應性。

並列摘要


Existing sequential pattern mining algorithms assume that events occur instantaneously. However, events in real world applications usually have durations which are called interval-based events. But complex relationship among event intervals causes difficulty in designing an efficient interval-based event mining algorithm. Therefore, the concept of “coincidence-slice” is proposed to solve the problem caused by the complex relationship among event intervals. The event intervals are incised to disjoint smaller “event slices” according to the coincidences among event intervals, that is, several event slices may occur in the same time period called “coincidence”. Therefore, an original event sequence can be represented as a list of ordered “coincidences” which contains event slices. This new representation proposed is called “coincidence sequence representation”. We transform the problem of complex relationship among event interval to consider the simple relationship among event slices. The proposed interval-based sequential pattern mining algorithm called CTMiner is based on the “coincidence sequence representation”. The CTMier also uses the concept of well-known sequential pattern mining algorithm PrefixSpan to find temporal patterns without candidate generation. Finally, to comprehend the frequent temporal pattern represented by “coincidence sequence representation”, we discover and use relation list to present all the relationships in a pattern. We also implement some pruning strategies to improve the performance of CTMiner by considering the characteristics of the “Coincidence-slice”. Experiments on both synthetic datasets and real dataset of library lending indicate the efficiency and scalability of the proposed algorithm.

參考文獻


[3] R. Srikant and R. Agrawal. “Mining Sequential patterns: Generalizations and Performance Improvements,” Proceedings of 5th International Conference on Extended Database Technology (EDBT’96), 1996.
[4] M. J. Zaki. “SPADE: An Efficient Adlgorithm for Mining Frequent Sequences,” Machine Learning, vol. 42, numbers 1-2, pp. 31-60, 2001.
[7] J.F Allen. “Maintaining Knowledge about Temporal Intervals,” Communications of ACM, vol.26, issue 11, pp.832-843, 1983.
[10] P. Papapetrou, G. Kollios, S. Sclaroff, and D. Gunopulos, “Discovering frequent arrangements of temporal intervals,” International Conference on Data Mining (ICDM’05), pp. 647-661, 2005.
[12] F. Morchen and A. Ultsch. “Efficient Mining of Understandable Patterns from Multivariate Interval Time Series,” Data Mining Knowledge Discovery, vol. 15, number 2, pp.181-215, 2007.

被引用紀錄


黃曉君(2011)。客籍企業家興業精神之研究〔碩士論文,國立中央大學〕。華藝線上圖書館。https://www.airitilibrary.com/Article/Detail?DocID=U0031-1903201314424363

延伸閱讀