A Mining Process for Discovering More Meaningful Sequential Patterns

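Sequential pattern mining finds time-related behavioral patterns in sequence databases. Most previously proposed methods do not consider the confidence of the patterns they find. In addition, although sequential patterns reveal the order in which events occur, they carry very little information about the time between events. This dissertation proposes a new algorithm, E-PrefixSpan, that mines frequent and more confident association rules from sequence databases. Building on the PrefixSpan algorithm [20], we use the pattern-growth [21] mining approach to discover time-related sequential patterns. E-PrefixSpan records the time intervals between items and builds projected databases to reduce the number of database scans. During pattern generation it uses pattern confidence to cut down the large number of patterns that mining would otherwise produce, while ensuring that no important pattern information is lost. We compare E-PrefixSpan with existing sequential pattern mining algorithms and explain where it complements them. Performance evaluation experiments show that E-PrefixSpan effectively reduces the number of correlated patterns generated and adds time-interval information to the mining results.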
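The abstract names three mechanisms: projected databases, recorded time intervals between items, and confidence-based pruning. The sketch below illustrates how they could fit together in a PrefixSpan-style miner; it is not the E-PrefixSpan algorithm from the dissertation. Several things are assumptions made only for illustration: sequences hold single timestamped items rather than itemsets, confidence is taken as support(pattern) / support(prefix), the reported gap is the mean interval between the last matched item and the first later occurrence of the next item, and the names `mine` and `_grow` are hypothetical.

```python
from collections import defaultdict


def mine(db, min_support, min_conf):
    """Mine sequential patterns from db, a list of sequences.

    Each sequence is a list of (item, time) pairs sorted by time.
    Returns tuples (pattern, support, confidence, mean_gaps).
    """
    results = []
    # The initial projection covers every sequence from its first position.
    start = [(sid, 0, None) for sid in range(len(db))]
    _grow([], [], start, db, min_support, min_conf, results)
    return results


def _grow(prefix, gaps, projected, db, min_support, min_conf, results):
    # Projected entries are (sequence id, suffix start, time of last match).
    # For each candidate item, keep its first occurrence in every suffix.
    occurrences = defaultdict(list)
    for sid, pos, last_time in projected:
        seen = set()
        for idx in range(pos, len(db[sid])):
            item, time = db[sid][idx]
            if item not in seen:
                seen.add(item)
                gap = None if last_time is None else time - last_time
                occurrences[item].append((sid, idx + 1, time, gap))

    for item in sorted(occurrences):
        occ = occurrences[item]
        support = len(occ)
        if support < min_support:
            continue
        if prefix:
            # Assumed definition: confidence = support(pattern) / support(prefix).
            # len(projected) equals the prefix support (one entry per sequence).
            confidence = support / len(projected)
            if confidence < min_conf:
                continue  # assumed pruning rule: drop and do not extend
            new_gaps = gaps + [sum(o[3] for o in occ) / support]
        else:
            confidence = None  # length-1 patterns have no prefix to condition on
            new_gaps = []
        pattern = prefix + [item]
        results.append((pattern, support, confidence, new_gaps))
        # Project onto the suffixes that follow the matched item.
        next_projected = [(sid, nxt, time) for sid, nxt, time, _ in occ]
        _grow(pattern, new_gaps, next_projected, db, min_support, min_conf, results)
```

Each recursive call scans only the suffixes stored in its projected database, so the full database is never rescanned. Note that each gap in `mean_gaps` is averaged over the sequences that supported the pattern at the step where it was added, which is a simplification.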

Keywords

data mining; sequential pattern; pattern growth; confidence

Parallel Abstract

Sequential pattern mining techniques are developed to determine time-related behavior in sequence databases. Most previously proposed methods discover frequent subsequences as patterns but do not consider confidence. Moreover, although the discovered sequential patterns reveal the order of events, the time between events is not well determined. This dissertation presents E-PrefixSpan, a new method for mining frequent and more confident association rules from sequence databases. The method is based on the PrefixSpan [20] algorithm. To take advantage of the pattern-growth [21] mining approach and discover time-related sequential patterns, E-PrefixSpan records the time intervals between items and creates projected databases to reduce the number of database scans. Sequential pattern mining often generates a huge number of rules. To reduce the number of correlated patterns without information loss, E-PrefixSpan applies a confidence-based pattern mining technique. The proposed approach is compared with existing sequential pattern mining methods to show how they complement each other in discovering association rules. Our performance study shows that E-PrefixSpan is a valuable approach for condensing the correlated patterns and providing additional time-interval information for sequential patterns.

Parallel Keywords

data mining; sequential pattern; pattern-growth; confidence

References

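[3] 林昭妏 and 蔡介元, “Developing a Detection Model for Sequential Pattern Change Considering the Time-Interval Factor” (in Chinese), Master's thesis, Department of Industrial Engineering and Management, Yuan Ze University, 2006.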

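[5] 許俊傑 and 周清江, “MIHSPM: A Hybrid Sequential Pattern Mining Algorithm for Multiple Itemsets” (in Chinese), Master's thesis, Graduate Institute of Information Management, Tamkang University, 2007.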

[8] Agrawal R. and Srikant R., “Mining Sequential Patterns: Generalizations and Performance Improvements”, Proceedings of the 5th International Conference on Extending Database Technology, 1996, 3-17.

[9] Brin, S., Motwani, R., Ullman, J. D., & Tsur, S., “Dynamic Itemset Counting and Implication Rules for Market Basket Data”, ACM SIGMOD Conference on Management of Data, 1997, 255-264.

[12] Chen Y. L., Chiang M. C., and Ko M. T., “Discovering time-interval sequential patterns in sequence databases”, Expert Systems with Applications 25, 2003, 343-354.
