Mining Multi-Dimensional Sequential Patterns Using SQL Queries

Advisor: 林志麟


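Sequential pattern mining has usually considered only a single dimension of attributes, emphasizing the order in which sequence events occur in the database and their degree of support. In many real-world settings, however, the relationships among data are more complex, and a sequence event may be associated with many additional attributes. For example, the product items a customer has purchased over the years (the sequence events) may be associated with customer characteristics such as gender, occupation, and place of residence. Including these multi-dimensional attributes in the mining process therefore makes the mined information more meaningful and multifaceted: the results are more detailed and easier to apply, and they offer more direct practical benefit when users want to extract specific information from the data. In this thesis, we exploit the sequence hierarchy containment relationships among multi-dimensional attributes to propose an improvement that addresses the poor performance of the earlier Dim-Seq algorithm. In addition, because the procedure mines two different targets (multi-dimensional attributes and sequences), we examine the need for multiple thresholds, to which the proposed improvement also applies, so that more patterns can be extracted in less time. Finally, to integrate the mining program fully with relational database systems, we perform the mining directly with SQL queries, which reduces the data conversion cost incurred during data processing and offers the field an alternative way to solve the problem.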


Earlier work on sequential pattern mining usually considers events along a single dimension, time, and emphasizes only the order and support of the events that occur in the database. In many real situations, however, the data are more complex, and an event may be related to many additional attributes. For example, customer purchase sequences may be associated with gender, occupation, region, and other attributes of the customer. If sequential pattern mining takes this multi-dimensional information into account, the mined patterns become more meaningful and multifaceted: the results are more detailed and more useful when users want to discover specific information. In this thesis, we use the hierarchical characteristics of multi-dimensional attributes to propose a strategy that improves the performance of the Dim-Seq algorithm. Furthermore, we introduce multiple thresholds for mining the multi-dimensional attributes and the sequential patterns; combined with the proposed strategy, this yields more patterns in less time. We also implement the mining with SQL queries, which couples the data mining application tightly with the database system and reduces the data conversion cost of data processing.
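To make the idea of a multi-dimensional sequential pattern concrete, the following is a minimal sketch of how the support of one such pattern might be counted with a single SQL query. It is not the thesis's implementation: the `customers` and `purchases` tables, their columns, the sample rows, and the `build_db` and `support` helpers are all assumptions made for illustration, and each event is simplified to a single item with an integer timestamp.

```python
import sqlite3

# Hypothetical dimension columns; this schema is assumed, not taken from the thesis.
DIM_COLUMNS = ("gender", "occupation", "region")


def build_db():
    """Create an in-memory database with a few made-up customers and purchases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cid INTEGER PRIMARY KEY,
                                gender TEXT, occupation TEXT, region TEXT);
        CREATE TABLE purchases (cid INTEGER, t INTEGER, item TEXT);
    """)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
        (1, "F", "engineer", "north"),
        (2, "M", "teacher", "north"),
        (3, "F", "teacher", "south"),
        (4, "M", "engineer", "north"),
    ])
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", [
        (1, 1, "a"), (1, 2, "b"), (1, 3, "c"),
        (2, 1, "a"), (2, 2, "c"), (2, 3, "b"),
        (3, 1, "a"), (3, 2, "b"),
        (4, 1, "b"), (4, 2, "a"),
    ])
    return conn


def support(conn, dims, seq):
    """Count customers who match every attribute in `dims` and who bought
    the items of `seq` in that order (not necessarily consecutively)."""
    for col in dims:
        if col not in DIM_COLUMNS:
            raise ValueError(f"unknown dimension: {col}")
    joins, where, params = [], [], []
    # One self-join of the purchases table per item in the sequence.
    for i, item in enumerate(seq):
        joins.append(f"JOIN purchases p{i} ON p{i}.cid = c.cid")
        where.append(f"p{i}.item = ?")
        params.append(item)
        if i > 0:
            # Enforce temporal order between consecutive pattern items.
            where.append(f"p{i}.t > p{i - 1}.t")
    # Dimension values act as equality filters; omitted dimensions are wildcards.
    for col, val in dims.items():
        where.append(f"c.{col} = ?")
        params.append(val)
    sql = "SELECT COUNT(DISTINCT c.cid) FROM customers c " + " ".join(joins)
    if where:
        sql += " WHERE " + " AND ".join(where)
    return conn.execute(sql, params).fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    print(support(conn, {}, ("a", "b")))                   # 3
    print(support(conn, {"region": "north"}, ("a", "b")))  # 2
```

In this toy data, three customers buy `a` and later `b`, but only two of them live in the `north` region, so the multi-dimensional pattern has lower support than the plain sequence. Under a multiple-threshold scheme of the kind the abstract mentions, the attribute part and the sequence part of a pattern could each be compared against its own minimum support; how the thesis actually sets and applies those thresholds is not stated here.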




