
Frequent Tree Structure Mining without Candidate Pattern Generation

MINT: Mining Frequent Rooted Induced Unordered Tree without Candidate Generation

Advisor: 張嘉惠

Abstract


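Tree mining is an important problem in the field of data mining, with applications in web log analysis, bioinformatics, and semi-structured documents. However, previous work in this area first generates candidate patterns and then tests whether each one is frequent, discarding those that are not. This approach spends a great deal of time and space on candidate generation and testing. In this thesis, we therefore use the concept of local frequency to design an algorithm, which we call MINT, that mines rooted, induced, unordered tree structures without generating candidates. We compare MINT with HybridTreeMiner on synthetic datasets produced by a data generator and on real web log data. The experimental results show that, even for a data type as complex as trees, searching for locally frequent items can still achieve good performance.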
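The idea of growing patterns only from locally frequent extensions can be illustrated with a minimal sketch. This is not the MINT algorithm itself: as a simplifying assumption, it mines only downward label paths (a restricted kind of induced subtree) rather than general unordered trees. The tree encoding as `(label, [children])` tuples and the names `iter_nodes` and `mine_frequent_paths` are hypothetical, introduced here for illustration only.

```python
from collections import defaultdict


def iter_nodes(tree):
    """Yield every node of a tree encoded as (label, [children])."""
    yield tree
    for child in tree[1]:
        yield from iter_nodes(child)


def mine_frequent_paths(db, minsup):
    """Return {path_of_labels: support} for downward paths found in >= minsup trees.

    Each pattern keeps the list of places where it occurs. Extensions are read
    directly from the children of those occurrences, so no candidate is ever
    proposed that does not appear in the data.
    """
    result = {}

    def grow(pattern, occurrences):
        # Support = number of distinct trees containing the pattern.
        support = len({tid for tid, _ in occurrences})
        if support < minsup:
            return
        result[pattern] = support
        # Collect extensions locally: only child labels under current occurrences.
        extensions = defaultdict(list)
        for tid, node in occurrences:
            for child in node[1]:
                extensions[child[0]].append((tid, child))
        for label in sorted(extensions):
            grow(pattern + (label,), extensions[label])

    # Start from single-node patterns, one per label present in the database.
    start = defaultdict(list)
    for tid, tree in enumerate(db):
        for node in iter_nodes(tree):
            start[node[0]].append((tid, node))
    for label in sorted(start):
        grow((label,), start[label])
    return result


# Three small labeled trees; sibling order differs between the first two.
db = [
    ("a", [("b", [("c", [])]), ("d", [])]),
    ("a", [("d", []), ("b", [])]),
    ("a", [("b", [("c", [])])]),
]
patterns = mine_frequent_paths(db, minsup=2)
```

Because a path is extended only by labels that actually occur beneath its current occurrences, infrequent extensions are pruned as soon as they are counted, and sibling order plays no role in the result.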

Keywords

subtree, canonical form, support, frequent, pattern

Parallel Abstract


Tree pattern mining is an important problem in data mining, with many emerging applications including web log analysis, bioinformatics, and semi-structured documents. However, most previous work follows the candidate-generation-and-test approach: candidate patterns are enumerated from shorter patterns already known to be frequent, in the Apriori style. Because this approach spends a great deal of time and space on candidate generation and testing, in this paper we adopt the idea of pattern growth to mine frequent rooted induced unordered trees without candidate generation. In the performance study, we compare our algorithm with HybridTreeMiner on synthetic datasets and real-world application datasets. The experiments show that our algorithm is efficient and cost-effective.

Parallel Keywords

canonical form, subtree, pattern, frequent, support

References


[4] Y. Chi, Y. Yang, and R. R. Muntz, Indexing and Mining Free Trees. In Proceedings of the 3rd IEEE International Conference on Data Mining (ICDM’03), November 2003.
[5] Y. Chi, Y. Yang, and R. R. Muntz, HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms. In Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM’04), June 2004.
[6] Y. Chi, Y. Yang, and R. R. Muntz, Canonical Forms for Labeled Trees and Their Applications in Frequent Subtree Mining. Journal of Knowledge and Information Systems (KAIS), August 2005, 203-234.
[7] Y. Chi, Y. Yang, Y. Xia, and R. R. Muntz, CMTreeMiner: Mining Closed and Maximal Frequent Subtrees from Databases of Labeled Rooted Trees. IEEE Transactions on Knowledge and Data Engineering, 17(2), February 2005.
[8] J. Han, J. Pei, Y. Yin, and R. Mao, Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Journal of Data Mining and Knowledge Discovery, 8(1), 53-87, 2004.
