透過您的圖書館登入
IP:18.119.133.228
  • 學位論文

有效率的跨交易關聯規則探勘演算法

Efficient Mining Algorithms for Inter-transaction Association Rules

指導教授 : 李瑞庭

摘要


在本論文中,我們針對探勘跨交易關聯規則,提出了二個有效率的演算法,稱為ITP-Miner (Inter-Transaction Patterns Miner)以及CITP-Miner (Closed Inter-Transaction Patterns Miner)。我們對這二個演算法設計了三種資料結構: ID-pair儲存跨交易樣式探勘所需的資訊; ITP-tree能夠列舉並找出所有的頻繁跨交易樣式;以及CITP-tree能夠列舉並找出所有的封閉式頻繁跨交易樣式。 ITP-Miner演算法使用ID-pair以及ITP-tree以探勘所有頻繁跨交易樣式。由於使用了ITP-tree,ITP-Miner只需讀取資料庫一遍並且將合併,修剪,及支持度的計算限定於少量的ID-pair。實驗結果顯示ITP-Miner演算法比FITI演算法快了數十倍。 CITP-Miner演算法使用ID-pair以及CITP-tree以探勘所有封閉式頻繁跨交易樣式。由於使用了CITP-tree,CITP-Miner能夠使用有效的修剪策略以避免耗時的候選樣式產生以及重覆的支持度計算。實驗結果顯示CITP-Miner演算法比FITI以及ClosedPROWL演算法快了數十倍。 另外,我們比較ITP-Miner以及CITP-Miner演算法,由於CITP-Miner使用有效的修剪策略找出封閉式頻繁跨交易樣式,而封閉式頻繁跨交易樣式的數量通常少於ITP-Miner所找出所有的頻繁跨交易樣式,所以實驗結果顯示在大部份的情況下,CITP-Miner演算法執行時間較ITP-Miner演算法快,然而卻會消秏較多的記憶體。

並列摘要


In this dissertation, we propose two efficient algorithms, called ITP-Miner (Inter-Transaction Patterns Miner) and CITP-Miner (Closed Inter-Transaction Patterns Miner), for mining inter-transaction association rules. We devise three data structures for both algorithms: an ID-pair stores the information used to find inter-transaction patterns; an ITP-tree enumerates and discovers frequent inter-transaction patterns; and a CITP-tree enumerates and discovers closed frequent inter-transaction patterns. The ITP-Miner algorithm uses the ID-pairs and the ITP-tree to mine all frequent inter-transaction patterns. By using the ITP-tree, the ITP-Miner requires only one database scan and can localize joining, pruning, and support counting to a small number of ID-pairs. The experiment results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude. The CITP-Miner algorithm uses the ID-pairs and the CITP-tree to mine all closed frequent inter-transaction patterns. By using the CITP-tree, the CITP-Miner can embed effective pruning strategies to avoid costly candidate generation and repeated support counting. The experiment results show that the CITP-Miner algorithm outperforms the FITI and ClosedPROWL algorithms by one order of magnitude. We also compare the ITP-Miner and CITP-Miner algorithms. Since the CITP-Miner uses effective pruning strategies for mining all closed frequent inter-transaction patterns, and the number of patterns mined by the CITP-Miner may be much less than that mined by the ITP-Miner, the experiment results show that in most of the cases, the CITP-Miner algorithm outperforms the ITP-Miner algorithm in terms of execution time, but it consumes more amount of main memory.

參考文獻


[1] R.C. Agarwal, C.C. Aggarwal, V.V.V. Prasad, A tree projection algorithm for generation of frequent itemsets, Journal of Parallel and Distributed Computing 61 (3) (2001) 350-371.
[3] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1993, pp. 207–216.
[5] C. Apte, B. Liu, E. P.D. Pednault, and P. Smyth, Business applications of data mining, Communications of the ACM 45(8) (2002) 49–53.
[6] R.J. Bayardo, Efficiently mining long patterns from databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1998, pp. 85–93.
[7] B. Berendt, M. Spiliopoulou, Analysis of navigation behaviour in web sites integrating multiple information systems, The VLDB Journal 9 (2000) 56-75.

延伸閱讀