高效率探勘關聯規則之演算法—GRA

資料探勘的技術變得日益重要，目前已廣泛應用在商業上的預測及決策的支援。其中，關聯法則在資料探勘領域中，扮演著相當重要的角色，因此，許多探勘關聯法則演算法不斷被提出及改進，主要目的在於增加執行效能或提升記憶體使用率。本研究也朝著這個目標，試著改進關聯規則演算法為主要方向。本研究主要是針對探勘關聯規則GDA演算法的特性及缺點來加以改進，雖然GDA 演算法已是很有效率的演算法之一，不過，它還是有兩個最主要的問題。第一，GDA無法探勘交易長度較長的交易資料庫。第二，GDA演算法在某階段須儲存大量非必要的項目集時，將導致記憶體使用率會呈現不佳狀態；因此，GDA演算法的實用性大打折扣。基本於上述理由，本研究提出一個改良於GDA演算法的階段拆解核心的新演算法GRA（Gradation Reduction Approaches）。GRA演算法主要的特色就是加入階段縮減交易長度機制，因為該縮減機制可大量減少非頻繁項目集的數量，將可更適用於探勘交易長度較長的資料庫。在執行過程中，GRA演算法不需要產生任何候選項目集，即可快速找出關聯規則。除此之外，GRA演算法透過階段縮減機制僅會產生可能成為頻繁的項目集，所以，不需耗費龐大記憶體空間來儲存項目集，可有效提昇記憶體的利用率。在現實生活中的資料庫容量通常都是大於記憶體容量，為了解決此問題，本研究也以GRA演算法為主題提出另一修正版GAR-M演算法，GAR-M演算法將選擇採用資料庫分割方式繼續執行探勘任務，每個子資料庫僅需進行四次實體I/O動作，不隨著頻繁項目集的長度增長而增加實體I/O 的次數，以避免耗費過多的I/O時間，也可有效提高執行效率與實用性。

關鍵字

資料探勘；關聯法則；階段縮減機制

並列摘要

The technology of data mining becomes important in recent years, and it is generally applied to commercial forecast and decision supports. Association rules mining algorithms play the important role in the field of data mining. Many of association rules mining algorithms were proposed to improve the efficiency of data mining or save the utility rate of memory. Our major study also tries to improve the efficiency of association rules mining algorithms.In this paper, our major study is to improve the defects of the GDA algorithm. Although GDA algorithm was one of the most efficient algorithms, but it still has two serious problems; in the first place, GDA algorithm can't mine the transactions of databases whose record length is very long; in the second place, GDA algorithm isn't very efficient at utility rate of the memory when it must store lots of unnecessary itemsets at one phase. Therefore, the GDA algorithm is not very practical.Based on above the reasons we propose a new algorithm-GRA (Gradation Reduction Approaches) that is improved from GDA algorithm. One of the characters of the GRA algorithm is the gradation reduction mechanisms because it can reduce lots of infrequent itemsets; the GRA algorithm is very suitable to mine the transactions of databases whose record length is very long. In the mining procedure, the GRA algorithm doesn't generate any candidate itemset to find association rules quickly. Besides, the GRA algorithm through gradation reduction mechanisms only generate those itemsets which are the most possible to be the frequent itemsets. So, the GRA algorithm can't waste lots of memory spaces to store infrequent itemsets; it can efficiently increase the utility rate of memory. The size of the databases in the real world is always greater than the size of the memory. In order to solve this problem, we propose a modifying algorithm-GRA-M (Gradation Reduction Approaches-Modifying); it divides a large database into many sub-databases and mines association rules from those sub-databases. The GRA-M algorithm only scans database four times and will not be affect by the length of frequent itemsets. The GRA-M algorithm avoids wasting a lot of I/O time and increases the efficiency and the practicability in application.

並列關鍵字

data mining ； association rules ； the gradation reduction mechanisms

參考文獻

Agrawal, R.,R. Srikant(1994).Fast Algorithm for Mining Association Rules in Large Databases.Proc. 1994 Int'l Conf VLDB.(Proc. 1994 Int'l Conf VLDB).:

Google Scholar

Brin. S.,R. Motwani,J. D. Ullman,S. Tsur(1997).Dynamic Itemset Counting and Implication Rules for Market Basket Data.ACM SIGMOD Conference on Management of Data.(ACM SIGMOD Conference on Management of Data).

Google Scholar

Han, J.,J. Pei,Y. Yin(2000).Mining Frequent Patterns without Candidate Generation.Proc. ACM SIGMOD Int. Conf. on Management of Data.(Proc. ACM SIGMOD Int. Conf. on Management of Data).

Google Scholar

Quest Synthetic Data Generation Code

Google Scholar

Lee, C. H.,C. R. Lin,M. S. Chen(2001).Sliding-Window Filtering: An Efficient Algorithm for Incremental Mining.Proc. of the ACM 10th Intern'l Conf. on Information and Knowledge Management (CIKM-01).(Proc. of the ACM 10th Intern'l Conf. on Information and Knowledge Management (CIKM-01)).

Google Scholar

被引用紀錄

謝宛臻（2015）。運用資料探勘技術於進口精品傢俱之顧客分析〔碩士論文，國立虎尾科技大學〕。華藝線上圖書館。https://doi.org/10.6827/NFU.2015.00078

國際替代計量

高效率探勘關聯規則之演算法—GRA

全文下載

主題瀏覽