An Incremental Mining Algorithm: QDT

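With advances in information technology and the spread of computers, collecting data has become easier, faster, and more convenient. Over time, however, databases accumulate large volumes of data whose useful content is hidden. How to mine this hidden information correctly and efficiently has therefore become an important issue, and data mining techniques emerged in response. Among them, association rule mining is the most widely used. Association rule mining is mainly concerned with finding frequent itemsets in a large database and, from them, discovering useful knowledge. The most commonly used method for mining association rules is the Apriori algorithm. Although Apriori can find association rules, it has two major drawbacks: first, it generates a large number of candidate itemsets while searching for frequent itemsets; second, it must scan the entire database repeatedly, which makes it inefficient. The QDT algorithm proposed in this study departs from the Apriori framework and needs to scan the database only once when generating large itemsets. It therefore reduces I/O access time and finds association rules quickly, making mining more efficient. In addition, QDT can serve as an on-line incremental data mining algorithm without any modification.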

Keywords

Data mining; association rules; frequent itemsets; decomposition; incremental data mining

Parallel Abstract

With the advance of information technology and the spread of computers, collecting data has become easier, faster, and more convenient than before. Over time, databases accumulate huge amounts of data containing hidden information. How to uncover and mine that hidden information correctly and efficiently has therefore become an important issue, and data mining is one of the answers. Among data mining techniques, association rule mining is one of the most widely used. Association rule mining studies how to extract frequent itemsets from a large database and then derive the knowledge implicit in them. The Apriori algorithm is one of the most frequently used algorithms. Although Apriori can successfully derive association rules from a database, it has two major defects. First, it produces a large number of candidate itemsets while extracting frequent itemsets from a large database. Second, it scans the whole database repeatedly, which leads to poor performance. Many studies have tried to improve Apriori, but they remain within the Apriori framework and achieve only small performance gains. In this paper we propose QDT (Quick Decomposition Tree), which departs from the Apriori framework and scans the whole database only once while extracting frequent itemsets. QDT can therefore reduce I/O time, extract frequent itemsets from a large database quickly, and make data mining more efficient. In addition, QDT can be applied to on-line incremental mining applications without any modification.

Parallel Keywords

Data Mining; Association Rule; Frequent Itemset; Decomposition; Incremental Data Mining
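
The two Apriori drawbacks named in the abstract (many candidate itemsets, repeated full database scans) can be made concrete with a minimal sketch of the textbook level-wise Apriori procedure. This is not the QDT algorithm, whose internals the abstract does not describe. The function name `apriori`, the absolute `min_support` threshold, and the toy transactions are assumptions chosen for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook level-wise Apriori (illustrative sketch, not QDT).

    Returns (frequent, scans, candidates_generated), where `frequent` maps
    each frequent itemset to its support count, `scans` is the number of
    full passes over the data, and `candidates_generated` is the total
    number of candidate itemsets that were counted.
    """
    transactions = [frozenset(t) for t in transactions]
    frequent = {}
    scans = 0
    candidates_generated = 0

    # Level 1 candidates: every distinct item (collected before the counted scans).
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    k = 1

    while candidates:
        candidates_generated += len(candidates)

        # One full pass over the database per level: the repeated-scan cost.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)

        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}

        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]

    return frequent, scans, candidates_generated


# Hypothetical toy database of five transactions.
toy = [
    {"bread", "milk"},
    {"bread", "diaper", "beer"},
    {"milk", "diaper", "beer"},
    {"bread", "milk", "diaper"},
    {"bread", "milk", "beer"},
]
frequent, scans, candidates_generated = apriori(toy, min_support=3)
print(scans, candidates_generated, len(frequent))  # prints: 2 10 5
```

Even on this five-transaction example, Apriori counts 10 candidates in two full passes to find 5 frequent itemsets. Both numbers grow with the number of items and the length of the longest frequent itemset, which is the cost a single-scan method such as the one the abstract proposes aims to avoid.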


