近似探勘資料流最近常見資料項集之研究

近來有很多實際的應用，其資料是以資料流的形式產生。本論文針對資料流探勘最近常見資料項集的問題，提出兩個近似探勘的方法，分別稱為ATS演算法(Average TimeStamp mining method)及FCP演算法(Frequency Changing Point mining method)。這兩個演算法對資料流裡每筆交易皆只需掃描處理一次，記錄下可能為最近常見資料項集的出現摘要資訊。由摘要資訊結構中的資訊，不需儲存目前視窗中的交易資料，即可更新並刪除不屬於目前視窗的過時資訊，並從中快速地近似找出最近常見資料項集。其中FCP演算法記錄資料項集在出現頻率改變點的累計次數，可用以估算出該資料項集在目前視窗中的支持度，近似探勘出目前交易視窗中的最近常見資料項集。由實驗結果顯示，採用FCP演算法來近似探勘資料流中的最近常見資料項集，可達到極高的正確率(precision)及回覆率(recall)。且相較於以往所提出的SW演算法，可有效率減少資料維護及探勘執行時間，且明顯減少所需記憶體大小。

關鍵字

資料探勘；資料流；資料項集；項目集

並列摘要

Recently, the data of many real applications are generated in the form of data streams. In this thesis, two approximately mining methods, named ATS (Average TimeStamp mining method) and FCP (Frequency Changing Point mining method), are proposed for finding recent frequent itemsets on data streams. In these two approaches, each transaction in the data stream is examined at most once and the appearing summarization of possible recent frequent itemsets is recorded. According to the data structure of appearing summarization, the outdated data which does not belong to the current window can be deleted without needing to store the transactions in the current window. Then recent frequent itemsets can be mined approximately and efficiently. Among the two approaches, FCP algorithm stores the accumulative counts for itemsets at their frequency changing points. Such information can be used to estimate the supports of itemsets in current window and then the recent frequent itemsets are mined approximately. The experimental results show that, FCP algorithm can achieve high precision and recall when used for mining recent frequent itemsets in data streams. Moreover, by comparing with the existing SW algorithm, it shows FCP algorithm improves the efficiency of data maintenance and mining process significantly and reduces the memory usage further.

並列關鍵字

無資料

參考文獻

[6] J. H, Chang and W.S. Lee, “Finding Recent Frequent Itemsets Adaptively over Online Data Streams, ” in Proc. of the 9th ACM International Conference on Knowledge Discovery and Data Ming, 2003.

[7] J. H. Chang and W. S. Lee, “A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams, ” in Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.

[2] J. K. Koh and Shui-Feng Shieh, “An Efficient Approach for Maintaining Association Rules based on Adjusting FP-tree Structure, ” Lecture Notes in Computer Science: DASFAA’04 Database Systems for Advanced Applications, Springer-Verlag, 2004.

[3] Yun Chi, Haixun Wang, Philip S. Yu, and Richard R. Muntz, “Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding window, ” in Proc. of the 4th IEEE International Conference on Data Mining, pages pp. 59-66, 2004.

[9] C.-Y. Chang, M.-S. Chen, and C.-H. Lee, “Mining General Temporal Association Rules for Items with Different Exhibition Periods, ” in Proc. of the 2nd IEEE International Conference on Data Mining, 2002.

國際替代計量

近似探勘資料流最近常見資料項集之研究

主題瀏覽