Data Stream Mining Using the Weighted Sliding Window Model

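In recent years, data stream mining has received growing attention. With the rise of emerging applications such as network traffic analysis, Web click-stream mining, and online transaction analysis, the data to be processed is no longer static but a real-time, continuous, dynamic data stream. This study proposes a new framework for data stream mining, called the Weighted Sliding Window model, which lets the user set the number of windows, the time span of each window, and the weight of each window. Letting users give higher weights to more important data makes the mining results better match their needs. Based on the weighted sliding window model, we design a one-pass algorithm that uses a limited amount of memory, called the WSW algorithm, to discover all large itemsets from a transaction data stream. Exploiting the characteristics of the data, we also propose an improved algorithm, called WSW-Imp, which further reduces the time needed to decide whether a candidate itemset is a large itemset, making the mining of transaction data streams more efficient. Experiments show that WSW-Imp does run more efficiently than WSW.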

Keywords

Data stream mining; sliding window; association rules; large itemsets

English Abstract

In recent years, data stream mining has become an important research topic. With the emergence of new applications such as network traffic analysis, Web click-stream mining, and online transaction analysis, the data we need to process is no longer static but a continuous, dynamic data stream. In this paper, we propose a new framework for data stream mining, called the weighted sliding window model, which lets the user specify the number of windows, the size of each window, and the weight of each window. Based on the proposed model, we design a one-pass algorithm, called WSW, that uses a limited amount of memory to efficiently discover all the large itemsets from data streams. Building on the WSW algorithm and the characteristics of the data, we propose an improved algorithm, called WSW-Imp, which further reduces the time needed to decide whether a candidate itemset is a large itemset. Empirical results show that WSW-Imp outperforms WSW under the weighted sliding window model.
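To make the weighted sliding window idea concrete, the following is a minimal Python sketch. It is not the WSW or WSW-Imp algorithm, whose details the abstract does not give. It is a brute-force illustration under an assumed definition: an itemset's weighted support is the sum, over all windows, of the window weight times the number of transactions in that window containing the itemset, and an itemset counts as large when that value reaches `min_support` times the weighted total number of transactions. The function name, the parameters, and the `max_size` cap are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def weighted_large_itemsets(windows, weights, min_support, max_size=2):
    """Brute-force illustration of weighted support over several windows.

    windows     : list of windows, each a list of transactions (sets of items)
    weights     : one user-chosen weight per window (assumed definition)
    min_support : fraction of the weighted transaction total an itemset
                  must reach to be reported as large
    max_size    : largest itemset size to enumerate (hypothetical cap)
    """
    if len(windows) != len(weights):
        raise ValueError("need exactly one weight per window")

    # Assumed threshold: min_support * sum_i(weight_i * |window_i|).
    threshold = min_support * sum(
        w * len(win) for win, w in zip(windows, weights)
    )

    # Each transaction contributes its window's weight to every
    # itemset (up to max_size items) that it contains.
    weighted_counts = Counter()
    for win, w in zip(windows, weights):
        for txn in win:
            items = sorted(txn)
            for k in range(1, max_size + 1):
                for itemset in combinations(items, k):
                    weighted_counts[itemset] += w

    return {s: c for s, c in weighted_counts.items() if c >= threshold}


if __name__ == "__main__":
    # Two windows; the more recent one is given the higher weight.
    windows = [
        [{"a", "b"}, {"a", "c"}],  # older window
        [{"a", "b"}, {"b", "c"}],  # newer window
    ]
    weights = [0.25, 0.75]
    for itemset, support in sorted(
        weighted_large_itemsets(windows, weights, 0.5).items()
    ):
        print(itemset, support)
```

Because the newer window carries three times the weight of the older one, an itemset seen only in recent transactions can qualify as large even when its raw count is low. A one-pass stream algorithm would maintain these counts incrementally as windows slide rather than re-enumerating every transaction.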

English Keywords

Data stream mining; Sliding window model; Association rule; Large itemset

