In recent years, data stream mining has received increasing attention. With the rise of new applications such as network traffic analysis, Web click-stream mining, and online transaction analysis, the data to be processed is no longer static but arrives as a continuous, real-time stream. In this paper, we propose a new data stream mining model, called the weighted sliding window model, which lets the user specify the number of windows, the time span of each window, and the weight of each window. Because users can assign higher weights to the data they consider more important, the mining results better match their needs. Based on this model, we design a one-pass algorithm, called WSW, that uses a limited amount of memory to discover all the large itemsets in a transaction data stream. Exploiting characteristics of the data, we then propose an improved algorithm, called WSW-Imp, which further reduces the time needed to decide whether a candidate itemset is a large itemset, making transaction stream mining more efficient. Experimental results show that WSW-Imp runs faster than WSW under the weighted sliding window model.
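The abstract does not define how the window weights enter the support computation. The minimal Python sketch below illustrates the idea under an assumed definition: the weights sum to 1, and the weighted support of an itemset X is the sum over the retained windows of w_i times the fraction of transactions in window i that contain X. An itemset is treated as large when this value reaches a user-given minimum support. The class `WeightedSlidingWindows` and its methods are hypothetical names. The window time span is modelled only by when the caller invokes `add_window`. Candidate itemsets are enumerated naively, so this is not the one-pass WSW or WSW-Imp algorithm. It only shows how the number of windows and their weights combine.

```python
from collections import deque
from itertools import combinations
from typing import Deque, Dict, FrozenSet, Iterable, List, Sequence

Transaction = FrozenSet[str]


class WeightedSlidingWindows:
    """Hypothetical illustration of a weighted sliding window, not WSW itself.

    Keeps the transactions of the most recent len(weights) windows.
    weights[0] applies to the newest window and weights[-1] to the oldest.
    Assumption: the weights sum to 1.
    """

    def __init__(self, weights: Sequence[float]) -> None:
        if abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("weights are assumed to sum to 1")
        self.weights: List[float] = list(weights)
        # appendleft keeps the newest window at index 0; once the deque is
        # full, maxlen makes it discard the oldest window from the right end.
        self.windows: Deque[List[Transaction]] = deque(maxlen=len(weights))

    def add_window(self, transactions: Iterable[Iterable[str]]) -> None:
        """Close one time window and slide: its transactions become the newest window."""
        self.windows.appendleft([frozenset(t) for t in transactions])

    def weighted_support(self, itemset: Transaction) -> float:
        """Assumed definition: sum of w_i * (fraction of window i containing itemset)."""
        total = 0.0
        for txns, w in zip(self.windows, self.weights):
            if txns:
                hits = sum(1 for t in txns if itemset <= t)
                total += w * hits / len(txns)
        return total

    def large_itemsets(self, min_sup: float, max_size: int = 2) -> Dict[Transaction, float]:
        """Naively test every itemset of up to max_size items against min_sup."""
        items = sorted({i for txns in self.windows for t in txns for i in t})
        result: Dict[Transaction, float] = {}
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                candidate = frozenset(combo)
                sup = self.weighted_support(candidate)
                if sup >= min_sup:
                    result[candidate] = sup
        return result
```

In a usage example with weights `[0.5, 0.3, 0.2]`, the caller would call `add_window` three times with the oldest window first. `large_itemsets(0.4)` would then report the itemsets whose weighted support is at least 0.4. A fourth call to `add_window` slides the model forward and drops the oldest window automatically. Giving the newest window the largest weight is only one possible choice. The model lets the user assign weights in any order.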