An Efficient Sliding Window Based Algorithm for Adaptive Frequent Itemset Mining over Data Streams

Mining frequent itemsets over high speed, continuous and infinite data streams is a challenging problem due to changing nature of data and limited memory and processing capacities of computing systems. Sliding window is an interesting model to solve this problem since it does not need the entire history of received transactions and can handle concept change by considering only a limited range of recent transactions. However, previous sliding window algorithms require a large amount of memory and processing time. This paper, introduces a new algorithm based on a prefix tree data structure to find and update frequent itemsets of the window. In order to enhance the performance, instead of a single transaction, a batch of transactions is used as the unit of insertion and deletion within the window. Moreover, by using an effective traversal strategy for the prefix tree and suitable representation for each batch of transactions, both updating of current itemsets and inserting of newly emerged itemsets are performed together, thus improving the performance even further. Additionally, in the proposed algorithm by storing required information in each node of the prefix tree, deleting old batch of transactions from the window as well as pruning infrequent itemsets are efficiently accomplished. Although, with respect to previous algorithms, our algorithm maintains more information in the prefix tree, it does not require storing the set of transactions of the window, thus reducing the memory usage. Empirical evaluations on both real and synthetic datasets show the superiority of the proposed algorithm in terms of runtime and memory requirement. Moreover, it produces mining results with higher quality.

並列關鍵字

frequent itemset mining ； sliding window ； data stream ； data mining ； prefix tree

國際替代計量

全文下載

主題瀏覽