Frequent Pattern Mining on XML Data Streams

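This thesis designs a novel mining method for XML data streams that works in three phases. First, each XML document is converted into a sequence. Next, a tree structure is built that compactly stores the large number of subsequences (subtrees) together with their occurrence frequencies. Finally, an efficient mining algorithm finds all frequent patterns and can also be used to mine maximal patterns. Experiments on two different types of data show that the proposed method is efficient on both: mining is faster on sparse data, while in the error-tolerant mode the returned answers are relatively more accurate on dense data.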

Keywords

data streams; data mining; XML; frequent patterns; maximal patterns

Parallel Abstract

This paper presents a novel three-phase method for mining XML data streams. Each XML document is first transformed into a sequence. A compact tree structure is then built to compress the large number of subsequences (subtrees) and record their frequencies. Finally, an efficient algorithm mines all frequent patterns; it can also be applied to mining maximal patterns. The experimental results show that the proposed method performs well on two different kinds of datasets. It is more efficient on sparse data, while on dense data its returned results are more accurate when a few errors are tolerable.

Parallel Keywords

data streams; data mining; XML; sequence; frequent patterns; maximal patterns
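
The three phases summarized in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm: the sequence encoding (preorder tags with a hypothetical `"^"` backtrack marker), the use of contiguous windows of bounded length as a stand-in for subtrees, and all names (`xml_to_sequence`, `PatternTrie`, `max_len`, `min_support`) are assumptions made only for illustration. Support is counted as the number of documents that contain a pattern.

```python
import xml.etree.ElementTree as ET


def xml_to_sequence(xml_text):
    """Phase 1 (assumed encoding): preorder tag sequence, "^" = return to parent."""
    seq = []

    def visit(node):
        seq.append(node.tag)
        for child in node:
            visit(child)
        seq.append("^")

    visit(ET.fromstring(xml_text))
    return seq


class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0       # number of documents containing this pattern
        self.last_doc = -1   # last document that incremented count


class PatternTrie:
    """Phase 2 (simplified): prefix tree over contiguous windows of each sequence."""

    def __init__(self, max_len=3):
        self.root = TrieNode()
        self.max_len = max_len
        self.num_docs = 0

    def add_document(self, seq):
        doc_id = self.num_docs
        self.num_docs += 1
        for start in range(len(seq)):
            node = self.root
            for sym in seq[start:start + self.max_len]:
                node = node.children.setdefault(sym, TrieNode())
                if node.last_doc != doc_id:  # count each document once
                    node.last_doc = doc_id
                    node.count += 1

    def frequent(self, min_support):
        """Phase 3 (simplified): depth-first walk, pruning infrequent prefixes."""
        results = {}

        def walk(node, prefix):
            for sym, child in node.children.items():
                if child.count >= min_support:
                    pattern = prefix + (sym,)
                    results[pattern] = child.count
                    walk(child, pattern)

        walk(self.root, ())
        return results


# Usage: feed documents one at a time, as they would arrive on a stream.
trie = PatternTrie(max_len=3)
for doc in ["<a><b/><c/></a>", "<a><b/></a>", "<a><c/></a>"]:
    trie.add_document(xml_to_sequence(doc))
patterns = trie.frequent(min_support=2)
```

Pruning in `frequent` is safe because every document that contains a window also contains each of its prefixes, so an infrequent prefix cannot have a frequent extension. A maximal-pattern filter and an error-tolerant (approximate counting) mode, both mentioned in the abstract, are left out of this sketch.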
