Mining Frequent Patterns from XML Streams

Abstract

This paper presents a novel method for mining XML data streams in three phases. First, each XML document is transformed into a sequence. Next, a compact tree structure is built to compress the large number of subsequences (subtrees) and store their frequencies. Finally, an efficient mining algorithm finds all frequent patterns; it can also be applied to mine maximal patterns. Experimental results show that the proposed method is efficient on two different kinds of datasets: mining is more efficient on sparse data, while under the error-tolerant mode the returned results are relatively more accurate on dense data.
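
The abstract does not give the sequence encoding, the tree layout, or the mining procedure, so the following is only a minimal sketch of the three-phase idea under loudly stated assumptions. It assumes each document is flattened into a preorder list of element tags, uses contiguous subsequences of bounded length as a simplified stand-in for subtrees, and stores them in a plain prefix tree with per-document counts. The names `to_sequence`, `PatternTrie`, `max_len`, and `min_support` are hypothetical and do not come from the paper.

```python
import xml.etree.ElementTree as ET


def to_sequence(xml_text):
    """Phase 1 (assumed encoding): preorder list of element tags."""
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]


class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0       # number of documents containing this pattern
        self.last_doc = -1   # id of the last document that raised count


class PatternTrie:
    """Phase 2 (assumed structure): prefix tree over tag subsequences."""

    def __init__(self, max_len=3):
        self.root = TrieNode()
        self.max_len = max_len   # hypothetical cap on pattern length
        self.num_docs = 0

    def add_document(self, xml_text):
        seq = to_sequence(xml_text)
        doc_id = self.num_docs
        self.num_docs += 1
        # Insert every contiguous subsequence of length <= max_len.
        for start in range(len(seq)):
            node = self.root
            for tag in seq[start:start + self.max_len]:
                node = node.children.setdefault(tag, TrieNode())
                # Count each pattern at most once per document.
                if node.last_doc != doc_id:
                    node.last_doc = doc_id
                    node.count += 1

    def frequent(self, min_support):
        """Phase 3: collect patterns seen in >= min_support documents."""
        results = {}
        stack = [((), self.root)]
        while stack:
            prefix, node = stack.pop()
            for tag, child in node.children.items():
                # An infrequent prefix cannot have frequent extensions,
                # so its whole branch is skipped.
                if child.count >= min_support:
                    pattern = prefix + (tag,)
                    results[pattern] = child.count
                    stack.append((pattern, child))
        return results


if __name__ == "__main__":
    trie = PatternTrie(max_len=3)
    for doc in ["<a><b/><c/></a>", "<a><b/><d/></a>", "<a><b/><c/></a>"]:
        trie.add_document(doc)
    print(trie.frequent(min_support=3))
```

Counting support per document rather than per occurrence keeps the counts anti-monotone, which is what lets the traversal prune a branch as soon as one node falls below the threshold. Maximal-pattern mining and the error-tolerant mode mentioned in the abstract are not covered by this sketch.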