A Study on Applying EMM to Data Stream Clustering

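With the advance of information technology and the rise of the Internet, data stream mining has attracted growing attention. With the emergence of new applications such as network traffic analysis, web clickstream mining, network intrusion detection, and online transaction analysis, the data to be mined are no longer static. What is needed instead is a mechanism that adjusts the mining model according to real-time, continuous, dynamic data streams. The Extensible Markov Model (EMM) is a Markov chain that can be adjusted dynamically according to the temporal and spatial characteristics of a data stream. It addresses the problem that a traditional Markov chain, being static, yields models whose clustering ability falls short of expectations in real-world applications. In this paper, we pair EMM with different similarity-based clustering methods, analyze them on different data sets, and identify an EMM analysis method that is applicable to different kinds of stream data.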

Keywords

Extensible Markov Model; Data Mining; Data Stream Mining; Similarity Computation

Parallel Abstract

In recent years, data stream mining has become an important research topic. With the emergence of new applications and the rapid development of the Internet, the data that must be processed and clustered are no longer static but arrive as continuous, dynamic data streams, as in network traffic analysis, network intrusion detection, and online transaction analysis. By providing a dynamically adjustable clustering scheme, the Extensible Markov Model (EMM) overcomes problems caused by the static nature of the traditional Markov chain (MC). For instance, the structure of an MC fixed at model construction time may not be suitable for real-world applications. EMM, in contrast, is particularly well suited to modeling data streams. In this paper, we compare and analyze several common similarity computation methods that can be used in EMM, and propose a general version of EMM that uses the best-performing similarity method to improve data stream clustering.
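
To make the idea of a dynamically adjustable Markov chain concrete, the following is a minimal sketch and not the implementation evaluated in this study. It assumes that each state is represented by a running-mean centroid, that an incoming point joins the most similar state when the similarity reaches a threshold and otherwise creates a new state, and that transitions between consecutive states are counted. The class name `SimpleEMM`, the threshold value, and the sample stream are all hypothetical, and the sketch omits any forgetting or state-removal mechanism that a full model may include.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_similarity(a, b):
    """Similarity derived from Euclidean distance, mapped into (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


class SimpleEMM:
    """Illustrative EMM-style model: states grow as the stream arrives."""

    def __init__(self, similarity, threshold):
        self.similarity = similarity  # pluggable similarity function
        self.threshold = threshold    # minimum similarity to join a state
        self.centroids = []           # one representative vector per state
        self.counts = []              # points absorbed by each state
        self.transitions = {}         # (from_state, to_state) -> count
        self.current = None           # state of the previous point

    def update(self, point):
        """Assign a point to a state, creating one if needed; return its index."""
        best, best_sim = None, -math.inf
        for i, centroid in enumerate(self.centroids):
            sim = self.similarity(point, centroid)
            if sim > best_sim:
                best, best_sim = i, sim

        if best is None or best_sim < self.threshold:
            # No sufficiently similar state exists: extend the chain.
            self.centroids.append(list(point))
            self.counts.append(1)
            best = len(self.centroids) - 1
        else:
            # Absorb the point by updating the running mean of the state.
            n = self.counts[best] + 1
            self.centroids[best] = [
                c + (x - c) / n for c, x in zip(self.centroids[best], point)
            ]
            self.counts[best] = n

        if self.current is not None:
            key = (self.current, best)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = best
        return best

    def transition_probability(self, i, j):
        """Estimated probability of moving from state i to state j."""
        total = sum(n for (a, _), n in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / total if total else 0.0


# Hypothetical two-dimensional stream, used only for illustration.
stream = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (1.0, 0.05), (0.1, 1.0)]
emm = SimpleEMM(cosine_similarity, threshold=0.9)
labels = [emm.update(p) for p in stream]
```

Because the similarity function is a constructor argument, swapping `cosine_similarity` for `euclidean_similarity` (with a suitably rescaled threshold) is enough to compare how different similarity measures group the same stream.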

Parallel Keywords

Data Stream Mining; Extensible Markov Model; Similarity Computation
