透過您的圖書館登入
IP:3.21.244.14
  • 學位論文

資料流常見樣式改變偵測方法之研究

Frequent Patterns Change Detection over Data Streams

指導教授 : 柯佳伶
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


近來許多實際應用,其資料都是以資料流的形式產生,探勘資料流中最近常見樣式是其中一個重要的研究問題,而採用滑動視窗可有效探勘出資料流中最近常見樣式,然而在每個時間點對滑動視窗中的資料重新進行探勘將耗費許多計算成本。我們認為在常見樣式沒有顯著變化的情況下,先前探勘出的常見樣式可以近似代表目前的常見樣式,因此本論文針對特例資料流和一般化資料流環境提出監視常見樣式的方法FCDS和FCDG,藉由監視滑動視窗內的樣式出現次數,在每個時間點估算出最新的最近常見樣式集合和先前所探勘出來的常見樣式集合是否有顯著的改變差異,以提供是否重新進行探勘的判斷。由實做FCDS和FCDG之實驗結果顯示,監視常見樣式之出現次數及狀態變化情況,可有效偵測出最近常見樣式改變的狀況,使能在發現常見資料樣式已大幅改變時,重新探勘得到最新的最近樣式。此外,FCDS和FCDG之執行時間相較於每個時間點進行探勘為少,節省許多計算時間的成本,並可從維護結構中提供最近常見樣式變化狀況的資訊。

並列摘要


Recently, the data of many real applications is generated in the form of data streams. Therefore, mining recent frequent patterns over data streams is one of the important research issues in data mining. According to the literatures, the recently frequent patterns are discovered effectively by using a sliding window. However, it is costly to perform frequent patterns re-mining in a sliding window at each time point. In this thesis, it is adopted that the previously discovered frequent patterns represent the current frequent patterns approximately when the change of frequent patterns is not significant. Accordingly, two methods, named FCDS and FCDG, are proposed to monitor the frequent patterns over a special data stream and a general data stream environment, respectively. By monitoring the appearing frequencies of patterns in a sliding window, the difference between the new recently frequent patterns and the previous ones is estimated at each time point. Only when the change is more than a user-defined threshold value, a re-mining process is required to find the exact frequent patterns in the sliding window. The experimental results show that FCDS and FCDG detect the change of the recent frequent patterns effectively and efficiently. The execution time of FCDS and FCDG is less than the time of re-mining frequent patterns in a sliding window significantly. Furthermore, from the maintained data structures, the changing information of recent frequent patterns is provided.

參考文獻


[1] Y. Chi, H. Wang , P. S. Yu ,and R. R. Muntz,” Moment: Maintaining closed frequent itemsets over a stream sliding window,” In Proceedings of International Conference on Data Mining and Knowleadge Discovery, 2004.
[8] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” In Proceedings of the ACM SIGMOD International Conference on
[10] D.Lee and W. S. Lee,“Finding Maximal Frequent Itemsets over Online Data Streams Adaptively,” In Proceedings of the 5th IEEE International Conference on Data Mining,2005
[2] N.Jiang and L.Gruenwald,”CFI-Stream:mining closed frequent itemsets in data streams,”In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,2006
[3] S.Ben-David , J. Gehrke, and D. Kifer,”Detecting Change in Data Streams, ” In Proceedings of the 30th International Conference on Very Large Database, 2004.

延伸閱讀