
A Pattern-based Method for Item Predictions over Data Streams (資料流序列中資料項預測方法之研究)

Advisor: 柯佳伶

Abstract


With the spread of electronic devices, more and more applications collect data rapidly and continuously, so that their input forms a data stream rather than a static transaction database. Item prediction over a data stream faces two challenges. First, the data arrives continuously at high speed, so it must be processed very efficiently. Second, the data distribution and the implicit patterns may change over time. To address these challenges, this thesis proposes a tree structure named the prediction-tree, which quickly mines repeating patterns from the training data and generates item prediction rules from them. When a concept change occurs, the repeating patterns in the most recent window must be re-mined to generate new prediction rules that fit the current concept. The first approach, named ERT, detects concept changes by monitoring the prediction error rate within a sliding window. When the error rate exceeds a given minimum error rate threshold, new prediction rules are generated by re-mining, and the previous rules with high prediction accuracy are retained and combined with the new ones. The other approach triggers re-mining at fixed intervals, once for every non-overlapping data window. It has two variants, WANR and WRNR, which differ in whether the previous rules are retained and combined with the newly generated ones. The experimental results show that the error rate of WRNR is slightly higher than that of the other two methods, and that ERT is the most efficient and effective of the three because it adjusts rules only when a concept change is detected.
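
The two triggering strategies described in the abstract can be illustrated with a minimal Python sketch. It covers only the decision of when to re-mine; mining repeating patterns with the prediction-tree and merging old and new rules are not shown, because the abstract does not specify them. The names `ErrorRateTrigger`, `record`, and `fixed_window_trigger`, the parameter values, and the choice to reset the window after a trigger are all assumptions made for illustration, not details taken from the thesis.

```python
from collections import deque


class ErrorRateTrigger:
    """Error-rate-based trigger in the spirit of ERT (hypothetical helper).

    Keeps the outcomes of the most recent predictions in a sliding window
    and signals re-mining when the error rate exceeds a threshold.
    """

    def __init__(self, window_size, max_error_rate):
        self.window_size = window_size
        self.max_error_rate = max_error_rate
        # True marks a wrong prediction; the deque drops the oldest entry
        # automatically once window_size outcomes are stored.
        self.outcomes = deque(maxlen=window_size)

    def record(self, predicted, actual):
        """Record one prediction; return True if re-mining should start."""
        self.outcomes.append(predicted != actual)
        if len(self.outcomes) < self.window_size:
            return False  # not enough evidence until the window is full
        error_rate = sum(self.outcomes) / self.window_size
        if error_rate > self.max_error_rate:
            # Assumption: start a fresh window once new rules are mined.
            self.outcomes.clear()
            return True
        return False


def fixed_window_trigger(position, window_size):
    """Window-based trigger: fire at the end of each non-overlapping window.

    `position` is the 0-based index of the current item in the stream.
    """
    return (position + 1) % window_size == 0
```

In this sketch the error-rate trigger stays silent while predictions remain accurate, which matches the abstract's explanation of why ERT is efficient: rules are adjusted only when a concept change is detected, whereas the window-based trigger fires at every window boundary regardless of accuracy.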

