  • Degree thesis

一個針對動態資料驅動應用系統概念飄移的平行偵測與預測方法

A Parallel Detection and Prediction Method for Concept Drift in Dynamic Data Driven Application System

Advisors: 羅濟群, 黃興進

Abstract


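Traditional data analysis and prediction methods build their prediction models on the assumption that the data distribution is stable, so by consulting historical data and learning the relationships within it they can accurately predict (classify) the labels of unlabeled data. In today's volatile big-data environment, however, prediction models rely too heavily on historical data and fail to capture data relationships that change with context, a phenomenon known as concept drift. This study proposes a parallel detection and prediction method for concept drift in dynamic data-driven application systems. The proposed method quickly detects changes in the data concept and feeds the detected drift back to the system in real time, which then adjusts the prediction model to raise the accuracy of real-time prediction. We also use parallel computing to derive a global prediction from local predictions, which effectively raises prediction accuracy and reduces overall computation time. The method is implemented on a Map-Reduce distributed platform with classification algorithms. In two experimental cases, average prediction accuracy improved by 14% and 35%, respectively, over previous prediction methods, and computation time fell by nearly 45% and 29%, respectively, compared with conventional computation.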

Parallel Abstract


Traditional data analysis and prediction methods assume that the data distribution is stable, so they can predict the labels of unlabeled data precisely by analyzing historical data. In today’s frequently changing big-data environment, however, this approach is no longer effective: it cannot handle concept drift in a Dynamic Data Driven Application System (DDDAS). This thesis proposes a parallel detection and prediction method for concept drift in DDDAS. The proposed method detects changes in the data and feeds them back to the prediction model to improve subsequent predictions. It also computes a global prediction by aggregating local predictions, which increases prediction accuracy and decreases computation time. The implementation uses Map-Reduce for parallel processing together with classification algorithms. In the two test cases, prediction accuracy rose by 14% and 35%, respectively, and execution time fell by nearly 45% and 29%, respectively.
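
The two ideas in the abstract, detecting drift and combining local predictions into a global one, can be illustrated with a minimal Python sketch. The abstract does not state the detection test, the classifier, or the aggregation rule, so everything here is an assumption made for illustration: `WindowDriftDetector` flags drift when the error rate over a sliding window exceeds a threshold, `local_vote` stands in for a per-partition classifier with a 1-nearest-neighbour lookup on a single numeric feature, and `global_predict` aggregates by majority vote. It runs in a single process and only mimics the map and reduce steps; it is not the thesis's implementation.

```python
from collections import Counter, deque
from functools import reduce


class WindowDriftDetector:
    """Hypothetical drift rule: flag drift when the error rate over the
    most recent `window_size` labeled outcomes exceeds `threshold`."""

    def __init__(self, window_size=50, threshold=0.3):
        self.errors = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, predicted, actual):
        """Record one outcome; return True if drift is suspected."""
        self.errors.append(predicted != actual)
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and sum(self.errors) / len(self.errors) > self.threshold


def local_vote(partition, x):
    """Map step (assumed): label of the nearest (feature, label) pair in
    one partition, returned as a single vote."""
    _, label = min(partition, key=lambda pair: abs(pair[0] - x))
    return Counter({label: 1})


def global_predict(partitions, x):
    """Reduce step (assumed): sum the local votes, return the majority label."""
    votes = reduce(lambda a, b: a + b,
                   (local_vote(p, x) for p in partitions),
                   Counter())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Three toy partitions of (feature, label) pairs.
    partitions = [
        [(0.0, "a"), (1.0, "b")],
        [(0.1, "a"), (0.9, "b")],
        [(0.2, "b"), (5.0, "a")],
    ]
    detector = WindowDriftDetector(window_size=4, threshold=0.5)
    for x, actual in [(0.0, "a"), (1.0, "b"), (0.05, "a"), (0.95, "b")]:
        predicted = global_predict(partitions, x)
        if detector.update(predicted, actual):
            print("drift suspected: retrain the local models")
```

In a feedback loop of the kind the abstract describes, a positive signal from the detector would trigger an update of the local models on recent data; how the thesis performs that update is not stated here.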

