  • Thesis

A Multi-node Data Mining System with Cloud Technology - Using Decision Tree

Advisor: 胡念祖

Abstract


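Owing to rapid advances in information technology, data mining can now be applied to many different kinds of data. However, finding the best parameter settings for a given mining method is quite difficult and requires repeated trial and error. None of the design tools currently in use can generate multiple data models at the same time, so researchers must spend valuable time waiting for the result of a single model. This study proposes a new architecture that uses the open-source statistical language R as the foundation for processing mining models, with the decision tree as the evaluation method. C# is used to build the user interface, the job server, and a dedicated R script executor. Service concepts from cloud technology are applied to the system to build a multi-node processing architecture that distributes model generation across different hosts. This improves overall performance, saves computation time, and lets the system automatically search each run for the model with the best parameter combination and present it to the user for reference. Finally, the system is briefly compared with several popular commercial mining software packages to confirm that the architecture is feasible to a certain degree.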

English Abstract


With advances in information technology, data mining can be used to analyze many kinds of data. The parameters of a mining method strongly affect the quality of its results, so researchers must spend a great deal of computation time searching for the optimal parameter set. However, current commercial mining tools cannot build multiple data models at the same time, and processing a mining model on a large data set takes considerable time. This study proposes a new architecture that uses the open-source statistical language R as its base and adopts the decision tree model as the evaluation method. C# is used to build the user interface, a work server, and an R script executor. Concepts from cloud services are applied to the system to create a multi-node processing architecture, in which the mining process harnesses all available hosts to improve performance. The system saves computation time and tries to find the best parameter combination for each model. Finally, we run system limitation tests (data size and RAM usage) against several commercial mining software packages and evaluate the feasibility of this architecture.
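The core idea in both abstracts is to split model building into independent jobs, one per parameter combination, run them on several hosts at once, and keep the best-scoring model. The following is a minimal sketch of that idea under stated assumptions, not the thesis's implementation. It uses a toy CART-style tree with two hypothetical parameters, `max_depth` and `min_split`, which are loosely analogous to the `maxdepth` and `minsplit` arguments of `rpart.control` in R's rpart package. A Python thread pool stands in for the C# job server dispatching work to R executors on separate nodes. The data, function names, and accuracy-based selection rule are all illustrative.

```python
"""Sketch: fan decision-tree parameter combinations out to parallel workers
and keep the best-scoring model. Threads stand in for separate hosts; the toy
tree, data, and all names here are hypothetical, for illustration only."""
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def majority(labels):
    """Most frequent label; sorting makes tie-breaking deterministic."""
    return max(sorted(set(labels)), key=labels.count)


def build(rows, labels, depth, max_depth, min_split):
    """Grow a binary tree: leaves are labels, nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(rows) < min_split or len(set(labels)) == 1:
        return majority(labels)
    best = None  # (weighted Gini impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority(labels)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth, min_split),
            build([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth, min_split))


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def evaluate(params, train, test):
    """One job: train with one parameter combination, return (test accuracy, params)."""
    max_depth, min_split = params
    tree = build(train[0], train[1], 0, max_depth, min_split)
    hits = sum(predict(tree, r) == y for r, y in zip(*test))
    return hits / len(test[1]), params


def grid_search(train, test, depths, min_splits, workers=4):
    """Run every combination in parallel; highest accuracy wins, ties favor simpler trees."""
    grid = list(product(depths, min_splits))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: evaluate(p, train, test), grid))
    return max(results, key=lambda r: (r[0], -r[1][0], r[1][1]))


# Hypothetical toy data: label 1 only when both features are large.
train = ([(1, 1), (2, 8), (8, 2), (8, 8), (9, 9), (7, 3), (3, 7), (9, 6)],
         [0, 0, 0, 1, 1, 0, 0, 1])
test = ([(2, 2), (8, 9), (9, 1), (1, 9)], [0, 1, 0, 0])

print(grid_search(train, test, depths=[1, 2, 3], min_splits=[2, 5]))  # (1.0, (2, 2))
```

Because each `evaluate` call is independent, moving from threads to real nodes should mainly change the dispatch step, for example sending each parameter pair to a remote worker and collecting the `(accuracy, params)` results. Selection on held-out accuracy is only one possible criterion; the abstracts do not specify which one the system uses.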
