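Rapid advances in information technology allow data mining to analyze many different kinds of data. However, determining the optimal parameter settings for a given mining method is difficult and requires repeated trial and error. Moreover, current design tools cannot generate multiple data models at the same time, so researchers spend valuable time waiting for the result of each single model. This study proposes a new architecture that uses the open-source statistical language R as the basis for processing mining models, with decision trees as the evaluation method. C# is used to build the user interface, the work server, and a dedicated R script executor. Applying the service concepts of cloud computing, the system establishes a multi-node processing architecture that distributes model generation across different hosts. This improves overall performance and reduces computation time, and in each run the system actively searches for the model with the best parameter combination and presents it to the user for reference. Finally, the system is briefly compared with several popular commercial data mining packages to confirm that the architecture is feasible to a certain degree.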
With advances in information technology, data mining can be applied to many kinds of data. Because the parameters of a mining method strongly affect the quality of its results, researchers must repeatedly spend considerable computation time searching for the optimal parameter set. However, current commercial mining tools cannot process multiple data models at one time, and building a mining model on a large data set takes a long time. This study proposes a new architecture that uses the open-source statistical language R as its base and adopts the decision tree model as the evaluation method. C# is used to develop the user interface, a work server, and an R script execution program. Applying the concept of cloud services, we develop a multi-node processing architecture in which the mining process makes use of all available hosts to improve performance. The system saves computation time and attempts to find the best parameter combination for each model. Finally, we present system limitation tests (data size and RAM usage) in comparison with several commercial mining packages and evaluate the feasibility of this architecture.
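The abstracts describe the core loop of the system: a work server fans decision-tree parameter combinations out to several processing nodes and keeps the best-scoring model. The following is a minimal Python sketch of that loop under stated assumptions, not the thesis implementation (which uses C# and R). A thread pool stands in for the remote hosts, a small hand-written CART-style tree stands in for the R decision-tree scripts, and the parameter names `max_depth` and `min_split`, the hold-out accuracy metric, and the synthetic data are all hypothetical choices.

```python
# Minimal sketch (hypothetical): a thread pool stands in for the processing
# nodes, and a small CART-style tree stands in for the R decision-tree scripts.
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def majority(labels):
    """Most frequent label; ties go to the smallest label."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return max(sorted(counts), key=lambda k: counts[k])


def build_tree(rows, max_depth, min_split, depth=0):
    """Grow a tree over (features, label) rows.

    Leaves are class labels; internal nodes are (feature, threshold, left, right).
    """
    labels = [y for _, y in rows]
    if depth >= max_depth or len(rows) < min_split or gini(labels) == 0.0:
        return majority(labels)
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0][0])):
        for thr in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= thr]
            right = [y for x, y in rows if x[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # no threshold separates the rows
        return majority(labels)
    _, f, thr = best
    left_rows = [r for r in rows if r[0][f] <= thr]
    right_rows = [r for r in rows if r[0][f] > thr]
    return (f, thr,
            build_tree(left_rows, max_depth, min_split, depth + 1),
            build_tree(right_rows, max_depth, min_split, depth + 1))


def predict(tree, x):
    """Walk from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def run_job(params, train, test):
    """One node's job: fit with one parameter set and score on held-out rows."""
    max_depth, min_split = params
    tree = build_tree(train, max_depth, min_split)
    correct = sum(predict(tree, x) == y for x, y in test)
    return {"max_depth": max_depth, "min_split": min_split,
            "accuracy": correct / len(test)}


def grid_search(train, test, depths, min_splits, nodes=4):
    """Dispatch every parameter combination to a pool of `nodes` workers.

    Threads only model the dispatching here; under CPython's GIL they do not
    speed up this CPU-bound work, whereas separate hosts would.
    """
    grid = list(product(depths, min_splits))
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = list(pool.map(lambda p: run_job(p, train, test), grid))
    # Highest accuracy wins; ties go to the shallower, then larger-min_split model.
    best = max(results, key=lambda r: (r["accuracy"], -r["max_depth"], r["min_split"]))
    return results, best


if __name__ == "__main__":
    # Synthetic, noise-free data (assumption): label 1 when x + y >= 10.
    data = [((x, y), int(x + y >= 10)) for x in range(10) for y in range(10)]
    train = [r for i, r in enumerate(data) if i % 4 != 0]
    test = [r for i, r in enumerate(data) if i % 4 == 0]
    results, best = grid_search(train, test, depths=[1, 2, 4, 8], min_splits=[2, 10, 20])
    for r in results:
        print(r)
    print("best:", best)
```

In the architecture described above, `run_job` would presumably correspond to sending one R script and its parameter set to a remote node, and `grid_search` to the work server collecting results and reporting the best model to the user; the exhaustive grid is only one possible search strategy.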