
DEVELOPMENT AND APPLICATION OF PERSONAL HADOOP-BASED BIG DATA PLATFORM

Abstract


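In recent years, big data and data mining have become popular research fields. Many multinational corporations, such as Intel, Google, and Alibaba, invest substantial resources every year in mining and analyzing big data so that they can adjust their business strategies. Demand for big data analysis in environmental monitoring and model computation has grown rapidly. Yet developers lack a platform for developing and testing big data programs, and building a distributed big data computing cluster on one's own is not easy. This study therefore uses virtualization software to develop a virtual machine package for a personal big data development platform. The package can quickly set up a distributed cluster system, together with its programming-model development environment, on a single host such as a personal computer. To evaluate system performance, the study used the standard WordCount case to carry out a range of performance analyses. The results show that, during program development and testing, a distributed configuration of 1 + 3 virtual machines makes an excellent testing and training platform for beginning engineers. Finally, two application cases, one in river environmental monitoring and one in model computation, illustrate the big data analysis techniques of the platform. The river flow-field image recognition case shows the platform's distributed storage characteristics and the principles of distributed management and parallel computing. The two-dimensional hydraulic model case shows how the streaming technique lets Fortran programs, which are widely used in civil and hydraulic engineering, be adapted directly for distributed management and parallel computing. The proposed personal big data development platform and the two application cases can therefore serve as powerful tools for domestic big data research and applications, helping to solve civil and hydraulic engineering problems more quickly.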
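The WordCount benchmark mentioned above is the canonical MapReduce example. The source does not say which language or API the benchmark used, so the following is only a hedged illustration. It is a Python sketch of WordCount in the Hadoop Streaming style, where the mapper and reducer read lines from stdin and write tab-separated key/value pairs to stdout. `run_local` is a hypothetical helper that simulates the map, sort, and reduce stages on one machine.

```python
#!/usr/bin/env python3
"""WordCount in the Hadoop Streaming style: a minimal sketch.

Assumed usage: `wordcount.py map` acts as the mapper and
`wordcount.py reduce` as the reducer (script name is hypothetical).
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word<TAB>1" for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input arrives sorted by key, so identical words are adjacent;
    # groupby collects each run and the counts are summed.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(text: str) -> dict:
    # Simulate map -> shuffle/sort -> reduce on a single machine,
    # mirroring `cat input | map | sort | reduce`.
    mapped = sorted(mapper(text.splitlines()))
    return {w: int(c) for w, c in (line.split("\t") for line in reducer(mapped))}


if __name__ == "__main__" and len(sys.argv) > 1:
    # The role is chosen on the command line: "map" or "reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

On a cluster, the same script could be handed to Hadoop Streaming as, for example, `-mapper "python3 wordcount.py map"` and `-reducer "python3 wordcount.py reduce"`. The jar location and HDFS paths depend on the installation. Locally, `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce` reproduces the same pipeline.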

English Abstract


Big data and data mining technologies have become increasingly popular in recent years. Many world-class corporations, such as Intel, Google, and Alibaba, invest large amounts of financial and human resources in big data analysis and data mining to support decision making and business strategy. The demand for big data analytics in environmental monitoring and model simulation has grown rapidly. However, many developers lack a big data platform for programming and testing, because a distributed Hadoop cluster is not easy to build. The present study therefore used virtualization technology to establish a personal Hadoop-based big data platform. The platform replicates virtual machines on a single machine and provides an environment for data management and computational programming. Performance was benchmarked with the standard WordCount case. The results show that a distributed configuration of 1 + 3 virtual machines could be an ideal platform for programming and testing code for beginners with a civil engineering background. Finally, two application cases illustrate the big data analytics techniques in the developed platform. The first is flow image recognition for river velocity measurement. It explains the storage characteristics of the specially designed file system and the distributed computing concepts in data management and computational programming. The second is a two-dimensional hydraulic model simulation. It introduces a way to use native Fortran code for data management and computational programming through the streaming technique. Thus, the proposed big data platform with virtual machine capability, together with the two application cases, could serve as a powerful tool for quickly solving civil and hydraulic engineering problems involving big data.
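The streaming technique described above rests on Hadoop Streaming's contract: any executable that reads records from stdin and writes tab-separated key/value lines to stdout can act as a mapper or reducer. The source does not show how the two-dimensional hydraulic model was wired in, so the Python sketch below shows one assumed pattern. Each input line is a hypothetical `case_id<TAB>parameters` record. A native solver (for example a compiled Fortran program, called `./hydro2d` here purely for illustration) reads its parameters on stdin and prints its result on stdout.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper that wraps a native solver: a minimal sketch.

Assumptions (not from the source): each input record is
"case_id<TAB>parameters", and the solver reads the parameters on stdin
and prints a single result line on stdout.
"""
import subprocess
import sys
from typing import Iterable, Iterator, Sequence


def run_case(solver: Sequence[str], params: str) -> str:
    # Launch one solver process for this case, pipe the parameters in,
    # and return its trimmed stdout; check=True raises on a non-zero exit.
    result = subprocess.run(
        list(solver),
        input=params + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def mapper(lines: Iterable[str], solver: Sequence[str]) -> Iterator[str]:
    # Records are independent cases, so Hadoop can spread the input
    # splits over map tasks on different worker machines.
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank records
        case_id, params = line.split("\t", 1)
        yield f"{case_id}\t{run_case(solver, params)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # The solver command comes from the command line,
    # e.g. `run_model.py ./hydro2d` (hypothetical names).
    for out in mapper(sys.stdin, sys.argv[1:]):
        print(out)
```

A Fortran program that already reads stdin and writes stdout could also be passed to `-mapper` directly, without any wrapper. The wrapper form mainly attaches a key to each result and runs one solver process per record.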
