In recent years, open data has become a prominent topic in information technology because it is widely regarded as holding substantial latent value. Within open data, Open Government Data has received particular attention from Western countries and international organizations such as the United Nations, which have actively promoted it. However, open data published on the Internet comes in a wide variety of formats, and datasets from different sources often define their fields differently. This makes it inconvenient to integrate and analyze them. How to collect and integrate such heterogeneous open data, so that analysts can analyze it more quickly and extract the key information, has therefore become an active question.

This study proposes a prototype platform for data integration and analysis. The platform automatically retrieves and consolidates open data. It combines the Hadoop big data processing tools with the data mining tools of the R language to analyze the data, and it automatically retains the key factors found by each analysis for later decision making. All results are stored in a relational database and in HDFS, including summarized data, analytical models, decision tree rules, and the discovered key factors.

Finally, the study uses agricultural product transaction records and historical weather data as a case study. The platform retrieves and consolidates the data and then applies its decision tree data mining method in a looping analysis. The model from each iteration is stored, and the factors that commonly influence the crops within each product category are then aggregated, giving decision makers more complete reference information.
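The abstract names the looping decision tree mechanism but does not spell it out. The sketch below is one hypothetical reading of it and is not the thesis's R/Hadoop implementation. It assumes that each record already joins one crop's transaction outcome with the weather for that day. In place of an R decision-tree package, it uses a small hand-written CART-style tree that splits on Gini impurity. A feature counts as a "common key factor" of a category when at least `min_share` of the per-crop trees in that category split on it. The field names (`category`, `crop`, `rainfall`, `temp`, `label`), the `min_share` rule, the toy data, and the in-memory `models` dict that stands in for the relational database and HDFS are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Grow a small CART-style tree; each leaf holds the majority label."""
    split = best_split(rows, labels, features) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    sides = {"left": [], "right": []}
    for r, y in zip(rows, labels):
        sides["left" if r[f] <= t else "right"].append((r, y))
    node = {"feature": f, "threshold": t}
    for side, pairs in sides.items():
        sub_rows, sub_labels = [p[0] for p in pairs], [p[1] for p in pairs]
        node[side] = build_tree(sub_rows, sub_labels, features, depth + 1, max_depth)
    return node


def split_features(tree):
    """Collect every feature the tree splits on."""
    if "leaf" in tree:
        return set()
    return {tree["feature"]} | split_features(tree["left"]) | split_features(tree["right"])


def looping_analysis(records, features, min_share=0.5):
    """Fit one tree per crop, store it, then aggregate shared factors per category."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["category"], rec["crop"])].append(rec)
    models = {}  # hypothetical stand-in for the relational database / HDFS model store
    for key, recs in groups.items():
        models[key] = build_tree(recs, [r["label"] for r in recs], features)
    crops_per_cat, factor_counts = Counter(), defaultdict(Counter)
    for (category, _crop), tree in models.items():
        crops_per_cat[category] += 1
        factor_counts[category].update(split_features(tree))
    # Assumed rule: a factor is "common" if at least min_share of the category's crops use it.
    common = {
        cat: sorted(f for f, c in factor_counts[cat].items() if c / n >= min_share)
        for cat, n in crops_per_cat.items()
    }
    return models, common


# Hypothetical toy data: label = whether the crop's price went "up" or "down".
def rec(category, crop, rainfall, temp, label):
    return {"category": category, "crop": crop, "rainfall": rainfall, "temp": temp, "label": label}


records = [
    rec("leafy", "cabbage", 5, 20, "down"), rec("leafy", "cabbage", 50, 22, "up"),
    rec("leafy", "cabbage", 8, 25, "down"), rec("leafy", "cabbage", 60, 21, "up"),
    rec("leafy", "spinach", 5, 30, "down"), rec("leafy", "spinach", 40, 18, "up"),
    rec("leafy", "spinach", 7, 18, "down"), rec("leafy", "spinach", 45, 30, "up"),
    rec("fruit", "banana", 20, 15, "down"), rec("fruit", "banana", 22, 30, "up"),
    rec("fruit", "banana", 60, 14, "down"), rec("fruit", "banana", 55, 31, "up"),
]
models, common = looping_analysis(records, ["rainfall", "temp"])
print(common)  # {'leafy': ['rainfall'], 'fruit': ['temp']}
```

In this sketch, the per-crop loop corresponds to the repeated analyses whose models are stored. The final dictionary comprehension corresponds to aggregating shared factors by category. A real deployment would replace the hand-written tree with a library implementation and persist `models` externally.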