With the spread of the Internet and the maturing of cloud technology, data volumes continue to grow rapidly. Common sources of big data today include social-network check-ins, log records, sensor readings, files, video, and images on the Internet, web search engines, astronomical data, atmospheric science analysis, human genome data, biochemical and medical analysis, and medical records. Nearly everything in daily life, large or small, generates data, which has given rise to cloud platforms built specifically to process big data. A cloud platform lets users adapt quickly and conveniently to changing demand and access a broad pool of shared computing resources (such as networks, servers, storage, applications, and cloud services) over the network. Among cloud platforms, the Hadoop architecture has gradually become well known because it is open-source software and can be scaled out flexibly.
In this thesis, building on the Hadoop platform, we integrate HBase, Pig, and related components, examine the concepts and applications of each, and construct a log data analysis framework. The framework offers enterprises and other organizations that need to analyze big data a platform for fast data processing and big data analysis.