Design of a Method for Storing Data Analysis Results in HBase

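With the growth of the Internet era, the volume of data has increased rapidly and massively. As data volumes keep surging, the need to store, manage, search, and analyze massive data becomes increasingly important. This thesis therefore proposes a mechanism that combines the concepts of cloud computing and data warehousing: collected information is processed and then stored, making it easier to retrieve the data that is needed. The main purpose of a data warehouse [1][2] is to integrate data collected from different sources and, together with data analysis tools, make that data available for users to access and analyze. A typical data warehouse architecture can be regarded as a separate database that is designed and tuned for user needs and forms the core of all the data. Its stored data is rarely modified, and it can serve both simple queries and the exploration of existing data to extract more valuable information. Data warehousing is built on the concept of “centralized data,” whereas cloud technology emphasizes “distributed use.” To cope with large volumes of data, this work combines the two: the vast and disorganized data in the cloud is organized and analyzed, then integrated, so that the resulting information can both answer user queries and support further analysis.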
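The abstract describes a flow of collecting data from several sources, integrating and analyzing it, and storing the results so they can be queried later. The Python sketch below illustrates that flow under stated assumptions only: the two sources, their field names, the `category#date` row-key layout, and the `stats:count` column are all hypothetical, and `WideColumnTable` is an in-memory stand-in that merely imitates HBase's row key / `family:qualifier` data model and its sorted prefix scans. It is not the HBase client API and not the thesis's actual implementation.

```python
from collections import defaultdict

# Hypothetical raw records from two sources with different field names;
# the thesis does not specify its real data sources or schema.
SOURCE_A = [
    {"ts": "2012-05-01", "page": "news"},
    {"ts": "2012-05-01", "page": "news"},
    {"ts": "2012-05-02", "page": "sports"},
]
SOURCE_B = [
    {"date": "2012-05-01", "category": "sports"},
    {"date": "2012-05-02", "category": "news"},
]


def integrate(source_a, source_b):
    """Map both sources onto one common (date, category) schema."""
    rows = [(r["ts"], r["page"]) for r in source_a]
    rows += [(r["date"], r["category"]) for r in source_b]
    return rows


def analyze(rows):
    """Aggregation step: count records per (category, date), like a GROUP BY."""
    counts = defaultdict(int)
    for date, category in rows:
        counts[(category, date)] += 1
    return counts


class WideColumnTable:
    """In-memory stand-in for an HBase table: row key -> {"family:qualifier": value}.

    It only illustrates the data model and never contacts a real HBase cluster.
    Values are strings here; real HBase cells hold raw bytes.
    """

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        return self._rows.get(row_key, {})

    def scan(self, prefix):
        # HBase keeps rows sorted by row key, so a prefix scan returns an
        # ordered, contiguous range; sorting the keys imitates that ordering.
        for key in sorted(self._rows):
            if key.startswith(prefix):
                yield key, self._rows[key]


def store(counts, table):
    """Hypothetical row key "category#date" keeps each category's results adjacent."""
    for (category, date), n in counts.items():
        table.put(f"{category}#{date}", "stats:count", str(n))


table = WideColumnTable()
store(analyze(integrate(SOURCE_A, SOURCE_B)), table)
for key, cols in table.scan("news#"):
    print(key, cols["stats:count"])
```

Placing the category first in the row key lets a single prefix scan retrieve every date recorded for that category. A real deployment would write and read the same cells through HBase's own client (for example its `Put` and `Scan` operations) instead of this in-memory class.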

Keywords

Data Warehouse; HDFS; Hadoop; HBase; Pig

Parallel Abstract

As the Internet has developed, data on the Internet have grown rapidly into extremely large data sets. As the amount of data keeps increasing, the demand for storing, managing, searching, and analyzing massive data becomes more and more important. Therefore, this paper proposes a method that combines the concepts of cloud computing and data warehousing to process the collected information and store it, so that the required information can be obtained more conveniently and correctly. The main purpose of a data warehouse is to collect data from different sources and integrate it so that, with data analysis tools, users can access and analyze it. A data warehouse architecture can be regarded as a separate database that is designed and adjusted for user needs and is deemed the core of all data. The stored data is rarely modified; it can be used for simple information queries or for more complex data exploration to obtain valuable information. Data warehousing is built on the concept of "centralized data," while cloud computing emphasizes "distributed applications." Faced with large amounts of data, merging the two allows large and messy data to be consolidated, analyzed, and further integrated, so that the resulting information not only serves user queries but can also be analyzed further.

Parallel Keywords

Pig; HBase; Data Warehouse; HDFS; Hadoop

References

[5] Jeffrey Dean, Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM (50th anniversary issue: 1958-2008), pp. 107-113, January 2008.

[6] Jeffrey Shafer, Scott Rixner, Alan L. Cox. The Hadoop Distributed Filesystem: Balancing Portability and Performance. Performance Analysis of Systems & Software (ISPASS), Vol. 1, pp. 122-133, March 2010.

[9] Jianling Sun, Qiang Jin. Scalable RDF Store Based on HBase and MapReduce. Advanced Computer Theory and Engineering (ICACTE), Vol. 1, pp. 633-636, August 2010.

[10] W. H. Inmon, R. D. Hackathorn. Using the Data Warehouse. p. 2, July 1994.

[13] Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins. Pig Latin: A Not-So-Foreign Language for Data Processing. ACM SIGMOD International Conference on Management of Data, pp. 1099-1110, 2008.
