
Abstract


Moving Towards Pure ANSI SQL in NoSQL

The main focus of this master's thesis is to narrow the usability gap between the newer, more distributed data processing platforms (HBase, Cassandra, MapReduce, etc.) and the traditional, less distributed ones (e.g. RDBMSs). Much work has been done in this area, e.g. Hive and Pig, but these are not pure SQL. Over the past few decades, RDBMSs and data warehouses were the only choice of data processing platform, with a rich set of data processing tools such as SQL. Recently, however, the variety, velocity and volume of data have made these traditional platforms less efficient at handling such data, hence the need for more efficient data stores and processing platforms. Although NoSQL data stores have lived up to expectations for storing and processing large datasets, the process may not be as simple and convenient as in traditional databases. One common drawback of NoSQL databases is the lack of the much-loved SQL language. This thesis therefore focuses on this new type of data store, known as NoSQL. Specifically, we focus on HBase, a column-oriented, BigTable-like database, as our choice of NoSQL store. Because NoSQL databases are becoming very popular, we propose data mapping methods that make migration from relational databases to NoSQL databases less daunting. Since this migration starts from RDBs, which have a rich set of procedures (i.e. SQL) for accessing and manipulating data, we extend our work to bridge the gap between SQL and NoSQL by providing methods for using pure ANSI SQL to manipulate the underlying data stored in our NoSQL store (HBase).

Keywords

Hadoop, MapReduce, NoSQL, Hive, HBase, ANSI SQL
