
Enhancing Cassandra with HDFS and In-Memory Database

Advisor: 李允中

Abstract

Apache Cassandra is a distributed NoSQL database originally developed at Facebook. This thesis addresses two of its limitations. First, Cassandra does not support file access. Second, although Cassandra has good write performance, its LSM-tree-based storage architecture can require multiple disk reads per lookup, so its read performance is less satisfactory.

To address the first problem, we connect Cassandra to HDFS and add a new keyspace to Cassandra that keeps track of the files stored in HDFS. We also add a DAO to the Cassandra Driver as an interface so that users can work with these files easily. To address the second problem, we add an in-memory table to Cassandra: when creating a new column family, the user only needs to add one extra line, and the underlying data structure changes to a T tree.

Finally, we compare the result with the original Cassandra. Selecting 80,000 records takes about 85.10 seconds on average with the original Cassandra, whereas the in-memory Cassandra takes about 62.89 seconds on average in the same case, a reduction of roughly 26%.
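The read-path argument in the abstract can be illustrated with a minimal toy model. The Python sketch below is not the thesis implementation and does not use any Cassandra API: `ToyLSMStore` and `ToyInMemoryTable` are hypothetical names, each "run" is an in-memory sorted list standing in for an on-disk SSTable, and a bisect-maintained sorted list stands in for the T tree. Bloom filters and compaction, which real Cassandra uses to skip or merge SSTables, are deliberately omitted.

```python
from bisect import bisect_left


class ToyLSMStore:
    """Toy LSM-style store: one memtable plus immutable sorted runs.

    Each run stands in for an on-disk SSTable (an assumption for illustration).
    """

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # oldest run first, newest run last
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into a sorted, immutable run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Return (value, runs_probed); each probed run models one disk read."""
        if key in self.memtable:
            return self.memtable[key], 0
        probed = 0
        for run in reversed(self.runs):  # newest data wins, so search newest first
            probed += 1
            keys = [k for k, _ in run]
            i = bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1], probed
        return None, probed


class ToyInMemoryTable:
    """Ordered in-memory index; a simple stand-in for a T tree."""

    def __init__(self):
        self.keys = []
        self.values = []

    def put(self, key, value):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            self.values[i] = value  # overwrite existing key
        else:
            self.keys.insert(i, key)
            self.values.insert(i, value)

    def get(self, key):
        """Return (value, runs_probed); lookups never touch a simulated disk run."""
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i], 0
        return None, 0


lsm = ToyLSMStore(memtable_limit=2)
mem = ToyInMemoryTable()
for n in range(6):
    lsm.put(f"k{n}", n)
    mem.put(f"k{n}", n)

oldest_lookup = lsm.get("k0")   # has to probe every run before finding the key
newest_lookup = lsm.get("k5")   # found in the newest run
memory_lookup = mem.get("k0")   # answered from memory only
```

In this toy model, a lookup for an old key probes every flushed run, while the in-memory table answers without probing any. The timings reported in the abstract come from the actual system and cannot be reproduced by this sketch.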

Keywords

Apache Cassandra; HDFS; in-memory database; T tree; LSM tree

