Enhancing Cassandra with HDFS and an In-Memory Database

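Apache Cassandra is a distributed NoSQL database originally developed at Facebook. Cassandra has two problems. First, it does not support file storage and access. Second, although its write speed is good, its architecture is built on an LSM tree, so a lookup may have to read from disk several times, which makes read speed unsatisfactory. To solve the first problem, we connect Cassandra to HDFS and add a new keyspace to Cassandra so that it can keep track of how many files are stored in HDFS; we then add a DAO to the Cassandra Driver as an interface that users can work with easily. To solve the second problem, we add in-memory tables to Cassandra: a user only needs to add one extra line when creating a new column family to change the underlying data structure to a T-tree. Finally, we compare the result with the original Cassandra. The original Cassandra took 85.10419 seconds on average to select 80,000 data items, whereas the in-memory Cassandra took only 62.89264 seconds on average for the same selection.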
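To make the T-tree mentioned above more concrete, the following is a minimal Python sketch of the data structure in general, not the thesis's implementation inside Cassandra. Each node holds a small sorted array of keys, and a lookup walks left or right by comparing against a node's smallest and largest key before doing a binary search inside the bounding node. The names `TNode`, `TTree`, and `capacity` are illustrative assumptions, and the sketch deliberately omits the rebalancing rotations that a real T-tree performs.

```python
from bisect import bisect_left, insort


class TNode:
    """One T-tree node: a small sorted array of keys plus two child links."""

    def __init__(self, key):
        self.keys = [key]   # always sorted and never empty
        self.left = None    # subtree whose keys are all smaller than keys[0]
        self.right = None   # subtree whose keys are all larger than keys[-1]


class TTree:
    """Simplified T-tree (hypothetical sketch): no rebalancing, no deletion."""

    def __init__(self, capacity=4):
        self.capacity = capacity  # assumed maximum number of keys per node
        self.root = None

    def contains(self, key):
        node = self.root
        while node is not None:
            if key < node.keys[0]:
                node = node.left
            elif key > node.keys[-1]:
                node = node.right
            else:
                # This node bounds the key: binary search inside its array.
                i = bisect_left(node.keys, key)
                return node.keys[i] == key
        return False

    def insert(self, key):
        if self.root is None:
            self.root = TNode(key)
            return
        node = self.root
        while True:
            if key < node.keys[0]:
                if node.left is not None:
                    node = node.left
                elif len(node.keys) < self.capacity:
                    node.keys.insert(0, key)
                    return
                else:
                    node.left = TNode(key)
                    return
            elif key > node.keys[-1]:
                if node.right is not None:
                    node = node.right
                elif len(node.keys) < self.capacity:
                    node.keys.append(key)
                    return
                else:
                    node.right = TNode(key)
                    return
            else:
                i = bisect_left(node.keys, key)
                if node.keys[i] == key:
                    return  # duplicates are ignored in this sketch
                if len(node.keys) < self.capacity:
                    node.keys.insert(i, key)
                    return
                # Bounding node is full: evict its minimum, store the new key,
                # then push the evicted minimum down into the left subtree.
                evicted = node.keys.pop(0)
                insort(node.keys, key)
                if node.left is None:
                    node.left = TNode(evicted)
                    return
                key, node = evicted, node.left

    def items(self):
        """Return all keys in ascending order (in-order traversal)."""
        out = []

        def walk(node):
            if node is not None:
                walk(node.left)
                out.extend(node.keys)
                walk(node.right)

        walk(self.root)
        return out
```

Because every comparison happens in memory and each node stores several keys, a lookup touches few nodes and never goes to disk, which is the property the in-memory table relies on. A production T-tree would also keep the tree height balanced.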

Keywords

Apache Cassandra; Hadoop Distributed File System; in-memory database; T-tree; LSM tree

Parallel Abstract

Apache Cassandra is a distributed NoSQL database originally developed at Facebook. Cassandra has two problems. First, it does not support file access. Second, although Cassandra has good write performance, its architecture uses an LSM tree, so a search may need to read the disk multiple times, which results in poor read performance. To solve the first problem, we combine HDFS with Cassandra and add a new keyspace to Cassandra so that it can manage how many files are stored in HDFS. We then add a DAO interface to the Cassandra Driver to make it easier for users to use. To solve the second problem, we add an in-memory table to Cassandra: the user only needs to add one extra line when creating a new column family, and the underlying data structure changes to a T-tree. Finally, in a comparison, the original Cassandra took 85.10418 seconds on average to select the data, while the in-memory Cassandra took only 62.89264 seconds in the same case.
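As a quick reading aid for the two averages reported in this abstract, the short Python sketch below derives the relative reduction in select time and the corresponding speedup factor. The function and constant names are illustrative assumptions; only the two timings come from the abstract.

```python
def relative_reduction(baseline_s, improved_s):
    """Fraction of the baseline time that was saved."""
    return (baseline_s - improved_s) / baseline_s


def speedup(baseline_s, improved_s):
    """How many times faster the improved run is than the baseline."""
    return baseline_s / improved_s


# Average select times in seconds, as reported in the abstract above.
ORIGINAL_AVG_S = 85.10418
IN_MEMORY_AVG_S = 62.89264

reduction = relative_reduction(ORIGINAL_AVG_S, IN_MEMORY_AVG_S)
factor = speedup(ORIGINAL_AVG_S, IN_MEMORY_AVG_S)
print(f"time saved: {reduction:.1%}, speedup: {factor:.2f}x")
```

Under these figures the in-memory variant saves roughly a quarter of the select time, which corresponds to a speedup of about 1.35 times.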

Parallel Keywords

Apache Cassandra; HDFS; in-memory database; T-tree; LSM tree
