Apache Cassandra is a distributed NoSQL database originally developed at Facebook. This work addresses two of its limitations. First, Cassandra does not support file storage and access. Second, although Cassandra offers good write performance, its LSM-tree architecture may require several disk reads per lookup, so its read performance is unsatisfactory. To address the first problem, we connect Cassandra to HDFS and add a new keyspace to Cassandra that tracks the files stored in HDFS; we also add a DAO to the Cassandra driver as an interface so that users can work with these files easily. To address the second problem, we add an in-memory table to Cassandra: a user only needs to add one extra line when creating a new column family to switch the underlying data structure to a T-tree. Finally, we compare our system with the original Cassandra. Selecting 80,000 records takes the original Cassandra 85.104 seconds on average, whereas the in-memory Cassandra takes only 62.893 seconds on average in the same case.
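The abstract does not show how the T-tree is implemented, so the following is only a minimal sketch of a T-tree lookup, assuming integer keys and a small hand-built tree. The names TTreeSketch, Node, sampleTree, and contains are hypothetical and are not part of Cassandra or of the system described here; insertion and rebalancing are left out.

```java
import java.util.Arrays;

/** Minimal T-tree lookup sketch; all names are hypothetical, not Cassandra's API. */
final class TTreeSketch {

    /** One T-tree node: a sorted run of keys plus left and right subtrees. */
    static final class Node {
        final int[] keys; // sorted ascending, never empty
        Node left;        // holds keys smaller than keys[0]
        Node right;       // holds keys larger than keys[keys.length - 1]

        Node(int... sortedKeys) {
            if (sortedKeys.length == 0) {
                throw new IllegalArgumentException("a node needs at least one key");
            }
            this.keys = sortedKeys;
        }
    }

    /** Builds a three-node example tree by hand (no balancing logic). */
    static Node sampleTree() {
        Node root = new Node(40, 50, 60);
        root.left = new Node(10, 20, 30);
        root.right = new Node(70, 80, 90);
        return root;
    }

    /** Returns true if the key is stored in the tree rooted at root. */
    static boolean contains(Node root, int key) {
        Node node = root;
        while (node != null) {
            if (key < node.keys[0]) {
                node = node.left;   // below this node's minimum
            } else if (key > node.keys[node.keys.length - 1]) {
                node = node.right;  // above this node's maximum
            } else {
                // This is the bounding node: the key is here or nowhere.
                return Arrays.binarySearch(node.keys, key) >= 0;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println(contains(root, 20)); // true
        System.out.println(contains(root, 55)); // false
        System.out.println(contains(root, 95)); // false
    }
}
```

Each node stores several keys, so a lookup compares against a node's minimum and maximum to choose a direction and only searches inside the one node whose range bounds the key. This keeps the whole structure in memory without the multi-level disk reads that an LSM-tree lookup may need.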