具重複資料刪除之EXT4檔案系統於NVM上之研究

A study of EXT4 file system with data deduplication support on NVM

Advisor: 衛信文

Abstract


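As technology continues to develop, many new techniques and products have appeared, most of which involve large volumes of data and demand fast reads and writes. The requirements on storage media have risen accordingly, which makes Non-Volatile Memory (NVM) important. NVM is a type of computer memory that offers both good read/write speed and a reasonable capacity, but its capacity is still limited compared with traditional storage media, so saving storage space becomes a major challenge.
Techniques for saving storage space fall into two main categories: file compression and data deduplication. Data deduplication removes multiple identical copies of data and keeps only one, which reduces the amount of data stored and improves write speed while preserving the integrity of all data. Existing deduplication techniques operate at two levels: file-level and block-level. File-level deduplication uses a whole file as the unit when checking whether identical data already exists, and removes the duplicate if so. Block-level deduplication splits each piece of data into chunks (about 4 KB to 12 KB). Compared with file-level deduplication, this greatly increases the proportion of data detected as duplicate and therefore the space saved, but at some cost in read/write speed.
The main approach of this thesis is therefore to modify the fourth extended filesystem (EXT4) so that it performs block-level deduplication, and to place the file system on NVM. To give the file system better performance on NVM and save more space, this thesis adds to and modifies the overall structure of the file system, and uses the Extent structure of EXT4 to organize each chunk and to find chunks with identical data for deduplication. Data is split into variable-length chunks to raise the duplication rate.
The simulation results and analysis in this thesis show that the proposed DeEXT4 file system effectively reduces the amount of duplicate data written to the disk. For files with high duplication, it reduces not only the amount of ordinary data written but also the amount of metadata used, which is a significant benefit for EXT4.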
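The variable-length chunking mentioned in the abstract can be illustrated with a minimal content-defined chunking sketch. This is not the DeEXT4 implementation: the gear-style rolling hash, the `GEAR` table, the `MASK` value, and the function name `chunk` are all assumptions made for illustration, and the 4 KB and 12 KB bounds are taken only from the approximate chunk range quoted above.

```python
import hashlib

MIN_SIZE = 4 * 1024    # lower bound on chunk size (assumed from the ~4 KB figure)
MAX_SIZE = 12 * 1024   # upper bound on chunk size (assumed from the ~12 KB figure)
MASK = (1 << 12) - 1   # hypothetical: cut when the low 12 bits of the hash are zero

# Hypothetical gear table: one fixed pseudo-random 32-bit value per byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def chunk(data: bytes) -> list:
    """Split data into variable-length chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear-style rolling hash: older bytes shift out of the 32-bit state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        at_boundary = length >= MIN_SIZE and (h & MASK) == 0
        if at_boundary or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than MIN_SIZE
    return chunks
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, an insertion near the start of a file tends to disturb only nearby chunks, which is why variable-length chunking usually finds more duplicates than fixed-size splitting.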

Keywords

EXT4, Data Deduplication, Data Chunking, NVM

Parallel Abstract


With the continuous development of technology, many new techniques and products have emerged. Most of them generate huge amounts of data and require fast reads and writes, so the demands on storage media have increased considerably. Non-Volatile Memory (NVM), which offers both high read/write speed and reasonable capacity, has therefore become an important storage medium. However, NVM has a relatively small capacity compared to traditional storage such as disks, so it is important to reduce the storage space that data requires. Techniques for saving storage space fall into two main categories: data compression and data deduplication. Data deduplication removes multiple copies of the same data and keeps only one. This reduces the space the data needs and can improve write speed. There are two main levels of deduplication: file-level and block-level. File-level deduplication treats a whole file as the unit of deduplication, whereas block-level deduplication cuts a file into data blocks and treats each block as the unit. Block-level deduplication can save considerably more storage space than file-level deduplication. Therefore, in this thesis, we extend the EXT4 file system with data deduplication functionality. To give the file system better performance on NVM and save more space, we modify the structure of the file system and use the Extent structure of EXT4 to track every data block so that identical blocks can be found and deduplicated. The proposed file system, called DeEXT4, enables EXT4 to support block-level deduplication efficiently while writing data to NVM storage.
The simulation and analysis results in this thesis show that the DeEXT4 file system effectively reduces the amount of duplicate data written to storage, and that it saves more metadata as the duplication rate of the files increases.
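The difference between file-level and block-level deduplication can be made concrete with a small in-memory sketch. It is an illustration only, not the DeEXT4 design: the class name `DedupStore`, the fixed `BLOCK_SIZE`, the SHA-256 fingerprints, and the dictionary-based index are all assumptions, whereas the thesis tracks blocks through the EXT4 Extent structure.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, chosen for simplicity


class DedupStore:
    """Toy block-level deduplicating store (illustrative, in memory only)."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes, each stored once
        self.refcount = {}  # fingerprint -> number of references from files
        self.files = {}     # file name -> ordered list of fingerprints

    def write(self, name, data):
        if name in self.files:
            raise ValueError("overwriting is not handled in this sketch")
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:
                self.blocks[fp] = block  # first copy: actually store it
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            recipe.append(fp)
        self.files[name] = recipe

    def read(self, name):
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
file_a = b"A" * 4096 + b"B" * 4096 + b"C" * 4096
file_b = b"A" * 4096 + b"B" * 4096 + b"D" * 4096
store.write("a", file_a)
store.write("b", file_b)
# The two files differ, so file-level deduplication would keep all 6 blocks;
# block-level deduplication keeps only the 4 distinct ones.
print(store.stored_bytes() // BLOCK_SIZE)
```

The per-block fingerprint lookup on every write is the extra cost that the abstract refers to when it notes that block-level deduplication trades some read/write speed for space.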

Parallel Keywords

EXT4, Data Deduplication, Content-Defined Chunking, NVM
