透過您的圖書館登入
IP:3.15.202.4
  • 學位論文

自適應布穀鳥過濾器: 避免非必要的硬碟操作

Adaptive Cuckoo Filters: Avoiding Trips to Disk

指導教授 : 陳銘憲
若您是本文的作者,可授權文章由華藝線上圖書館中協助推廣。

摘要


布隆過濾器已經長時間地在各個領域被用來做快速的近似集合成員測試因為它具有簡單但有效的設計。近年來,一種新型態的過濾器 – 布穀鳥過濾器被提出,布穀鳥過濾器支援刪除元素的動作,而且不管在時間上或空間上都比標準的布隆過濾器以及能支援刪除動作的布隆過濾器變形在實務上有更好的表現。然而,在OLTP 資料庫的使用情境中,物件的存取分佈通常是高度不均勻的,也就是某些物件被存取的比其他物件多很多,而在這樣的存取模式中會讓傳統的布隆過濾器以及布穀鳥過濾器無法達到它們理論上該有的偽陽性比率,因為他們都是在假定物件的存取分佈是均勻的情況所設計。 在這篇論文中,我們基於傳統的布穀鳥過濾器而設計了一種新的資料結構 – 自適應性布穀鳥過濾器。我們採用一系列小的布穀鳥過濾器,而在存取分佈不均勻時動態調整部分布穀鳥過濾器的大小以達到比傳統布穀鳥過濾器更低的偽陽性比率。這篇論文展示詳盡的實驗結果,包含ACF 的空間使用,偽陽性比率與速度效能。此外,我們也介紹ACF 如何被應用在一個物聯網資料庫引擎中並且在真實的資料存取模式中有很好的表現。

並列摘要


Bloom filters have been used for fast approximate set membership tests in various areas in a long history because of its compact and simple design. Recently, a newly proposed data structure - Cuckoo filter supports dynamic deletion of elements and has practically better performance in both time and space than Bloom filter and its variants. However, in the scenario of OLTP databases, the access workload is often skewed and will make both Bloom filter and Cuckoo filter fail to achieve their target false positive rate which is calculated in the assumption that the workload is uniform distributed. In this thesis, we present a new data structure called Adaptive Cuckoo Filters (ACF) which can exploit the skewed access pattern and dynamically adjust the size of a list of cuckoo filters to achieve smaller false positive rate than a single cuckoo filter. This thesis also shows the results of comprehensive experiments covering space, precision and speed of ACF. Furthermore, we show how ACF can be applied to an IoT database engine and achieve better performance in real workload.

並列關鍵字

Bloom filters Cuckoo filters Adaptive

參考文獻


[1] A. Rousskov and D. Wessels. 1998. Cache digests. In Computer Netw. ISDN Syst., vol. 30, no. 22–23, pp. 2155–2168.
[2] A. Malik and P. Lakshman. 2010. Cassandra: a decentralized structured storage system. In SI-GOPS Operating System Review, vol. 44, no. 2.
[3] A. Eldawy, J. J. Levandoski, and P. Larson. 2014. Trekking through Siberia: Managing cold da-ta in a memory-optimized database. In Proc. Int. Conf. Very Large Data Bases, pp. 931–942.
[4] B. H. Bloom. 1970. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426.
[5] Bruck, J., Gao, J., and Jiang, A. 2006. Weighted Bloom Filter. In IEEE International Symposium on Information Theory.

延伸閱讀