透過您的圖書館登入
IP:3.145.77.114
  • 學位論文

對於Burrows-Wheeler轉換並利用D-Critical子字串的異質後綴陣列建構法

hSA-DS: A Heterogeneous Suffix Array Construction Using D-Critical Substrings for Burrows-Wheeler Transform

指導教授 : 許雅三

摘要


Burrows-Wheeler Transform (BWT) 是個廣泛運用於資料壓縮以及生物科技的字串轉換方法,現在常用的Bzip2壓縮器就是一個例子。然而,以BWT為基礎的壓縮器雖然擁有較高壓縮率,但會有較長的壓縮所需時間。若以數學方法的角度來檢視BWT的話,我們可以發現BWT的結果其實可以從已建構出的後綴陣列 (Suffix Array) 來推導得出。這樣一來,BWT可以得利於最近數十年研究員所發展出的各式各樣線性時間後綴陣列建構演算法。 在另一方面,圖型處理器 (GPU) 近幾年來已然成為執行平行運算程式時,成本上最有效率的解決方案,之前發展的線性時間後綴陣列建構演算法得以利用GPU的運算能力來達到更好的效能。本論文中分析了現在已存在且平行化的後綴陣列演算法,接著提出了第一個根據SA-DS演算法所改進的異質SA-DS演算法。本異質SA-DS演算法同時利用了GPU與CPU的運算效能,並且著重在對於BWT壓縮器所需的100K到2M字元的字串長度。同時,為了達到更好的表現,為我們的平台優化了原先CUDA所提供的平行基數排序法。 最後,整個異質SA-DS運行在NVIDIA的GPU上且與原本的SA-DS版本做比較。從結果來看,我們優化後的基數排序法在排序1M個字元所需的執行時間較Thrust Library的版本少了23%;我們的異質SA-DS演算法展現了相較於原先循序的版本最多4倍的加速,並展現了相較於最新CUDPP Library的BWT最多2倍的加速。

並列摘要


Burrows-Wheeler Transform (BWT) is a widely-used algorithm applied in data compression techniques like bzip2 and bioinformatics. The BWT-based compression strategy has better compression rate, but longer compression time. In mathematically point of view, BWT can be derived from the constructed suffix array. For decades, researchers developed many of suffix array construction algorithms (SACAs) that benefit the BWT algorithm. On the other hand, graphics processing units (GPU) has emerged as the most cost-efficient solution in the field of parallel computation recently, and former linear-time SACAs begin utilizing the computational power of GPUs. In this work, we analyze the current parallel implementations of SACAs and introduce the first heterogeneous implementation of SA-DS algorithm. The implementation leverages both of the CPU and GPU, and we focus on the typical block sizes, 100K to 2M characters, for BWT-based compression. In order to achieve better performance, we also optimizes the up-to-date radix sort on GPU for our platform. Finally, the implementation is evaluated on the heterogeneous platform equipped with a NVIDIA GPU using the CUDA programming model. As the result, the optimized radix sorting on GPU shows up to 23% decreased time compared with latest Thrust library for sorting millions of keys. Our heterogeneous SA-DS demonstrates up to 4x over sequential C++ version of SA-DS and has a performance gain up to 2x than parallel BWT provided by the CUDPP library.

參考文獻


[2] "bzip2-1.0.6," 2015. [Online]. Available: http://www.bzip.org
[3] H. Li and R. Durbin, "Fast and accurate short read alignment with burrows-wheeler transform," Bioinformatics, vol. 25, no. 14, pp. 1754-1760, 2009.
[4] R. Li, C. Yu, Y. Li, T.-W. Lam, S.-M. Yiu, K. Kristiansen, and J. Wang, "Soap2: an improved ultrafast tool for short read alignment," Bioinformatics, vol. 25, no. 15, pp. 1966-1967, 2009.
[5] J. Ziv and A. Lempel, "A universal algorithm for sequential data compression," IEEE Transactions on information theory, vol. 23, no. 3, pp. 337-343, 1977.
[6] J. Ziv and A. Lempel, "Compression of individual sequences via variable-rate coding," Information Theory, IEEE Transactions on, vol. 24, no. 5, pp. 530-536, 1978.

延伸閱讀