以多圖形處理器加速近似字串比對

字串比對在許多領域中被廣泛運用，例如網路入侵偵測系統、網際網路搜尋、去氧核糖核酸序列比對等等；其中，字串比對可區分為固定字串比對與近似字串比對兩類。固定字串比對是指找出所有字串樣式於輸入文字中出現的位置，所有字串樣式必須精確的比對，不容許任何錯誤；而所謂近似字串比對是指所搜尋的字串樣式則可經由插入、刪除及替代等有限次數的動作，轉換成輸入文字中的某部分。近似字串比對的演算法可區分為dynamic programming與bit-parallelism；Dynamic programming需經龐大運算及記憶體空間記錄誤差值，故在處理大量資料時，將為此演算法之瓶頸；反之，bit-parallelism運用邏輯運算子模擬非確定有限狀態自動機進行比對，速度快且節省記憶體。近似字串比對的運算量大且非常耗費時間，尤其在針對大量的輸入文字比對大量的字串樣式時，其耗費的時間更為明顯。本研究將分析並實現bit-parallelism與dynamic programming於NVIDIA GPU上，實驗結果顯示在處理2Gbytes的輸入文字時，執行於單一個GPU的bit-parallelism較執行於單一執行緒CPU版本的bit-parallelism快上7倍的加速。本研究並進一步透過openMP連結多個GPU的加速，其結果顯示在處理2Gbytes的輸入文字時，以2個GPU加速之bit-parallelism較執行於單一執行緒CPU版本的bit-parallelism快上10倍的速度。

關鍵字

近似字串比對； GPU ； dynamic programming ； bit-parallelism

並列摘要

Sting matching has been widely used in many areas, such as NIDS, web searching, and DNA sequence matching, etc. Sting matching can be classified into two main categories, exact string matching and approximate string matching. Exact string matching finds all occurrences of given patterns in an input string while approximate string matching allows difference between a string and a pattern caused by insertion, deletion and substitution. Approximate string matching has two main approaches, dynamic programming and bit-parallelism. Because dynamic programming needs very large memory space for store results, it is difficult to deal with huge inputs and not proper to be implemented on GPUs. On the other hand, as bit-parallelism uses bitwise operators to simulate NFA, it is fast and memory-efficient. Because approximate string matching is a very compute-intensive task, accelerating approximate string matching has become critical for processing big data. In this thesis, dynamic programming and bit-parallelism are implemented and evaluated on NVIDIA GPUs. The experimental results show that the bit-parallelism performed on GPUs achieves 7 times faster than single threaded CPU version for processing 2Gbytes data. Furthermore, multiple GPUs are adopted to increase throughput of processing big data. For processing 2Gbytes data, adopting two GPUs can achieve 10 times faster than single threaded CPU version.

並列關鍵字

approximate string matching ； GPU ； dynamic programming ； bit-parallelism

參考文獻

Liu, T., Huang, X., Yang, L., & Zhang, P. (2009, September). Query by Humming: Comparing Voices to Voices. Paper presented at the IEEE International Conference on Management and Service Science, Beijing, China.

Lin, C., Tsai, S., Liu, C., Chang, S., & Shyu, J. (2010, December). Accelerating String Matching Using Multi-threaded Algorithm on GPU. In Y. Dong (Chair), Internet Security. Symposium conducted at the 2010 IEEE Global Telecommunications Conference, Miami, FL.

Baeza-Yates, R., & Navarro, G. (1999). Faster Approximate String Matching. Algorithmica, 23(2), 127-158.

Dinu, L. P., & Ionescu, R. (2011, September). A Genetic Approximation of Closest String via Rank Distance. In L. Ciortuz (Chair), Artificial Intelligence III. Symposium conducted at the Symbolic and Numeric Algorithms for Scientific Computing Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, Romania.

Hyyrö, H. (2008). Improving the bit-parallel NFA of Baeza-Yates and Navarro for approximate string matching. Information Processing Letters, 108(5): 313-319.

國際替代計量

以多圖形處理器加速近似字串比對

主題瀏覽