透過您的圖書館登入
IP:3.141.197.135
  • 學位論文

具備仿射間隙評分法回溯功能之序列映射器平行架構設計

A Parallel Design of Dynamic Programming Sequence Aligner with Affine Gap Traceback

指導教授 : 盧奕璋

摘要


序列排序(sequence alignment)是生物資訊學中一個非常重要的問題,而基於動態規劃的序列映射器可以保證其排序結果為數學上的最佳解,因此有其應用上的不可取代性。然而動態規劃的時間複雜度使相關演算法在面對大量生物資訊時會耗費相當多的時間。過去有許多研究針對動態規劃設計硬體平行加速映射器,但由於記憶體上的限制,通常只提供排序最高分的結果,較少研究強調映射器的回溯(traceback)功能。 本論文提出一專用積體電路(Application-Specific Integrated Circuits, ASIC)設計之硬體加速器,此設計具備仿射間隙評分法(Affine Gap Penalty)的回溯功能。本論文提出兩個方向可以減少序列回溯時的記憶體使用量,並能幫助映射器更好的管線化(pipeline)時序。 本加速器使用台積電40奈米製程實作一個能夠排序兩條最長1024 x 992序列的動態規劃映射器,映射器可以輸入使用不同的蛋白質計分表。電路運作頻率為357 MHz,晶片大小$6.879 mm^2$,功率438.5 mW。在硬體模擬實驗中,我們使用硬體映射器加速MAFFT G-INS-1演算法中最耗費時間的配對排序(pairwise alignment)步驟,包含動態規劃與回溯所花費的時間比起軟體約加速570倍。

並列摘要


Sequence alignment plays an important role in bioinformatics. Since the alignment algorithms based on dynamic programming (DP) guarantee the optimal alignment results corresponding to the scoring system used, DP alignment algorithms have been popular since their invention. However, due to the time complexity of DP, the related algorithms consume a lot of time with rapidly increasing sizes of bioinformatics data. The situation induced many hardware parallelization design on DP aligner. However, due to the memory size limitation, few researches focus on the traceback function of DP sequence alignment algorithms. In this thesis, we propose an application-specific integrated circuits (ASIC) design on DP protein sequence aligner with affine gap traceback function. The author proposes two methods that can reduce the memory usage of traceback function and improve the pipeline timing of the design. The hardware is implemented with TSMC 40 nm technology. The maximum input sequence size of the aligner is $1024 imes 992$. The aligner can take different scoring table according to the input. The circuit operates at $357 MHz$, sized $6.879 mm^2$ with power $438.5 mW$. In post-layout simulation, we test our aligner design on the pairwise alignment step of G-INS-1 algorithm in MAFFT. The whole process including sequence alignment and traceback can be accelerated more than $570$ times comparing to software.

參考文獻


[1] Saul B Needleman and Christian D Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular
biology, 48(3):443–453, 1970.
[2] Temple F Smith and Michael S Waterman. Identification of common molecular subsequences. Journal of molecular biology, 147(1):195–197, 1981.
[3] Richard Bellman. The theory of dynamic programming. Bulletin of the American Mathematical Society, 60(6):503–515, 1954.
[4] Stephen F Altschul, Thomas L Madden, Alejandro A Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J Lipman. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic acids research, 25(17):3389–3402, 1997.

延伸閱讀