
Building a Reconfigurable Computing Acceleration Platform for Bioinformatics

Building a Reconfigurable High Speed Bioinformatics Processor

Advisor: 蔡育秀

Abstract


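Searching a gene sequence database for similar sequences allows biologists to predict the function and the 2D and 3D structure of a newly discovered sequence. With the rapid growth of bioinformatics, however, the GenBank gene sequence database now contains as many as 9×10^10 base pairs, so searching such a large database both accurately and quickly has become a key problem. This study implements the Smith-Waterman gene sequence alignment algorithm in hardware in order to shorten alignment time. The Smith-Waterman algorithm is highly parallel and can be realized with a systolic array hardware architecture. Using the basic blocks provided by the Matlab/Simulink environment, this study therefore implements an algorithm IP whose array length can be parameterized according to the length of the input sequence S and which can be downloaded to an FPGA for hardware computation. The systolic array is built by chaining together a basic computing unit, the processing element (PE), and each PE requires eight D flip-flops. An algorithm IP based on this unit fits an array of 700 PEs in a low-cost Xilinx FPGA, the XC3S400FT256-4C, at a system clock frequency of 85 MHz. This gives the IP a theoretical performance of 59.5×10^9 operations per second, a substantial improvement over a general-purpose processor.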
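The parallelism mentioned in the abstract can be illustrated with a small software sketch. It is not the thesis's hardware design: the scoring values (`match`, `mismatch`, `gap`) and both function names are assumptions chosen for illustration, and the abstract does not state which scoring scheme the PE uses. The first function fills the Smith-Waterman score matrix row by row. The second visits the same cells one anti-diagonal at a time, which is the order in which a systolic array with one PE per character of S could update them, since every cell on an anti-diagonal depends only on the two previous anti-diagonals.

```python
def smith_waterman_score(s, t, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, computed row by row (reference version).

    The scoring parameters are illustrative assumptions, not values
    taken from the thesis.
    """
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def smith_waterman_wavefront(s, t, match=2, mismatch=-1, gap=-1):
    """Same score, visiting cells one anti-diagonal (i + j = d) at a time.

    All cells inside the inner loop are independent of each other, so
    hardware with one processing element per character of s could
    update them in the same clock cycle.
    """
    m, n = len(s), len(t)
    h = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for d in range(2, m + n + 1):
        # Row indices i for which both i and j = d - i are inside the matrix.
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            sub = match if s[i - 1] == t[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + sub, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

In software the wavefront order brings no speedup by itself. Its value here is to show that the number of sequential steps drops from len(s) × len(t) cell updates to len(s) + len(t) − 1 anti-diagonals when each anti-diagonal is processed in parallel.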

Parallel Abstract


By searching GenBank for similar sequences, bioinformatics researchers can predict the function and the 2D/3D structure of a newly discovered genetic sequence. With the rapid development of genetic sequencing, GenBank has collected more than 9×10^10 bases, and searching it for similar gene sequences efficiently and accurately has become an important issue. In this research, the Smith-Waterman algorithm is chosen as the target for hardware acceleration of gene sequence alignment. The Smith-Waterman algorithm is one of several algorithms used to search sequence databases. It exhibits a regular form of parallelism and can be implemented with a systolic array hardware architecture. This research uses the basic blocks of Matlab and Simulink to implement a silicon intellectual property (SIP) core. The length of the SIP's systolic array can be parameterized according to the length of the S sequence, and the SIP can be downloaded into an FPGA to perform the computation in hardware. The systolic array is a serial chain of a basic component, the processing element (PE), and each PE includes eight D flip-flops. A systolic array with 700 PEs and a maximum frequency of 85 MHz is implemented in a low-cost Xilinx FPGA, the XC3S400FT256-4C. The systolic array can perform 59.5×10^9 updates per second. The results show that the hardware accelerator greatly outperforms a general-purpose processor.
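The throughput figure follows directly from the numbers in the abstract, as the short calculation below shows. The function names are hypothetical, and the scan-time estimate rests on assumptions that the abstract does not state: one database base enters the array per clock cycle, the query fits within the 700 PEs, and I/O, pipeline fill, and host overhead are ignored. It is an idealized bound, not a measured result.

```python
def peak_updates_per_second(num_pes, clock_hz):
    """Peak cell updates per second if every PE updates one cell per clock."""
    return num_pes * clock_hz


def flip_flops_in_array(num_pes, flip_flops_per_pe):
    """Total D flip-flops used by the PE chain alone."""
    return num_pes * flip_flops_per_pe


def ideal_scan_seconds(database_bases, clock_hz):
    """Idealized time to stream a database through the array.

    Assumes one base per clock cycle and no I/O or pipeline overhead.
    """
    return database_bases / clock_hz


NUM_PES = 700                # from the abstract
CLOCK_HZ = 85_000_000        # 85 MHz, from the abstract
FLIP_FLOPS_PER_PE = 8        # from the abstract
DATABASE_BASES = 9 * 10**10  # database size quoted in the abstract

peak = peak_updates_per_second(NUM_PES, CLOCK_HZ)
flip_flops = flip_flops_in_array(NUM_PES, FLIP_FLOPS_PER_PE)
scan_seconds = ideal_scan_seconds(DATABASE_BASES, CLOCK_HZ)

print(f"peak updates per second: {peak:.3e}")
print(f"flip-flops in PE chain: {flip_flops}")
print(f"idealized full-database scan: {scan_seconds:.1f} s")
```

With these inputs the peak rate is 700 × 85×10^6 = 59.5×10^9 updates per second, which matches the abstract, and the PE chain uses 5600 flip-flops. Under the stated assumptions, one pass over 9×10^10 bases would take roughly 1059 seconds, or about 18 minutes.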

