透過您的圖書館登入
IP:3.134.88.228
  • 學位論文

節省空間之三序列比對平行演算法

Parallel Three-Sequence Alignment with Space-efficiency

指導教授 : 唐傳義

摘要


「序列比對」在分子生物學上是一個基本而重要的應用。很多比對序列的方法已經被提出,像是二序列比對(pair-wise sequence alignment)、三序列比對、限制條件序列比對(Constraint sequence alignment)、多序列比對(Multiple sequence alignment)…等,其中多序列比對近來常常被使用也因此顯得更重要。然而,多序列比對已被證明是一個NP-complete的問題,所以漸進式的多序列比對(progressive MSA)被發展出來。漸進式的多序列比對是用二序列比對做為基本步驟,但是這種方式產生出來的結果有時候品質並不是很好。要改善這個問題可以把三序列比對加入漸進式的多序列比對。三序列比對的時間複雜度為O(mnl),空間複雜度為O(mn),其中m, n 和 l 分別為三條序列的長度。由於三序列比對的複雜度很高,其應用性就會被限制。所以降低三序列比對的複雜度就顯得很重要,在這篇論文中,我們提出了一個平行的方法來降低三序列比對的複雜度以提升其應用性。我們用MPI+C語言來實作三序列比對,並且在平行環境上進行測試。我們所提出的平行方法把三序列比對的時間和空間複雜度降到O(mnl/p)以及O(mn/p),其中p代表處理器的數量。在此篇論文我們也分析了我們的平行方法同時也提出實驗數據。從實驗數據可以發現我們的方法有不錯的效能並且能使三序列比對的應用性提高不少。

並列摘要


The sequence alignment is a fundamental problem in computational biology. Many sequence alignment methods have been proposed such as pair-wise sequence alignment, multiple sequence alignment (MSA), syntenic alignment, constraint multiple sequence alignment, and so on. Among these methods, pair-wise sequence alignment is most commonly used. Recently, MSA is more and more important which has been used to solve many molecular biological problems. The sum-of-pairs MSA problem has been proved to be a NP-complete problem. A progressive algorithm is used to solve MSA problem which is based on the pair-wise sequence alignment. However, the quality of the result of progressive MSA is not satisfied sometimes. This problem may be solved by using three-sequence alignment as a basic step instead of pair-wise sequence alignment. Three-sequence alignment can be solved sequentially in O(mnl) time complexity and O(mn) space complexity, where m, n and l are the lengths of the sequences to be aligned. The complexities of three-sequence alignment limit its applicability. Hence, to reduce the complexities becomes an important issue. In this paper, we propose a parallel algorithm to solve this problem. Our algorithm requires O(mn/p) space complexity and O(mnl/p) time complexity, where p is number of processors. Both experimental results and theoretical analysis are presented. The experimental results show that our algorithm is applicable and achieves a good speed up.

參考文獻


[1] S. Aluru, N. Futamura, and K. Mehrotra, “Parallel biological sequence comparison using prefix computations,” J. Parallel and Distributed Computing, vol. 63, issue 3, pp. 264 – 272, 2003
[2] P. Bonizzoni and G. D. Vedova, “The complexity of multiple sequence alignment with SP-score that is a metric,” Theoretical Computer Science, vol. 259, issue1–2, pp. 63–79, 2001.
[3] H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM Journal on Applied Mathematics, 48 (1988) 1073-1082.
[4] S.C. Chan, A.K.C. Wong and D.K.Y. Chiu, A survey of multiple sequence comparison methods. Bulletin of Mathematical Biology, 54 (1992) 563-598
[6] O. Gotoh, “An improved algorithm for matching biological sequences,” J. Molecular Biology, vol. 162, pp. 705-708, 1982

延伸閱讀