  • Degree thesis

3D-SARST: To improve the accuracy of protein structural similarity search by three dimensional SARST maps

3D-SARST: Improving the Accuracy of Protein Structure Search with Three-Dimensional SARST Maps

Advisor: 呂平江

Abstract


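Chinese abstract: As biotechnology advances, more and more protein structures are being solved. This research focuses on developing protein structure search software that is both fast and accurate, which we call SARST (Protein Structure Similarity Search by Ramachandran Codes). It builds on the principle that the Ramachandran plot, formed by the phi and psi angles of the protein backbone, can detect protein secondary structure, and aims to convert complex three-dimensional protein structures into simple one-dimensional sequences. How to define and generate sequences that preserve the important structural features of proteins is our greatest challenge. Compared with BLAST and CE, two programs commonly used today for protein structure search, SARST already achieves a speed nearly equal to that of BLAST and an accuracy only 4% lower than that of CE.

The present study adds more protein structural information to the Ramachandran plot concept: the two-dimensional map is expanded into a three-dimensional cube, and this cube is used to convert structures into sequences in an attempt to produce more meaningful sequences. We call this method 3D-SARST.

3D-SARST has a different best parameter combination for each situation. By classifying each protein in advance and then searching with the best parameters for its class, we have so far improved the accuracy by 2% while keeping a speed comparable to that of SARST. We believe that, once the program is optimized, 3D-SARST will be even more efficient. In this era of information explosion, protein structure data grow every day, and protein structures can be regarded as a cornerstone for unlocking the secrets of the life sciences. Having a good search tool is like having the sharpest sword; we believe 3D-SARST will be that sword and will lead us to unravel the mysteries of biology.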
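To make the idea of converting a structure into a sequence more concrete, the following is a minimal Python sketch and not the thesis implementation. It assumes a uniform grid over the (phi, psi) plane, whereas SARST organizes its Ramachandran map by nearest-neighbor clustering. The optional third value `extra` stands for the additional structural property on the third axis, which the abstract does not specify; here it is assumed to be normalized to the range 0 to 1. The alphabet, the bin counts, and all function names are hypothetical.

```python
import string

# Hypothetical alphabet; the letters actually used by SARST are not given here.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to a bin index in 0..n_bins-1."""
    frac = (value - low) / (high - low)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def encode_residue(phi, psi, extra=None, n_angle_bins=4, n_extra_bins=2):
    """Return one letter for a residue.

    With extra=None the letter depends only on the 2D (phi, psi) cell.
    With extra given (assumed to lie in [0, 1]), the 2D cell is split
    along a third axis, giving a 3D cell.
    """
    n_cells = n_angle_bins * n_angle_bins
    i = bin_index(phi, -180.0, 180.0, n_angle_bins)
    j = bin_index(psi, -180.0, 180.0, n_angle_bins)
    cell = i * n_angle_bins + j
    if extra is not None:
        n_cells *= n_extra_bins
        k = bin_index(extra, 0.0, 1.0, n_extra_bins)
        cell = cell * n_extra_bins + k
    if n_cells > len(ALPHABET):
        raise ValueError("too many cells for the alphabet")
    return ALPHABET[cell]


def encode_structure(residues, **kwargs):
    """Encode a list of (phi, psi) or (phi, psi, extra) tuples as a string."""
    return "".join(encode_residue(*r, **kwargs) for r in residues)


# Two helix-like residues followed by one strand-like residue (angles in degrees).
two_d = encode_structure([(-60, -45), (-60, -45), (-120, 130)])
three_d = encode_structure([(-60, -45, 0.2), (-60, -45, 0.8), (-120, 130, 0.8)])
```

In the 2D call the two helix-like residues receive the same letter. In the 3D call they receive different letters because their `extra` values fall into different bins, which illustrates how a third axis could preserve information that the 2D map discards. The resulting strings could then be compared with ordinary sequence search tools.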

Keywords

protein structure database search alignment accuracy

Parallel abstract


Abstract The amount of protein structural data is growing so rapidly that a fast and accurate structural similarity search tool is in strong demand. We have developed a structural similarity search tool, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation), that uses a linear encoding methodology to perform extremely rapid database searches with accuracy comparable to that of CE (Combinatorial Extension). We now aim to modify the linear encoding strategy of SARST by integrating more protein structural information to improve its accuracy. SARST linearly encodes protein structures by using a Ramachandran map organized by nearest-neighbor clustering. Traditionally, a Ramachandran map is a two-dimensional (2D) plot displaying the distribution of the dihedral angles (φ, ψ) of residues. Different regions of this map represent different secondary structural preferences of local backbone structures; however, structural information can be lost when the three-dimensional (3D) protein structure is transformed into the 2D map. Our speculation is that, if we extend the Ramachandran plot into a 3D map by adding an extra axis describing another structural property of the backbone conformation, more structural information can be preserved in the transformation and the performance of SARST can thus be improved. We call the new search tool developed from this speculation 3D-SARST. 3D-SARST, inheriting the advantages of SARST, is a rapid database search tool with a reasonable trade-off in accuracy. Although we have not found a condition under which it generally outperforms SARST, we do find that 3D-SARST can achieve higher accuracy for various structural classes under specific conditions. Based on these results, we can first determine the structural class of the query protein and then run 3D-SARST under the conditions and parameter settings appropriate for that class to increase the accuracy of database searching.
This two-step strategy has improved the precision of SARST by 2%, bringing its accuracy closer to that of CE. As the amount of protein structural data is growing ever more rapidly, we expect that an efficient database search engine such as 3D-SARST can be valuable in many post-genomic research fields.
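The two-step strategy described above (determine the structural class of the query, then search with settings chosen for that class) can be sketched as follows. This is an illustration under stated assumptions only: the class names, the thresholds on secondary structure content, and the parameter values are all hypothetical, and the abstract does not say how the structural class is determined or which parameters 3D-SARST exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    """Hypothetical encoding settings; not the parameters of 3D-SARST."""
    n_angle_bins: int
    n_extra_bins: int


# Hypothetical per-class settings. Real values would have to be tuned
# on a benchmark for each structural class.
PARAMS_BY_CLASS = {
    "all-alpha": SearchParams(n_angle_bins=4, n_extra_bins=2),
    "all-beta": SearchParams(n_angle_bins=6, n_extra_bins=2),
    "mixed": SearchParams(n_angle_bins=6, n_extra_bins=3),
}

# Fallback when no class-specific setting exists: a single bin on the
# third axis, so the encoding effectively stays two-dimensional.
DEFAULT_PARAMS = SearchParams(n_angle_bins=4, n_extra_bins=1)


def classify(helix_fraction, strand_fraction):
    """Step 1: toy classifier based on secondary structure content.

    The thresholds are assumptions chosen for illustration.
    """
    if helix_fraction >= 0.4 and strand_fraction < 0.1:
        return "all-alpha"
    if strand_fraction >= 0.4 and helix_fraction < 0.1:
        return "all-beta"
    if helix_fraction >= 0.15 and strand_fraction >= 0.15:
        return "mixed"
    return "other"


def choose_params(helix_fraction, strand_fraction):
    """Step 2: pick the settings for the query's class, or the default."""
    structural_class = classify(helix_fraction, strand_fraction)
    params = PARAMS_BY_CLASS.get(structural_class, DEFAULT_PARAMS)
    return structural_class, params


query_class, query_params = choose_params(helix_fraction=0.6, strand_fraction=0.0)
```

Keeping the class-to-parameter mapping in a plain dictionary separates the tuning results from the search code, so new classes or retuned settings can be added without changing the lookup logic.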

Parallel keywords

3D-SARST database protein search structure

