  • Thesis

以改良式蛋白質結構編碼方法應用在保存區域探尋

Mining conserved regions by an improved protein structural encoding method

Advisor: 黃乾綱

Abstract


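Protein structure analysis operates at two levels: global structure and local structure. Local structure is especially closely tied to the analysis of protein function. Similar structural fragments that recur within a group of proteins may carry biological or evolutionary significance, and biologists call these local structures conserved regions. The number of protein structures collected in the PDB has now passed 50,000. With so many structures available, how to use data mining techniques to extract meaningful local structures, and then determine whether they are in fact conserved regions, has become an active research direction for biologists.

In this thesis, we adopt the concept of the NRS (neighborhood residues sphere), using a spherical space to record the distribution of amino acid residues in a protein local structure. To enable fast comparison and clustering, we encode the protein structure, storing the spatial information of each local structure as a one-dimensional feature value. Through repeated discussion and experiment, we evaluated the strengths and weaknesses of different hash-grid designs and verified the necessity of a buffer zone. From these results we derived an improved encoding that records local structure information most accurately. We apply it to the search for conserved regions, with the aim of finding, among proteins classified under the same enzyme, the regions that influence catalysis.

We also apply the local structure encoding to global protein structure comparison. The encoding quickly identifies highly similar local regions between global structures. Using these regions as the basis for comparison, we transform the structures into a common coordinate system, observe the similarity of the remaining regions, and extend the discussion to questions of stability and flexibility.

The starting point of this thesis is to develop a fast algorithmic pipeline for describing the spatial information of local structures and to build a set of candidate conserved regions for each protein. The experimental results, and the problems encountered along the way, offer points for reflection and room for improvement in future work on the same topic.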
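The abstract does not specify the hash-grid design or how the buffer zone works, so the following is only a minimal sketch of the general idea: residues inside a sphere around a central residue are hashed to grid cells, and the occupied cells are serialized into a one-dimensional code that can be compared or grouped quickly. The function names `encode_nrs` and `group_by_code`, the radius of 8.0 Å, the cell size of 4.0 Å, and the assumption that coordinates are already expressed in a shared local frame are all hypothetical choices made for illustration, not the method of the thesis.

```python
import math
from collections import defaultdict


def encode_nrs(center, residues, radius=8.0, cell=4.0):
    """Return a 1-D string code for the residues inside a sphere around `center`.

    `residues` is a list of (residue_type, (x, y, z)) pairs. Coordinates are
    assumed (hypothetically) to be expressed in a shared local frame already.
    """
    cx, cy, cz = center
    occupied = set()
    for res_type, (x, y, z) in residues:
        dx, dy, dz = x - cx, y - cy, z - cz
        if dx * dx + dy * dy + dz * dz > radius * radius:
            continue  # residue lies outside the sphere
        # Hash the offset from the center to an integer grid cell.
        key = (math.floor(dx / cell), math.floor(dy / cell), math.floor(dz / cell))
        occupied.add((key, res_type))
    # Sort so the code does not depend on the order of the input residues.
    return "|".join(f"{i},{j},{k}:{t}" for (i, j, k), t in sorted(occupied))


def group_by_code(fragments):
    """Group fragment ids whose codes are identical.

    `fragments` is a list of (fragment_id, center, residues) triples.
    """
    groups = defaultdict(list)
    for frag_id, center, residues in fragments:
        groups[encode_nrs(center, residues)].append(frag_id)
    return dict(groups)
```

In this sketch, two fragments share a code only if every residue falls in the same cell. A residue close to a cell boundary can switch cells under a small coordinate shift, which is the kind of sensitivity a buffer zone might be meant to absorb. That reading of the buffer is an assumption.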

Parallel Abstract


Protein structure analysis is mainly divided into two aspects, global structure and local structure, and the latter is especially closely correlated with the analysis of protein function. Most biologists suppose that when frequent patterns appear within a group of protein structures, these regions may carry functional or evolutionary meaning; biologists usually call such regions "conserved regions". Unfortunately, finding these conserved regions in a huge database of protein structures is very time-consuming, so how to apply data mining techniques to this problem has become a hot topic in bioinformatics. In this thesis, we use the concept of the NRS (neighborhood residues sphere) to record the distribution of amino acid residues in a protein local structure. To cluster similar local structures quickly, we encode every protein local structure as one-dimensional information. Through heuristic experiments and discussion, we verify the accuracy of each encoding method. We then apply the encoding method to mine possible conserved regions that may take part in catalysis in an enzyme structure classification database. Finally, we discuss the flexibility and stability of global structure based on this structure encoding scheme.
