  • Thesis

基於氨基酸序列樣式之未知結構蛋白質片段分類

Classifying protein fragment of unknown structure base on amino acid sequence pattern

Advisor: 陳中明

Abstract


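Understanding protein structure has played a very important role in past biomedical research. Traditionally, structures are determined experimentally, for example by X-ray diffraction or NMR spectroscopy. Both techniques have technical difficulties and limitations. If a protein sequence could be mapped correctly to its structure, this would not only save the labor and resources that experiments require but also allow a protein's other possible functions to be understood quickly, helping to resolve bottlenecks in drug development or clinical trials. The main difficulty in protein structure prediction lies in selecting the main template and in the absence of homologous proteins to serve as a reference; the resulting deviation in the backbone cannot be corrected even by local optimization. This thesis proposes a new method that classifies a protein according to the structures of its small fragments, in order to infer the protein's possible structure.
This thesis uses sequence pattern mining as the main method for classifying small fragments. The idea came from the observation, made while building a database of small protein fragments, that many specific sequence patterns exist. If all possible specific sequence patterns can be found for every cluster in the database, that is, the specific patterns of interaction among amino acids in small fragments of similar structure, then each cluster can be given distinct features, which can be used to determine the structure that a sequence of unknown structure may correspond to. Among proteins of similar structure, the amino acid sequences are not necessarily similar, but the fragment classifications annotated from coordinates must be similar, and in theory this can effectively reduce structural variation arising from small sequence changes.
Because current tertiary and secondary structure prediction both rely heavily on the existence of homologous proteins, the proposed method deliberately omits the step of searching for homologous proteins while retaining the concept of homology modeling, performing homology modeling in a generalized sense. It achieves an accuracy of about 78% on the constructed test data and above 77% on real cases.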

Parallel Abstract


Protein structure prediction plays a major role in clinical research of biomedical science in the 21st century. In the past, protein structures were obtained by X-ray diffraction or nuclear magnetic resonance, but both methods have technical limitations. If a protein sequence fragment can be correctly matched to its structure, the unknown functions of a protein can be inferred more effectively and at lower cost, and many problems in the biomedical field may be resolved. The major difficulty in developing a protein structure prediction method lies in selecting the backbone template, especially when there is no homologous protein to refer to; the resulting backbone deviates from the true structure even when it is a locally optimal structure. In this study, we propose a new method to classify protein fragments, which not only identifies the possible structures of each protein fragment but also opens up a variety of possibilities for predicting the whole protein structure. The primary idea of the proposed method is pattern mining of protein fragment sequences. It is motivated by the observation that each class of protein fragments contains a finite number of specific sequence patterns, and that these patterns may carry not only sequence information but also information about possible molecular interactions. Once these patterns are found, we can assign an appropriate class to each fragment of a protein and match the fragments to their possible structures. Two proteins with similar structures need not have similar sequences, but their annotations in terms of the classes that characterize different fragment structures should be similar. Theoretically, this can reduce the structural deviation caused by slight sequence differences.
Recognizing the potential drawbacks of depending on the existence of homologous proteins, a dependence common to conventional secondary and tertiary protein structure prediction, we deliberately dropped the step of finding homologous proteins in the proposed method but kept the concepts of homology modeling. The prediction accuracy is about 78% on the test data and more than 77% in whole-sequence cases.
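
The pattern-based classification idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the class names, the patterns, the wildcard convention (lowercase `x` for any residue), the scoring rule (count of matching patterns), and the window settings are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern library: structural class -> sequence patterns.
# Lowercase "x" stands for any residue. These classes and patterns are
# invented for illustration and are not taken from the thesis.
PATTERNS = {
    "helix_like": ["AxxLxxA", "ExxK", "LxxxL"],
    "hairpin_like": ["VxVxNG", "TxTxG", "NGK"],
}


def to_regex(pattern):
    """Compile a pattern such as 'ExxK' into a regex, with 'x' as a wildcard."""
    return re.compile("".join("." if ch == "x" else re.escape(ch) for ch in pattern))


COMPILED = {cls: [to_regex(p) for p in pats] for cls, pats in PATTERNS.items()}


def classify_fragment(fragment):
    """Score each class by how many of its patterns occur in the fragment.

    Returns (best_class, scores). best_class is None when no pattern matches;
    ties are broken in favor of the class listed first in PATTERNS.
    """
    scores = Counter()
    for cls, regexes in COMPILED.items():
        scores[cls] = sum(1 for rx in regexes if rx.search(fragment))
    best_class, best_score = max(scores.items(), key=lambda kv: kv[1])
    return (best_class if best_score > 0 else None), dict(scores)


def annotate(sequence, width=9, step=3):
    """Slide a window over the sequence and label each fragment with a class."""
    return [
        (start, classify_fragment(sequence[start:start + width])[0])
        for start in range(0, len(sequence) - width + 1, step)
    ]


if __name__ == "__main__":
    print(classify_fragment("AEELLKA"))
    print(annotate("AEELLKAKVTVTNGKT"))
```

Annotating a whole sequence as a list of fragment classes, rather than comparing raw residues, is what allows two proteins with dissimilar sequences to receive similar annotations in this toy setting.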

