Improving the Prediction Accuracy of Protein Disordered Regions Using Secondary Structure Information

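A growing number of proteins, or segments of their sequences, have been found to be unable to form a stable structure after folding. Some of these disordered regions have been confirmed to carry specific biological functions. Other disordered regions that have no function of their own are spatially flexible, so they can provide room for folding and wrapping that lets functional regions bind and interact with other partners. In addition, other studies have found that proteins containing disordered regions often form a stable structure, and become functionally active, by binding to other interacting molecules. Research on and prediction of protein disordered regions therefore supports the analysis of protein structure and function. In recent years, many disorder prediction methods have used amino acid composition or the biochemical properties of amino acids as features, and many methods have also tried to bring secondary structure information into the prediction. This thesis proposes several meaningful features derived from secondary structure information, evaluates them experimentally, and discusses how each one affects disorder prediction performance. The thesis adopts a two-stage approach. In the first stage, features are taken from the protein sequence and an RBFN (Radial Basis Function Network) is used to predict disordered regions. In parallel, a secondary structure prediction tool is used to predict the protein's secondary structure, which is then converted into features that represent secondary structure information as distances. In the second stage, the first-stage predictions are combined with the secondary structure information to make the final prediction. The experiments show that the transformed secondary structure information improves prediction accuracy; in particular, information about the nearest secondary structure element clearly helps in predicting protein disordered regions.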
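The distance-based secondary structure feature described above can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the predicted secondary structure is a three-state string over `H` (helix), `E` (strand) and `C` (coil), any `H` or `E` residue counts as part of a secondary structure element (SSE), and the distance is counted in residues, capped at a hypothetical `max_dist` and scaled to [0, 1]. The function name `nearest_sse_distance` is likewise hypothetical, not the thesis's code.

```python
from typing import List


def nearest_sse_distance(ss: str, max_dist: int = 20) -> List[float]:
    """Per-residue distance to the nearest secondary structure element.

    Assumptions (illustrative only, not the thesis's exact definition):
    - ss is a 3-state predicted string over 'H', 'E' and 'C'.
    - Residues labelled 'H' or 'E' belong to an SSE and get distance 0.
    - Distances are in residues, capped at max_dist, then scaled to [0, 1].
    """
    n = len(ss)
    in_sse = [c in "HE" for c in ss]
    inf = float("inf")
    left = [inf] * n   # distance to the nearest SSE residue at or before i
    right = [inf] * n  # distance to the nearest SSE residue at or after i

    last = None
    for i in range(n):  # left-to-right pass
        if in_sse[i]:
            last = i
        if last is not None:
            left[i] = i - last

    last = None
    for i in range(n - 1, -1, -1):  # right-to-left pass
        if in_sse[i]:
            last = i
        if last is not None:
            right[i] = last - i

    # Proteins with no predicted SSE at all get the capped value 1.0 everywhere.
    return [min(left[i], right[i], max_dist) / max_dist for i in range(n)]
```

For example, `nearest_sse_distance("CCHHHCCCEEC", max_dist=4)` gives `[0.5, 0.25, 0.0, 0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0, 0.25]`: residues inside an SSE score 0, and coil residues score higher the farther they lie from any helix or strand. The two linear passes keep the cost at O(n) per protein.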

Keywords

protein; disordered region; sequence; secondary structure

Parallel Abstract

A growing number of proteins have been found to contain regions that do not form stable tertiary structures in their native states. Such sequence fragments, which have no propensity to form specific structures, are referred to as “disordered regions”. Some disordered regions have been shown to be functionally significant, so a reliable predictor of such regions is important for a better understanding of protein function. Most recent studies use the amino acid composition and/or a number of biochemical properties within a sliding window around the target residue as the feature set for predicting protein disorder. Against this background, this thesis presents a comprehensive study of the performance of a recently proposed feature set that considers both physicochemical properties and amino acid propensities for order/disorder, and demonstrates how a two-stage framework improves the accuracy of the classifier. Furthermore, we propose a novel feature based on protein secondary structure to reduce potential false positives. The thesis explores several ways of extracting information from the local secondary structure. The experimental results show that the feature set using the distance from the target residue to the nearest secondary structure element (SSE) outperforms the others. In particular, employing the proposed feature in the second stage delivers better accuracy than combining it directly with the original feature sets.
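The sliding-window features and the second stage described above can be illustrated with a short sketch. It rests on several assumptions: amino acid composition stands in for the full first-stage feature set (the physicochemical properties and order/disorder propensities are not reproduced), the window half-width and padding value are illustrative choices, and `window_composition` and `stage_two_features` are hypothetical helper names. The first-stage RBFN itself is not shown; `stage1` is simply assumed to hold its per-residue disorder scores, and `sse_dist` a per-residue SSE-distance feature.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_composition(seq: str, i: int, half: int) -> List[float]:
    """Amino acid composition (20 fractions) in a window centred on residue i.

    The window is truncated at the sequence ends; letters outside the 20
    standard residues (e.g. 'X') are ignored when counting.
    """
    lo, hi = max(0, i - half), min(len(seq), i + half + 1)
    counts = [0] * len(AMINO_ACIDS)
    total = 0
    for c in seq[lo:hi]:
        k = AMINO_ACIDS.find(c)
        if k >= 0:
            counts[k] += 1
            total += 1
    return [x / total if total else 0.0 for x in counts]


def stage_two_features(
    stage1: Sequence[float],
    sse_dist: Sequence[float],
    i: int,
    half: int,
    pad: float = 0.0,
) -> List[float]:
    """Second-stage input for residue i (hypothetical layout).

    A window of first-stage disorder scores (positions beyond either end are
    filled with `pad`), followed by the SSE-distance feature of residue i.
    """
    n = len(stage1)
    window = [stage1[j] if 0 <= j < n else pad for j in range(i - half, i + half + 1)]
    return window + [sse_dist[i]]
```

For instance, `stage_two_features([0.1, 0.9, 0.8], [0.0, 0.5, 1.0], 0, 1)` returns `[0.0, 0.1, 0.9, 0.0]`. Feeding the smoothed first-stage output and the SSE feature to a separate second-stage classifier, rather than appending the SSE feature to the first-stage inputs, mirrors the comparison reported in the abstract.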

Parallel Keywords

protein; disordered region; SSE; sequence analysis; disorder
