
Predicting the structural characteristics of membrane proteins by computational approaches


Advisors: 許聞廉, 呂平江


This thesis comprises several studies on predicting the structural characteristics of membrane proteins from sequence using machine learning methods. Taken together, these predicted structural features are important modules for ab initio modeling of membrane protein structures. First, a membrane topology prediction method, SVMtop, was developed using support vector machines (SVMs). A novel topology scoring function was proposed, and SVMtop improves on state-of-the-art approaches, achieving over 70% accuracy in correctly predicting both the locations of transmembrane (TM) helices and their sidedness on standard benchmarks. Building upon this work, TMhit was developed to predict helix-helix interactions from residue contacts. We calculated statistical propensities for contact pairs between interacting TM helices and found that small and polar residues play an important role in interhelical contacts. In TMhit, contact propensities were combined with other sequence and structural features to train the SVMs in a novel two-level framework. Compared with the conventional method, the proposed two-level framework significantly reduces both the computational cost and the number of false positives. Lastly, the development of a new method to predict the solvent accessibility of residues in TM domains is described (manuscript in preparation). The method employs a random forests algorithm for feature selection and regression. At present, it achieves a mean absolute error of 27.25 Å² and a Pearson's correlation of 0.50 based on 5-fold cross-validation. In summary, the works presented in this thesis comprise several computational approaches to facilitate structure and function prediction for membrane proteins. While the number of solved membrane protein structures continues to grow at a slow pace, bioinformatics methods will play an important role in advancing our understanding of membrane protein structure assembly and function.
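The abstract does not give the formula used for the contact propensities. As a minimal sketch, assuming a common observed-over-expected definition (the fraction of observed contacts of a given residue pair divided by the fraction expected from background residue frequencies), the calculation might look like the following. The function name `contact_propensities` and the toy data are hypothetical and are not taken from TMhit.

```python
from collections import Counter


def contact_propensities(contacts, background):
    """Observed/expected ratio for each unordered residue pair.

    contacts:   iterable of (res_a, res_b) interhelical contact pairs
    background: iterable of all residues found in the TM helices
    NOTE: this observed/expected definition is an assumption, not the
    formula from the thesis.
    """
    # Treat (G, S) and (S, G) as the same unordered pair.
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    total_pairs = sum(pair_counts.values())
    res_counts = Counter(background)
    total_res = sum(res_counts.values())

    propensities = {}
    for (a, b), n in pair_counts.items():
        fa = res_counts[a] / total_res
        fb = res_counts[b] / total_res
        # Unordered pairs of distinct residues can occur in two orders.
        expected = fa * fb if a == b else 2 * fa * fb
        if expected == 0:
            continue  # residue absent from the background; ratio undefined
        propensities[(a, b)] = (n / total_pairs) / expected
    return propensities


# Toy data only: small residues (G, S) are over-represented in contacts.
toy_contacts = [("G", "G"), ("G", "S"), ("S", "G"), ("L", "L")]
toy_background = "GGSSLLLLLL"
toy_scores = contact_propensities(toy_contacts, toy_background)
```

A ratio above 1 means the pair is seen in contact more often than its background frequency would suggest, and a ratio below 1 means less often.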


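The main topic of this thesis is "predicting the structural characteristics of membrane proteins using machine learning methods." These characteristics include membrane protein topology prediction, prediction of interactions between transmembrane helices and of spatial contacts between amino acids, and prediction of the lipid-exposed surface area of transmembrane helices. These characteristics are important components of structure prediction, particularly for membrane proteins, because very few structures of this class are known. First, for topology prediction, this thesis describes a new method called SVMtop, which uses a hierarchical classification scheme that combines support vector machines with a new scoring function to predict topology. SVMtop surpasses many published methods in accuracy, particularly when both the positions and the orientations of the transmembrane helices must be correct, where its accuracy is about 70%. Second, for the problem of predicting transmembrane helix interactions, we first used statistical methods to compute propensity scores for amino acids to form contacts and found that small and polar amino acids have a higher propensity to form spatial contacts. We developed a method named TMhit, which uses a two-level architecture and incorporates other sequence and structural features to train support vector machines. Compared with the conventional method, this new two-level system better reduces redundant computation and false predictions. Finally, for predicting the lipid-exposed surface area of transmembrane helices, this thesis describes a new method that uses random forests for feature selection and regression. Currently, under 5-fold cross-validation, the best result is a mean absolute error of 27.25 Å² and a correlation of 0.50. This thesis aims to use these methods to make substantial progress in predicting membrane protein structure. Because experimental determination of membrane protein structures still faces many bottlenecks, developing bioinformatics methods for predicting the structure and function of membrane proteins will be a very important direction for gaining a deeper understanding of the roles of membrane proteins in living organisms.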
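The two reported figures, mean absolute error and Pearson's correlation under 5-fold cross-validation, can be computed as in the sketch below. This is an illustration under stated assumptions, not the thesis's pipeline. The random forests model is not reproduced, and `fit_predict` is a hypothetical callback standing in for any regressor. The contiguous fold split is also an assumption, since the abstract does not say how the folds were formed.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds (assumed split)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


def cross_validate(X, y, fit_predict, k=5):
    """Pool out-of-fold predictions, then report (MAE, Pearson r).

    fit_predict(X_train, y_train, X_test) -> predictions for X_test.
    It is a hypothetical stand-in for the regressor.
    """
    preds = [None] * len(y)
    for train, test in kfold_indices(len(y), k):
        fold_preds = fit_predict(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test]
        )
        for i, p in zip(test, fold_preds):
            preds[i] = p
    return mean_absolute_error(y, preds), pearson_r(y, preds)
```

For residue-level data, folds are usually formed per protein so that residues from one chain do not appear in both training and test sets. The sketch leaves that grouping to the caller.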


