Predicting Protein-Protein Interaction Sites Using Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces

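Protein-protein interactions are key to many biological processes. Computational methods for predicting protein-protein interaction sites are important tools: they provide insight into protein function and support the development of therapeutics that target these sites. A common feature of protein-protein interaction sites is that the surfaces of the two interacting proteins are complementary, resembling the protein interior in packing density and in the physicochemical properties of the amino acid composition. In this study, we constructed three-dimensional probability density maps of non-covalently interacting atoms on protein surfaces to model this physicochemical complementarity. The interacting-atom probabilities were derived from statistics of protein interiors, and machine learning methods were applied to learn the characteristic patterns of the probability density maps at protein-protein interaction sites. The trained predictors were cross-validated on a set of training cases (432 proteins) and tested on an independent dataset (142 proteins). On the independent test, the residue-based Matthews correlation coefficient was 0.423, and the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. These results show that our optimized machine learning models are among the most accurate predictors currently available. Prediction accuracy increases as the protein-protein interaction site becomes larger and as its amino acid composition becomes more hydrophobic; the core of the interaction site is more likely to be assigned a high prediction confidence. Our results indicate that physicochemical complementarity on protein surfaces is an important determinant of protein-protein interactions, and that complementarity features built from non-covalent interaction data extracted from protein interiors can accurately predict a substantial portion of protein-protein interaction sites.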

Keywords

protein-protein interaction; probability density distribution; machine learning; protein-protein interaction site prediction

Parallel Abstract

Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools for providing insights into the biological functions of proteins and for developing therapeutics that target PPI sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, and specificity were 0.753, 0.519, 0.677, and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in the amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
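
The residue-based measures reported above (Matthews correlation coefficient, accuracy, precision, sensitivity, specificity) all follow from the four confusion-matrix counts of a per-residue binary classification. The sketch below shows the standard definitions only; it is not the evaluation code used in this work. The function name `residue_metrics` and the toy counts are hypothetical and chosen for illustration, and the zero-denominator fallback of 0.0 is a convention assumed here.

```python
import math


def residue_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: interface residues predicted as interface / non-interface.
    tn/fp: non-interface residues predicted as non-interface / interface.
    Undefined ratios (zero denominator) fall back to 0.0 (assumed convention).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient: correlation between the
    # predicted and the true binary labels, in the range [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy counts for illustration only; these are NOT data from the study.
    for name, value in residue_metrics(tp=6, fp=2, tn=10, fn=2).items():
        print(f"{name}: {value:.3f}")
```

Because interface residues are usually a minority of surface residues, accuracy alone can look favorable for a predictor that labels almost everything as non-interface; the Matthews correlation coefficient uses all four counts and is less sensitive to that imbalance, which is presumably why it is reported first.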

Parallel Keywords

protein-protein interaction; machine learning; probability density map; PPI site prediction
