
Applying Support Vector Machines to the Prediction of Human Protein-Protein Interactions

Prediction of Human Protein-Protein Interactions Using Support Vector Machines

Advisor: 高成炎

Abstract


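In recent years, high-throughput yeast two-hybrid analysis has produced large amounts of protein interaction data. With these data and other protein features available, computational prediction of human protein interactions from homologous interactions (interologs) has become feasible. Integrating heterogeneous data to improve the accuracy of human protein interaction prediction is therefore among the most pressing needs in bioinformatics.

In the knowledge-based part of this study, we search protein interaction networks for maximal quasi-cliques to compute relative conservation across species, and we score candidate interactions using additional protein features. The predicted human interologs are derived mainly from six species: rat, mouse, fruit fly, nematode, thale cress (Arabidopsis), and yeast. Evaluation with functional keywords and Gene Ontology annotations shows that the predicted interactions have relatively high confidence, and the proposed method is more accurate than other interolog-based methods.

This study considers protein interaction features including interologs, spatial properties (subcellular localization and tissue specificity), temporal properties (cell cycle), and domain-domain pair combinations. These 6 features, together with combinations of amino acid hydrophobicity, charge, and molecular volume, form 3 sets of 16-dimensional features, which are used to build committee models of support vector machines (SVMs). Across 10 test sets of different sizes, 5-fold cross-validation yields accuracy above 90%. Comparative analysis also shows that the proposed method is more accurate than other SVM-based methods.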
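The abstract does not define "quasi-clique". As a minimal sketch, assuming the common degree-based definition (a vertex set is a γ-quasi-clique when every member is linked to at least ⌈γ(|S| − 1)⌉ other members), the check below tests a candidate set of proteins. The function name, the toy network, and the protein labels are hypothetical and are not taken from the thesis.

```python
from math import ceil

def is_quasi_clique(adjacency, nodes, gamma):
    """Return True if every node in `nodes` is linked to at least
    ceil(gamma * (len(nodes) - 1)) other members of `nodes`.

    Assumed definition (degree-based); the thesis may use another one.
    """
    members = set(nodes)
    if len(members) < 2:
        return True
    required = ceil(gamma * (len(members) - 1))
    return all(
        len(adjacency.get(v, set()) & (members - {v})) >= required
        for v in members
    )

# Hypothetical toy interaction network (undirected, stored symmetrically).
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

print(is_quasi_clique(toy_network, ["A", "B", "C", "D"], 0.6))  # every node has >= 2 links
print(is_quasi_clique(toy_network, ["A", "B", "C", "D"], 1.0))  # A and D are not linked
print(is_quasi_clique(toy_network, ["A", "B", "C"], 1.0))       # a full triangle
```

With γ = 1.0 the test reduces to an ordinary clique check; lowering γ tolerates missing edges, which matters for interaction data that is known to be incomplete.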

Parallel Abstract


The recent increase in the use of high-throughput two-hybrid analysis has generated a large amount of data on protein interactions. Specifically, the online availability of experimental protein-protein interaction data and other protein features enables human protein-protein interactions to be predicted computationally from co-evolution events (interologs). Computational methods must be developed to integrate these heterogeneous biological data and maximize the accuracy of human protein interaction prediction.

In the knowledge-based study, we propose a relative conservation score obtained by identifying maximal quasi-cliques in protein interaction networks, and we incorporate other interaction features to formulate a scoring method. The scoring method can be used to determine which of several candidate protein pairs are the most likely to interact. The predicted human protein-protein interactions, each associated with a confidence score, are derived from six eukaryotic organisms: rat, mouse, fly, worm, thale cress, and baker's yeast. Evaluation of the proposed method using functional keyword and Gene Ontology annotations supports confidence in the accuracy of the predicted interactions. Comparisons with existing methods also show that the proposed method predicts human protein-protein interactions more accurately than other interolog-based methods.

This study considers protein interaction features including interologs, spatial proximity (subcellular localization and tissue specificity), temporal synchronicity (cell-cycle stage), and domain-domain pair combinations. These 6 protein features, combined with the hydrophobicity, charge, and volume properties of amino acids, form 3 sets of 16-dimensional features that are used to construct committee models of support vector machines (SVMs). In 5-fold cross-validation on 10 test sets of different sizes, the accuracy on the test sets exceeded 90%. Moreover, analytical comparisons suggest that the proposed method is more accurate than other SVM-based methods.
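The committee and cross-validation setup can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the committee combines its members, so a simple majority vote is assumed, and the member classifiers are hypothetical stand-ins for trained SVMs (one per feature set). All function names are invented for this example.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def committee_predict(members, feature_sets):
    """Majority vote (assumed rule): member i classifies feature_sets[i] as +1 or -1."""
    votes = Counter(member(x) for member, x in zip(members, feature_sets))
    return votes.most_common(1)[0][0]

def sign_of_sum(x):
    """Hypothetical stand-in for a trained SVM decision function."""
    return 1 if sum(x) > 0 else -1

# Three members, one per feature set (e.g. hydrophobicity, charge, volume).
members = [sign_of_sum, sign_of_sum, sign_of_sum]
one_pair = [[0.5, 0.2], [-1.0, 0.1], [0.3, 0.3]]   # made-up feature vectors
print(committee_predict(members, one_pair))         # two of three members vote +1

# 5-fold split of 10 samples: each fold is held out once for testing.
for test_fold in k_fold_indices(10, 5):
    print(test_fold)
```

An odd number of members avoids ties for binary labels. In practice the sample order would be shuffled before splitting, and each member would be retrained on the four training folds before it votes on the held-out fold.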

Keywords

support vector machine (SVM); protein interaction (PPI); interolog; hydrophobic; charge; volume
