
Building a Protein-Protein Interaction Prediction System Based on Machine Learning Methods

(Original title: 建立一個基於機器學習方法的蛋白質交互作用預測系統)

Advisor: 邱泓文

Abstract


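Introduction: Protein-protein interaction has long been one of the most important areas of biological research and is closely tied to the reactions that sustain life. In recent years, advances in methods that generate interaction data in bulk have given protein researchers abundant information and supporting tools. With this much data available, whether computational techniques can predict protein-protein interactions has become a central question. The regions of a protein that react with other proteins are collectively called functional regions (domains, motifs, and so on). These regions play an important role in interaction, and combinations of one functional region with another directly or indirectly give rise to protein-protein interactions. Data mining is already widely used in bioinformatics and can reveal how the components of a biological reaction relate to one another. One earlier group used association rule mining to predict protein-protein interactions, but did not arrive at a well-justified trade-off between support and confidence and did not include a validation step. The aim of this study is therefore to apply machine learning methods to build a protein-protein interaction prediction system.

Materials and Methods: The interaction data were collected from the DIP, IntAct and BIND websites and, with reference to NCBI and other sites, integrated into a single dataset stored in a database. Functional-region data were collected from UNIPROT, joined to the corresponding protein records in the protein table, and recorded in the database. From these data a list of Motif Pairs (combinations of one functional region with another) that may cause proteins to interact was screened out, and the list was used to predict whether a given pair of proteins will interact. Association rule mining was applied to interaction data from three species to generate the relationship model. The resulting prediction module was validated against interaction data from other species and compared with other prediction tools such as InterDom. A protein-protein interaction prediction system was built on this basis.

Results: A prediction module for protein-protein interactions and a web-based system were built. Once online, the system will provide integrated protein information queries, interaction queries, comparison of protein functional domains and sequences, a listing of the rules generated by this study with the proteins that match them, and interaction prediction. Related studies can use the system to obtain integrated protein-protein interaction information. Set against the functional-domain information the system provides, the relationship between functional domains and interactions is easy to see, and the prediction module can suggest new research directions and serve as a reference.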
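The abstract does not give the thesis's exact definitions of support and confidence, so the following is only a minimal sketch of how Motif Pair rules could be mined under one common set of assumptions: support is the fraction of all candidate protein pairs that interact and carry the motif pair, and confidence is the fraction of protein pairs carrying the motif pair that are known to interact. The function names, thresholds, protein identifiers and region names are all hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def motif_pairs(regions_a, regions_b):
    """All unordered functional-region pairs spanning two proteins."""
    return {tuple(sorted(p)) for p in product(regions_a, regions_b)}


def mine_rules(regions, interactions, min_support, min_confidence):
    """Return {motif_pair: (support, confidence)} for pairs passing both thresholds.

    Assumed definitions (not taken from the thesis):
      support    = interacting protein pairs carrying the motif pair / all protein pairs
      confidence = interacting protein pairs carrying the motif pair
                   / all protein pairs carrying the motif pair
    """
    interacting = {frozenset(p) for p in interactions}
    all_pairs = list(combinations(sorted(regions), 2))
    seen = Counter()  # protein pairs that carry the motif pair
    hits = Counter()  # ...of which are known to interact
    for a, b in all_pairs:
        for mp in motif_pairs(regions[a], regions[b]):
            seen[mp] += 1
            if frozenset((a, b)) in interacting:
                hits[mp] += 1
    rules = {}
    for mp, n_hit in hits.items():
        support = n_hit / len(all_pairs)
        confidence = n_hit / seen[mp]
        if support >= min_support and confidence >= min_confidence:
            rules[mp] = (support, confidence)
    return rules


# Toy data: hypothetical proteins and functional regions.
regions = {
    "P1": {"SH3"},
    "P2": {"PRM"},
    "P3": {"SH3", "KIN"},
    "P4": {"PRM"},
}
interactions = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]

rules = mine_rules(regions, interactions, min_support=0.2, min_confidence=0.6)
print(rules)
```

Raising `min_support` discards rare motif pairs, while raising `min_confidence` discards pairs that also appear often between proteins not known to interact. That is the trade-off the abstract says earlier work left unresolved.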

Parallel Abstract


INTRODUCTION Protein-protein interaction (PPI) is an emerging field in biological research and plays an important role in life processes. If PPIs can be predicted, scientists will understand biological processes and disease mechanisms better. Many PPI-related databases have been produced recently, and computational methods have been applied to predict PPIs. Because functional regions (e.g. domains and motifs) are key to whether one protein interacts with another, several studies have used data mining methods to relate the functional regions of proteins to PPIs, but without validation.

MATERIALS AND METHODS In this study, PPI data were collected from DIP, IntAct and BIND, and functional-region information was downloaded from UNIPROT. These data were integrated into one database, and a query interface was designed to present PPI data together with functional regions and sequences. A PPI prediction module based on association rule mining was developed from three sets of PPI data, and PPIs from other species were used to evaluate it. The resulting rules were compared with the results of InterDom. Finally, a PPI prediction system was built around the module.

RESULT A PPI prediction module was produced for a web-based system. The system will support queries for integrated protein information and PPI information, and comparison of the functional regions and sequences of proteins. It can also show the PPIs matched by each rule and the rules matched by each PPI, and it provides a PPI prediction function. Related studies will be able to obtain integrated PPI information and compare functional regions through the system, and the prediction results can serve as a new reference.
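As an illustration of the evaluation step described above, the sketch below applies a set of mined functional-region rules to labelled protein pairs from a held-out species and reports precision and recall. The abstract does not state which metrics the thesis used, so the choice of metrics, the prediction criterion (any matching region pair) and all identifiers here are assumptions.

```python
from itertools import product


def predict_interaction(regions_a, regions_b, rule_pairs):
    """Predict an interaction if any cross-protein region pair is a mined rule."""
    return any(
        tuple(sorted(p)) in rule_pairs for p in product(regions_a, regions_b)
    )


def evaluate(labelled_pairs, regions, rule_pairs):
    """Precision and recall over (protein_a, protein_b, interacts) triples."""
    tp = fp = fn = 0
    for a, b, interacts in labelled_pairs:
        predicted = predict_interaction(regions[a], regions[b], rule_pairs)
        if predicted and interacts:
            tp += 1
        elif predicted:
            fp += 1
        elif interacts:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical rule set and held-out species data.
rule_pairs = {("PRM", "SH3")}
held_out_regions = {
    "Q1": {"SH3"},
    "Q2": {"PRM"},
    "Q3": {"KIN"},
    "Q4": {"PRM", "KIN"},
}
labelled_pairs = [
    ("Q1", "Q2", True),
    ("Q1", "Q4", False),
    ("Q3", "Q2", True),
    ("Q3", "Q4", False),
]

precision, recall = evaluate(labelled_pairs, held_out_regions, rule_pairs)
print(precision, recall)
```

Evaluating on species that contributed nothing to rule mining keeps the test pairs independent of the training pairs, which is the point of the cross-species validation the abstract describes.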

