
整合蛋白質複合體與蛋白質交互作用資料於探討蛋白質複合體拓樸特性之研究

A study of protein complexes topological features by integrating protein complexes and protein-protein interaction data

Advisor: 黃建宏
Co-advisor: 吳家樂 (Ka-Lok Ng)

Abstract


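In a protein-protein interaction (PPI) network, the more densely a region interacts, the more likely it is to contain a protein complex. In this thesis we extend an earlier pseudo-clique algorithm by merging two dense regions of different sizes to form candidate regions for protein complexes [7]. We find that this method identifies more protein complexes than the randomized PPI version does, with a Jaccard's coefficient of 0.2 or higher for complexes of size five or larger. By integrating protein complex data with PPI information, we examine the topology of the interactions among the subunits of all 653 human protein complexes provided by the BOND database. We define two topological parameters to test whether protein complexes are found in PPI-dense regions. The first, the density of interaction, is the ratio of the number of experimentally determined PPIs among the subunits of a complex to the maximum possible number of PPIs (i.e., a clique). The second, the degree of connected subunits, characterizes the largest connected cluster among the subunits of a complex. The results show that about 18% of all human protein complexes have a density of interaction above 90%, while the rest are spread over the range from 0% to 90%. For the second parameter, about 27% of the human protein complexes have a degree of connected subunits above 90%, with the rest again ranging from 0% to 90%. These two results show that few protein complexes have high density; we infer that the most likely cause is that current PPI data are incomplete, which lowers both density and connectivity. We also identify sets of subunits among the human protein complexes, which we call core modules. These core modules range in size from 2 to 10 subunits. We calculate the probability that a core module of size larger than two occurs repeatedly (for example, twice); the estimated probability is essentially zero. To further characterize the core modules, we use the GO biological function data in the BOND database to compare functional similarity pairwise among core modules. The results show that core modules that recur more often and are larger have higher functional similarity. Core modules of size 10 have the highest Jaccard's coefficient, and therefore the highest functional similarity, which suggests that these core modules are likely to have important biological functions.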
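The two topological parameters can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the edge-list input format, and the protein labels are hypothetical. In particular, the abstract does not give a formula for the degree of connected subunits, so the sketch assumes it is the fraction of a complex's subunits that fall in the largest connected cluster.

```python
def interaction_density(subunits, ppi_edges):
    """Recorded PPIs among the subunits divided by the clique maximum n(n-1)/2."""
    nodes = set(subunits)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Keep undirected, de-duplicated edges whose endpoints are both subunits.
    edges = {
        frozenset((a, b))
        for a, b in ppi_edges
        if a in nodes and b in nodes and a != b
    }
    return len(edges) / (n * (n - 1) / 2)


def connected_subunit_fraction(subunits, ppi_edges):
    """ASSUMED definition: size of the largest connected cluster / number of subunits."""
    nodes = set(subunits)
    if not nodes:
        return 0.0
    adjacency = {u: set() for u in nodes}
    for a, b in ppi_edges:
        if a in nodes and b in nodes and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen = set()
    largest = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of one connected component.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        largest = max(largest, size)
    return largest / len(nodes)


# Hypothetical complex with four subunits; "X" is not a subunit, so A-X is ignored.
complex_subunits = ["A", "B", "C", "D"]
recorded_ppis = [("A", "B"), ("B", "C"), ("B", "A"), ("A", "X")]
density = interaction_density(complex_subunits, recorded_ppis)
connectivity = connected_subunit_fraction(complex_subunits, recorded_ppis)
```

In this toy case two distinct PPIs are recorded out of six possible, giving a density of 1/3, and three of the four subunits are in the largest cluster, giving 0.75 under the assumed definition. A complex can score high on connectivity while scoring low on density, which is consistent with the abstract reporting more complexes above 90% for the second parameter than for the first.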

Parallel abstract


Interaction-dense regions in a protein-protein interaction (PPI) network are likely locations of protein complexes. In this thesis we extend a pseudo-clique algorithm to merge two dense regions of different sizes [7]. Compared with the randomized PPI version, this approach predicts more protein complexes, achieving a Jaccard's coefficient of 0.2 or higher for complexes of size five or larger. By integrating protein complex data with PPI records, we study the interaction topology among the subunits of all 653 human protein complexes provided by BOND. Two topological parameters are defined to test whether protein complexes are found in PPI-dense regions. The first, the density of interaction, is the ratio of the experimentally recorded PPIs among the subunits of a protein complex to the maximum possible number of PPIs (i.e., a clique). The second, the degree of connected subunits, characterizes the largest connected cluster of subunits in a protein complex. Our results show that around 18% of the human protein complexes have a density of interaction over 90%, while the rest have densities ranging from 0% to 90%. For the second parameter, around 27% of the human protein complexes have a degree of connected subunits over 90%, while the rest range from 0% to 90%. These two results indicate that few protein complexes have high density, which we infer is most likely due to the incompleteness of the PPI data. Furthermore, we identify sets of common subunits, called core modules, among the human protein complexes. These core modules range in size from two to ten subunits. We calculate the probability of repeated occurrence (twice) of a core module of size larger than two. The estimated probability that such a core module occurs more than twice is nearly zero. To further characterize the core modules, we perform a pairwise functional comparison among the core module subunits using the BOND database. We find that frequently occurring, larger core modules tend to have higher functional similarity, with core modules of size 10 having the highest Jaccard's coefficient. This suggests that these core modules could have important biological functions.
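The pairwise functional comparison can be sketched as a Jaccard's coefficient over sets of functional annotations. This is an illustration under assumptions, not the thesis's procedure: the function names are hypothetical, the annotation labels are placeholders and not real GO identifiers, and averaging over all pairs is one plausible way to summarize a core module that the abstract does not confirm.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard's coefficient |A ∩ B| / |A ∪ B| for two collections of labels."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def mean_pairwise_jaccard(annotations):
    """Average Jaccard's coefficient over all pairs of annotated items.

    `annotations` maps an item (e.g. a subunit) to its set of function labels.
    Averaging is an ASSUMED summary; the abstract does not specify one.
    """
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Placeholder annotation labels (not real GO terms) for a hypothetical core module.
module_annotations = {
    "subunit_1": {"term_a", "term_b"},
    "subunit_2": {"term_b", "term_c"},
    "subunit_3": {"term_a", "term_b"},
}
similarity = mean_pairwise_jaccard(module_annotations)
```

Here the three pairs score 1/3, 1, and 1/3, so the mean is 5/9. The same `jaccard` function also applies to comparing a predicted complex against a known complex by treating each as a set of proteins, which is the other use of the coefficient mentioned in the abstract.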

References


[1]. Altaf-Ul-Amin M., Shinbo Y., Mihara K., Kurokawa K., Kanaya S., (2006). Development and implementation of an algorithm for detection of protein complexes in large interaction networks. BMC Bioinformatics 7, p 207.
[2]. Arnau V., Mars S., Marín I., (2005). Iterative cluster analysis of protein interaction data. Bioinformatics 21(3), p 364.
[3]. Bader G.D., Hogue C.W., (2003). An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics 4, p 2.
[4]. Bader G.D., Betel D., Hogue C.W., (2003). BIND: the biomolecular interaction network database. Nucleic Acids Res 31(1), p 248.
[5]. Dijkstra E.W., (1959). A note on two problems in connexion with graphs. Numerische Mathematik 1, p 269.
