
Identification of Protein Complexes and Biological Regulation and Signal Networks Using Multiple Biological Databases and Microarray Data


Advisor: 蘇豐文

Abstract


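Under different environmental factors, cells produce biological response networks and pathways specific to each state, and biological networks offer a systematic way to understand regulation and signal transduction mechanisms within the cell. The regulatory and signaling networks that respond to particular environmental factors remain unclear, so they must be inferred from microarray gene expression with the support of existing biological databases. Because knowledge of protein networks and gene sequence analysis is now relatively mature, this information provides more opportunities to construct biological networks. However, protein networks are collected from many different conditions, so many interactions may not occur under a given condition, and it is difficult to spend large amounts of money and experimental effort to verify protein interactions for each specific condition. How to integrate gene expression with protein networks to build the important biological networks for a specific environmental factor is therefore an important problem. Although many studies have used differential gene expression on microarrays and traditional clustering methods to identify the genes that matter in a specific state, the mechanisms of their regulatory networks and signal transduction are still not well understood. Previous studies have therefore used graph-theoretic methods to search for biological pathways in large protein networks, but these searches did not consider gene expression with high-order dependencies or the participation of protein complexes.
This dissertation focuses on inferring, analyzing and validating biological networks that effectively reflect gene expression. First, we derive gene regulatory networks from microarray data based on transcription factor analysis and conditional independence, combining online biological databases and tools to automatically extract promoter regions from DNA sequences and predict the transcription factors that can regulate gene expression; we then apply the d-separation criterion and the notion of conditional independence to reconstruct the regulatory network. Next, we use functionally and sequence-related proteins across multiple species as starting points and use the hypergeometric distribution to extend them into predicted protein complexes. Finally, we use the gene regulatory networks, protein complexes and protein networks with a Markov blanket search to construct biological network pathways under specific conditions. We test our methods on yeast and human prostate cancer data. Based on evolutionarily shared network features, we effectively predict correct yeast protein complexes. In the yeast pheromone and cell wall signaling networks, we not only fully reconstruct the known main pathways but also supply the related protein complexes and genes. For the human prostate cancer network, we construct candidate biological networks from 71 cancer and 41 normal gene expression samples and identify 9 important regulators that control the prostate cancer signaling network; this regulatory network maps correctly onto currently known prostate cancer pathways and is consistent with databases of the related literature. Measured against currently confirmed prostate cancer disease genes, our method achieves higher sensitivity than previous network construction methods. The proposed method effectively integrates gene expression and protein networks to construct biological regulatory and signaling networks, and it indirectly sheds light on the biological pathways arising in prostate cancer, which benefits subsequent medical validation and disease treatment.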

Parallel Abstract


Condition-relevant biological networks, which arise under environmental conditions such as disease, stress and stimulus, describe functional interactions among genes and proteins. These networks provide a systems-level view of the mechanisms of biological processes in the cell. The principal challenge is that the biological networks active under a specific condition remain unknown and must be inferred from gene expression (mRNA levels) in microarray data. The increasing availability of protein interaction and genomic analysis data on the Internet provides an opportunity to identify significant biological networks without depending on gene expression alone. However, current protein–protein interaction networks do not provide information about the condition(s) under which the interactions occur. Although numerous studies have used microarray analysis and traditional statistical and clustering methods with well-known pathway databases to identify individual genes involved in disease processes, the important gene regulations remain unclear, and the new pathways these genes are involved in are hard to detect. Some recent signal transduction pathway detection methods used graph-theoretic approaches to identify signal pathways from noisy protein networks. They did not focus on high-order dependency relationships among genes and did not take biological insights such as protein complexes into consideration. This dissertation aims to develop computational approaches for inferring, analyzing and validating biological networks of genes that correspond to expression data and recent protein networks. We first describe a computational framework that reconstructs the gene regulatory network from microarray data using biological knowledge and constraint-based inference. We apply the d-separation criterion and conditional independence to filter the links in the gene regulatory networks.
The workflow integrates bioinformatics toolkits and databases to automatically extract the promoter regions of DNA sequences and predict the transcription factors that regulate gene expression. Second, we propose a protein complex prediction method that takes conserved networks of orthologous proteins across species as the initial seed graphs and applies cumulative hypergeometric testing to greedily add proteins to the seed graph. Finally, we combine gene regulation and protein complexes to develop a novel method for identifying significantly different signal transduction pathways using Markov blanket and A* heuristic search methods. The former takes into consideration the high-order dependency relationships among genes, and the latter extracts genes with significantly different expression. In the experiments, we tested the methods on two networks: yeast and human prostate cancer. Based on the evolutionarily conserved network structure, we efficiently predict the correct yeast protein complexes. In the yeast pheromone and cell wall integrity signal networks, we not only identify the main chain of those well-known networks but also recover the order of the functional modules involved in them. To evaluate our method, we adopt as the target dataset microarray data consisting of 71 primary tumors and 41 normal prostate tissues from the Stanford Microarray Database (SMD). In the biological regulation and signal networks, we identify 9 transcription factors that differ significantly between normal and cancer samples, and the networks we extracted map correctly to the well-known prostate cancer-related pathways in the KEGG database. The predictions indicate that the androgen function, integrin signal, MAPK, WNT, immune, STAT/JAK and ubiquitin pathways may be involved in the development of prostate cancer and the promotion of cell death in the cell cycle.
Our approach efficiently integrates microarray data and protein–protein interactions for network identification and also captures high-order dependency interactions and protein complexes. We are able to identify genes related to prostate cancer, and their networks, that are validated by recent databases and published literature. Based on databases of prostate cancer-related genes, our method achieved higher sensitivity than previous methods. These critical concepts of tumor progression from network-based analysis are useful for understanding cancer biology and disease treatment.
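
The greedy, hypergeometric-test-driven growth of a seed graph mentioned in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the abstract does not give the exact statistic, the significance threshold or the stopping rule, so the function names `hypergeom_tail` and `grow_complex`, the cutoff `alpha=0.05`, and the toy network below are all assumptions made for illustration. The sketch scores each protein adjacent to the current seed by the cumulative hypergeometric probability of having at least as many interaction partners inside the seed as observed, then adds the most significant protein and repeats.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total


def grow_complex(adjacency, seed, alpha=0.05):
    """Greedily extend a seed set of proteins (hypothetical helper).

    adjacency maps each protein to the set of its interaction partners.
    alpha is an assumed significance cutoff, not a value from the thesis.
    """
    members = set(seed)
    others = len(adjacency) - 1  # possible partners of any single protein
    while True:
        neighbours = set().union(*(adjacency[m] for m in members)) - members
        best, best_p = None, alpha
        for protein in sorted(neighbours):  # sorted for deterministic ties
            degree = len(adjacency[protein])
            shared = len(adjacency[protein] & members)
            p = hypergeom_tail(others, len(members), degree, shared)
            if p < best_p:
                best, best_p = protein, p
        if best is None:  # no remaining neighbour is significant
            return members
        members.add(best)


# Toy interaction network: A-D form a clique, E-J form a chain hanging off D.
toy_network = {
    "A": {"B", "C", "D"}, "B": {"A", "C", "D"}, "C": {"A", "B", "D"},
    "D": {"A", "B", "C", "E"}, "E": {"D", "F"}, "F": {"E", "G"},
    "G": {"F", "H"}, "H": {"G", "I"}, "I": {"H", "J"}, "J": {"I"},
}
print(sorted(grow_complex(toy_network, {"A", "B", "C"})))  # ['A', 'B', 'C', 'D']
```

In the toy run, protein D is added because three of its four partners lie inside the seed (tail probability 6/126, about 0.048), while E is rejected because a single shared partner is unremarkable (about 0.72). In practice the seed would come from the conserved orthologous networks described above, and a library routine such as a statistics package's hypergeometric survival function could replace the hand-written tail sum.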

