
Utilizing Gene-gene Interaction for Construction of Differential Network

Advisor: 蕭朱杏
Co-advisor: 盧子彬 (Tzu-Pin Lu)

Abstract


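Differential network analysis can be used to understand how changes in the correlations between genes affect complex diseases. Studies of gene-disease association have traditionally focused on finding biomarkers with differential gene expression. However, as biological pathways and genetic networks have come to be widely discussed in recent years, it has become clear that complex biological mechanisms and gene-gene correlations, whether mutually inhibiting or activating, may all contribute to disease onset. Differential network analysis no longer targets a single gene: it takes the correlations between genes into account and, through conditional probability, considers all genes in the network at once.

Most studies construct a differential network by first building a gene network for each group separately and then applying statistical methods, such as hypothesis testing, to identify the network edges that differ significantly between the two groups. For example, one may first build each group's gene network from conditional correlations and then use the difference between the two groups' conditional correlation coefficients to construct the differential network. Such methods require at least two steps and may take more computation time. Moreover, building a separate gene network for each group greatly increases the number of parameters to be estimated.

The method proposed in this study estimates the model using the genetic information of both groups simultaneously, so that fewer parameters need to be estimated, fewer model assumptions are used, and computation is faster. The approach draws on three concepts (conditional correlation, differential networks, and gene-gene interaction) and constructs the differential network with a logistic regression model that captures how gene-gene interactions differ between groups. This study also shows that differences in conditional correlation and estimates of regression coefficients yield similar differential networks.

Simulations compare the logistic regression model with other methods for constructing differential networks, such as DINGO, INDEED, JDINAC, and other variable-selection methods. The results show that the logistic regression model performs well in specificity, accuracy, and F1-score. The method is also applied to two data sets: cervical cancer data from The Cancer Genome Atlas (TCGA) and triglyceride data from the Taiwan Biobank. In these empirical analyses, constructing the differential network reveals how changes in gene-gene interactions affect disease. In the differential networks built in this study, gene-gene interactions such as STAT1 with AKT3 and MYC with RAF have a significant effect on ovarian cancer, and in the triglyceride data, interactions such as SNX27 with TUFT1 and TUFT1 with TFIP were found. Their molecular-level functions and effects on disease may merit further investigation.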
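The two-step baseline described in the abstract (one conditional-correlation network per group, then a comparison of the two) can be sketched as follows. This is a minimal illustration, not the procedure of any cited method: the simulated data, the function names `partial_correlation` and `differential_edges`, and the fixed `threshold` (an ad hoc stand-in for a hypothesis test) are all assumptions introduced here. It also assumes more samples than genes so that the sample covariance can be inverted.

```python
import numpy as np


def partial_correlation(x):
    """Partial correlation matrix from the inverse sample covariance (needs n > p)."""
    prec = np.linalg.inv(np.cov(x, rowvar=False))
    scale = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def differential_edges(x0, x1, threshold=0.25):
    """Step 1: estimate one network per group.
    Step 2: keep gene pairs whose partial correlations differ by more than
    an ad hoc threshold (hypothetical; a real analysis would use a test)."""
    diff = partial_correlation(x1) - partial_correlation(x0)
    p = diff.shape[0]
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(diff[j, k]) > threshold]


# Simulated data (illustrative assumption): 5 genes, groups identical except
# that genes 0 and 1 are conditionally dependent in group 1 only.
rng = np.random.default_rng(0)
n, p = 2000, 5
omega1 = np.eye(p)                      # precision matrix of group 1
omega1[0, 1] = omega1[1, 0] = 0.5
x0 = rng.standard_normal((n, p))        # group 0: identity precision matrix
x1 = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega1), size=n)

print(differential_edges(x0, x1))
```

Each group contributes its own p(p-1)/2 partial correlations, so the two-step route estimates two full networks before any comparison is made, which is the parameter burden the abstract refers to.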

Parallel Abstract


Differential network analysis has become an important topic in recent years. Previous research focused on differential gene expression analysis to identify disease biomarkers. However, because genes connect and interact with each other in biological pathways and genetic networks, comparing these relationships across response groups has received much attention. Differential network analysis can reveal differences in network structure and help detect differences in association between response conditions. Existing methods usually first estimate a genetic network for each group and then identify the differences between these group-specific networks to construct a differential network. For example, each group-specific network can be represented by a precision matrix, and the element-wise difference of the two precision matrices can be viewed as the strength of the edges in the differential network. As alternatives to the precision matrix, some researchers use the partial correlation matrix, while others prefer the Pearson correlation matrix as the group-specific network. To examine whether two networks are the same, hypothesis testing is often adopted, and the test result indicates whether an edge exists in the differential network. These methods, however, often require substantial computation time and many parameter estimates. In this research, instead of estimating two group-specific networks separately, we combine the genetic information of the two groups to build the differential network in a single step. We use conditional correlation, the differential network, and gene-gene interaction together to construct the differential network.
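One way to see why a single model fitted to both groups can recover the precision-matrix difference is the following sketch. It assumes, purely for illustration, that the expression vector x in group g (g = 0, 1) is Gaussian with mean \mu_g and precision matrix \Omega_g = (\omega^{(g)}_{jk}), and that \pi_g is the group proportion. These symbols, and the coefficients \beta_j and \gamma_{jk}, are notation introduced here rather than taken from the thesis.

```latex
\begin{align*}
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
  &= c + x^{\top}\!\left(\Omega_1\mu_1 - \Omega_0\mu_0\right)
       - \tfrac{1}{2}\, x^{\top}\!\left(\Omega_1 - \Omega_0\right) x \\
  &= c + \sum_{j} \beta_j x_j
       - \tfrac{1}{2}\sum_{j} \bigl(\omega^{(1)}_{jj} - \omega^{(0)}_{jj}\bigr) x_j^{2}
       + \sum_{j<k} \gamma_{jk}\, x_j x_k, \\[4pt]
\beta_j &= \bigl(\Omega_1\mu_1 - \Omega_0\mu_0\bigr)_j, \qquad
\gamma_{jk} = -\bigl(\omega^{(1)}_{jk} - \omega^{(0)}_{jk}\bigr), \\[4pt]
c &= \log\frac{\pi_1}{\pi_0}
     + \tfrac{1}{2}\log\frac{\lvert\Omega_1\rvert}{\lvert\Omega_0\rvert}
     - \tfrac{1}{2}\bigl(\mu_1^{\top}\Omega_1\mu_1 - \mu_0^{\top}\Omega_0\mu_0\bigr).
\end{align*}
```

Under this Gaussian assumption the coefficient of the product x_j x_k is nonzero exactly when the two precision matrices differ in entry (j, k), which is the element-wise difference used by the two-step methods. Whether the thesis's model also carries the squared terms is not stated in the abstract.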
Considering a binary response, we propose a logistic regression model containing both main effects and gene-gene interaction effects, and use each interaction term to represent an edge in the differential network. Simulation studies compare its performance with several existing methods for differential network construction, including DINGO, INDEED, and JDINAC. The simulation results show that the logistic regression approach performs well in specificity, accuracy, and F1-score. In addition, we apply the proposed analysis to construct differential networks from RNA array gene expression data of ovarian cancer patients and from SNP data in the Taiwan Biobank with triglyceride measurement as the response. In the resulting differential networks, some gene-gene interaction terms are identified as significant. For the ovarian cancer study, STAT1 and AKT3, and MYC and RAF, appear to show differential interaction; for the triglyceride measurement, SNX27 and TUFT1, and TUFT1 and TFIP, show significant interaction. These findings may suggest new directions for studying the relationship between biological function and disease.
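The single-model idea (a logistic regression with main effects and pairwise interactions, where each interaction term stands for a candidate edge) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the simulated data, the helper names `interaction_design`, `fit_logistic`, and `interaction_z_scores`, the unpenalized Newton-Raphson fit, and the Wald cutoff of 4.0 are all introduced here for the example. How the thesis actually selects significant terms is not described in the abstract.

```python
import itertools
import numpy as np


def interaction_design(x):
    """Build the columns [1, main effects, all pairwise products]; also return the pairs."""
    n, p = x.shape
    pairs = list(itertools.combinations(range(p), 2))
    products = np.column_stack([x[:, j] * x[:, k] for j, k in pairs])
    return np.column_stack([np.ones(n), x, products]), pairs


def fit_logistic(design, y, n_iter=25):
    """Unpenalized logistic regression by Newton-Raphson; returns (beta, standard errors)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
        info = design.T @ (design * (prob * (1.0 - prob))[:, None])  # Fisher information
        beta = beta + np.linalg.solve(info, design.T @ (y - prob))
    prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
    info = design.T @ (design * (prob * (1.0 - prob))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def interaction_z_scores(x, y):
    """Wald z-score of every interaction coefficient, keyed by gene pair (j, k)."""
    design, pairs = interaction_design(x)
    beta, se = fit_logistic(design, y)
    p = x.shape[1]
    z = beta[1 + p:] / se[1 + p:]          # skip intercept and main effects
    return dict(zip(pairs, z))


# Simulated data (illustrative assumption): 4 genes, groups identical except
# that genes 0 and 1 are conditionally dependent in group 1 only.
rng = np.random.default_rng(0)
n, p = 2000, 4
omega1 = np.eye(p)
omega1[0, 1] = omega1[1, 0] = 0.5
x = np.vstack([
    rng.standard_normal((n, p)),                                        # group 0
    rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega1), size=n),  # group 1
])
y = np.concatenate([np.zeros(n), np.ones(n)])

scores = interaction_z_scores(x, y)
edges = [pair for pair, z in scores.items() if abs(z) > 4.0]  # ad hoc cutoff
print(edges)
```

One model with 1 + p + p(p-1)/2 coefficients replaces two separately estimated networks. With many genes the unpenalized fit used here would no longer be feasible, and some form of regularization or variable selection would be needed.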

