  • Thesis/Dissertation

淨本質相關係數在基因選擇與基因調控網路建構之應用

Gene Selection and Regulatory Network Construction with Partial Coefficient of Intrinsic Dependence

Advisor: 劉力瑜

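Abstract


The coefficient of intrinsic dependence (CID) can determine the relationships among variables without assuming any distribution or functional form for the random variables. The CID between a target variable and a set of predictor variables grows as more predictors are included. Consequently, if the predictor most relevant to the target is present and the CID is significant, the CID remains significant even after predictors that are only weakly related to the target are added.

In this study, we propose the partial coefficient of intrinsic dependence (pCID) for selecting, step by step, the predictors that are relevant to a target variable. We apply the pCID to stepwise variable selection and to the construction of gene regulatory networks. In stepwise variable selection, combining the CID with the pCID removes the interference of other relevant variables. Simulation results show that the proposed method distinguishes curvilinear from linear relationships among variables more precisely than the combination of Pearson's correlation coefficient and the partial correlation coefficient. Based on the numerical CID/pCID results, this property offers an opportunity to index different degrees of curvilinearity. In experiments on publicly available data, the stepwise selection procedure based on the CID and pCID successfully identified cold-responsive genes related to three C-repeat binding factors and effectively distinguished sample-specific gene-gene interactions. The proposed strategy may therefore benefit meta-analysis by separating forms of association from noise.

For the construction of gene regulatory networks, combining the CID with the pCID makes it possible to select, step by step, each target node and its corresponding source node while removing the interference of relevant nodes that have already been selected. Because CID and pCID values are asymmetric, we use this property to determine the direction between two nodes. We simulated a pseudo gene network to evaluate the heuristic algorithm based on the CID and pCID over 100 replications at different sample sizes, and observed that the accuracy of the reconstructed network increases with sample size. We also applied the proposed strategy to two microarray datasets. The first concerns the known cold signaling pathway in Arabidopsis, in which C-repeat binding factors induce cold-regulated (COR) genes; the proposed strategy successfully recovered the links between the C-repeat binding factors and the cold-regulated genes. The second concerns the basic helix-loop-helix family in rice, whose gene network has not yet been established biologically; the gene regulatory network constructed with the proposed strategy can therefore serve as reference information for biologists.

In summary, combining the CID with the pCID effectively identifies relevant variables that exhibit different types of relationships. In addition, the asymmetry of the CID and pCID allows the direction between variables to be distinguished from a statistical point of view. The variable selection and gene regulatory network results obtained with the CID and pCID can therefore serve as a reference for biologists before experiments are carried out.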
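The comparison baseline named in the abstract, Pearson's correlation combined with the partial correlation coefficient, can be illustrated with a minimal sketch. This is not the thesis's code: the function names `pearson` and `partial_corr`, the simulated variables, and the noise levels are all assumptions chosen for illustration. The partial correlation is computed the standard way, by correlating least-squares residuals after regressing both variables on the controls.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def partial_corr(x, y, z):
    """Partial correlation of x and y given the control variable(s) z.

    Both x and y are regressed on z (with an intercept) by least squares,
    and the Pearson correlation of the two residual vectors is returned.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    design = np.column_stack([np.ones(len(z)), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearson(res_x, res_y)


# Hypothetical simulated data (not from the thesis).
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)                  # true driver of y_lin
x2 = x1 + 0.5 * rng.normal(size=n)       # correlated with x1, no direct effect
y_lin = 2.0 * x1 + rng.normal(size=n)

u = rng.uniform(-1.0, 1.0, size=n)
y_curve = u ** 2 + 0.05 * rng.normal(size=n)   # purely curvilinear dependence

r_marginal = pearson(x2, y_lin)           # large: x2 inherits x1's signal
r_partial = partial_corr(x2, y_lin, x1)   # near zero once x1 is controlled
r_curve = pearson(u, y_curve)             # near zero despite strong dependence
```

In this toy setting the partial correlation removes the interference of the already-selected variable `x1`, which is the role the abstract assigns to the partial step, while the near-zero `r_curve` shows why a linear measure can miss a symmetric curvilinear relationship.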

Parallel Abstract


The coefficient of intrinsic dependence (CID) can determine associations among variables without making distributional or functional assumptions about the random variables. The CID value of a target variable increases as more predictor variables are included. This implies that the CID of the target variable given multiple predictors is significant whenever the most relevant predictor is included, even if the other predictors are only weakly associated with the target.

In this study, we developed the partial coefficient of intrinsic dependence (pCID) to facilitate the step-by-step selection of variables that are relevant to a target variable. We applied the pCID to stepwise variable selection and to the construction of gene regulatory networks. In stepwise variable selection, selecting relevant variables with the CID together with the pCID eliminates interference from other relevant variables. In simulations, the proposed method was more sensitive to curvilinearity and more specific to linearity than the combination of Pearson's correlation coefficient and the partial correlation coefficient (PCC/pPCC). This property may make it possible to index different levels of curvilinearity according to CID/pCID outcomes. When applied to publicly available microarray data, the CID/pCID procedure successfully identified cold-responsive genes related to three C-repeat binding factors and was especially effective at identifying some sample-specific gene-gene interactions. The proposed strategy may therefore be beneficial in meta-analysis for distinguishing general forms of relationships from noise.

For the construction of gene regulatory networks, the CID/pCID strategy chooses each target node stepwise and decides its corresponding source node while eliminating the influence of the other relevant nodes. Because CID/pCID values are asymmetric, we used this property to determine the direction between two nodes. A pseudo network was simulated to evaluate the performance of the CID/pCID heuristic over one hundred replications at different sample sizes. The accuracy of the reconstructed pseudo network increased with the sample size. The proposed approach was also applied to two microarray datasets. The first concerns the known cold signaling pathway in Arabidopsis, in which C-repeat binding factors induce a set of cold-regulated (COR) genes. The CID/pCID approach successfully recovered the connections between the C-repeat binding factors and the cold-regulated genes. The second concerns the basic helix-loop-helix gene family in rice, whose regulatory network has not yet been established biologically. We constructed a network from the CID/pCID outcomes to provide suggestions for biologists.

In summary, the CID/pCID method efficiently identifies relevant variables that exhibit various types of association. In addition, the asymmetry of the CID/pCID values can be used to distinguish the direction between two variables from a statistical viewpoint. The outcomes of variable selection and gene regulatory network construction based on the CID/pCID method can therefore serve as a reference for biologists before they conduct experiments on plants.
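The asymmetry that the abstract uses to assign direction can be made concrete with a rough sketch. The abstract does not state the CID formula or the estimator used in the thesis, so everything below is an assumption: the code takes CID(Y|X) to be the variance of the conditional distribution function of Y given X, integrated over y and normalized by the same integral of the variance of the indicator 1{Y ≤ y}, and it estimates the conditional distribution crudely by splitting X into equal-count bins. The function name `cid_binned`, the bin count, and the simulated data are hypothetical.

```python
import numpy as np


def cid_binned(y, x, n_bins=10):
    """Rough binned estimate of CID(Y | X) under the assumed definition.

    X is split into n_bins equal-count bins. For every observed threshold
    y_j, the between-bin variance of the empirical conditional CDF is
    compared with the total variance F(y_j) * (1 - F(y_j)).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    ranks = np.argsort(np.argsort(x))          # 0 .. n-1
    bins = (ranks * n_bins) // n               # equal-count bin labels
    # indicator[i, j] = 1 if y_i <= y_j (rows: samples, columns: thresholds)
    indicator = (y[:, None] <= y[None, :]).astype(float)
    f_marginal = indicator.mean(axis=0)        # empirical CDF at each threshold
    between = np.zeros(n)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            f_cond = indicator[mask].mean(axis=0)
            between += mask.mean() * (f_cond - f_marginal) ** 2
    total = f_marginal * (1.0 - f_marginal)
    return float(between.sum() / total.sum())


# Hypothetical simulated data (not from the thesis).
rng = np.random.default_rng(1)
n = 1000
u = rng.uniform(-1.0, 1.0, size=n)
y_curve = u ** 2 + 0.05 * rng.normal(size=n)   # y is determined by u, up to noise
noise = rng.normal(size=n)                     # unrelated to u

cid_y_given_u = cid_binned(y_curve, u)   # u explains y well
cid_u_given_y = cid_binned(u, y_curve)   # y leaves the sign of u unknown
cid_indep = cid_binned(noise, u)         # close to zero
```

Because `y_curve` is almost a function of `u` while `u` is only known up to sign given `y_curve`, the two directions give clearly different values in this toy case. That kind of gap is what a direction rule could exploit; how the thesis turns it into a decision is not described in the abstract.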

