
cidr: A Package of Coefficient of Intrinsic Dependence (CID) and its Application of Finding the Abiotic Stress-specific Gene Modules in Arabidopsis

Advisor: 劉力瑜

Abstract


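Identifying the key feature variables is an important step in building models or making decisions. The coefficient of intrinsic dependence (CID) is an association statistic that measures the dependence between variables. It has performed well in applications that search for associations, such as constructing gene regulatory networks or measuring the association between two groups of variables. To make CID more accessible to other users, such as biologists, this study developed an R package, cidr, that makes computing CID easier and more convenient. This study also integrated weighted gene co-expression network analysis (WGCNA) with CID to search for abiotic stress-specific gene modules in Arabidopsis, and visualized the results with heatmaps and biplots. Under cold, heat, and salt stress, we found two cold-specific, three heat-specific, and five salt-specific gene modules, which provide a preliminary reference for understanding the underlying biological interactions. In addition, we applied the sub-coefficient of intrinsic dependence (subCID) to search in more detail for stress-specific genes within a gene module; a biplot built from the matrix of per-gene subCID values helps distinguish the genes specific to each stress. We hope that the cidr package developed in this thesis and the methods it describes will help reveal the biological mechanisms hidden in large-scale genomic data.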

Parallel Abstract


Feature selection plays an important role in modeling and decision making. The coefficient of intrinsic dependence (CID) is an association measure that quantifies the relationship among variables. It has been applied to construct gene regulatory networks and to measure the relationship between two groups of one- or multi-dimensional variables. To make CID values easy for potential users to obtain, we developed an R package, cidr, for the computation and visualization of CID. In Chapter 3, we combined weighted gene co-expression network analysis (WGCNA) with CID to find abiotic stress-specific gene modules in Arabidopsis, and summarized the results using heatmaps and biplots. Two cold stress-specific, three heat stress-specific, and five salt stress-specific gene modules were identified. The results may provide hints about the underlying biological processes. In Chapter 4, we further used subCID values to identify the stress-specific genes within a gene module. The biplot derived from the subCID matrix helped visualize the stress-specific genes. In conclusion, we hope that the cidr package and the methodologies described in this dissertation can help reveal the biological insights hidden in massive genomic-level datasets.

