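Identifying the key feature variables is an important step in building models or making decisions. The coefficient of intrinsic dependence (CID) is an association statistic that measures the association among variables. It performs well in applications that search for associations, such as constructing gene regulatory networks or measuring the association between two groups of variables. To make CID more accessible to other users, such as biologists, this study developed an R package (cidr) that makes CID computation easier and more convenient. This study also integrates weighted gene co-expression network analysis (WGCNA) with CID to search for abiotic stress-specific gene modules in Arabidopsis, and visualizes the results with heatmaps and biplots. Under cold, heat, and salt stress, we found two cold, three heat, and five salt stress-specific gene modules, respectively, which provide a preliminary reference for understanding biological interaction processes. In addition, we applied the sub-coefficient of intrinsic dependence (subCID) to search for stress-specific genes within gene modules in more detail; biplots built from the matrix of per-gene subCID values help distinguish the genes specific to each stress. We hope that the cidr package developed in this thesis and the methods it describes will help reveal the biological mechanisms hidden in large-scale genomic data.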
Feature selection plays an important role in modeling and decision making. The coefficient of intrinsic dependence (CID) is an association measure that quantifies the relationship among variables. It has been applied to construct gene regulatory networks and to measure the relationship between two groups of one- or multi-dimensional variables. To make CID values easy for potential users to obtain, we developed an R package, cidr, for the computation and visualization of CID. In Chapter 3, we combined weighted gene co-expression network analysis (WGCNA) with CID to find abiotic stress-specific gene modules in Arabidopsis, and summarized the results using heatmaps and biplots. Two cold stress-specific, three heat stress-specific, and five salt stress-specific gene modules were identified. These results may provide hints about the underlying biological processes. In Chapter 4, we further used subCID values to identify the stress-specific genes within a gene module. The biplot derived from the subCID matrix helped to visualize the stress-specific genes. In conclusion, we hope that the cidr package and the methodologies described in this dissertation can help reveal the biological insights hidden in massive genomic-level datasets.