Thesis

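利用相關性矩陣降維進行雙分群分析:以基因表現資料為例 (Biclustering with dimension reduction via correlation matrices: gene expression data as an example)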

A biclustering method with correlation matrix for gene expression profiling

Advisor: 蕭朱杏

Abstract


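In recent years biclustering has become an important analytical tool in statistics, particularly for identifying which genes show similar expression under particular experimental conditions. The goal of biclustering is to find genes whose expression follows the same trend across a specific subset of experimental conditions. Most previous work extended clustering methods and concentrated on the similarity between genes across all experimental conditions. In this thesis we propose a biclustering method, abbreviated BiCor, that reduces dimension using two correlation matrices: one between gene expression profiles and one between experimental conditions. Using these two matrices, each iteration removes the least correlated gene or experimental condition. When a pre-specified convergence criterion is met, the result is a smaller rectangular array in which the expression levels show similar trends from both the gene and the condition perspective. We further define the true discovery rate (TDR) and the discovered true rate (DTR) to evaluate the performance of BiCor. Finally, we analyze simulated and real data to compare BiCor with other existing biclustering methods.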
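The abstract does not say how "least correlated" is scored or what the convergence criterion is, so what follows is only a minimal Python sketch of the iterative pruning idea under stated assumptions, not the thesis's implementation. It assumes (1) each gene is scored by its mean Pearson correlation with the other remaining genes, i.e. the row mean of the gene correlation matrix with the diagonal excluded, and likewise for each condition; (2) each iteration deletes whichever gene or condition has the lowest score; and (3) iteration stops once every remaining score reaches a threshold. The names `bicor_sketch`, `tau`, `min_genes`, and `min_conds`, and their default values, are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (length >= 2).

    Returns 0.0 when either sequence is constant, since r is undefined there.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mean_corr_scores(vectors):
    """Row means of the correlation matrix of `vectors`, diagonal excluded."""
    k = len(vectors)
    return [
        sum(pearson(vectors[i], vectors[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]


def bicor_sketch(data, tau=0.8, min_genes=2, min_conds=2):
    """Hypothetical BiCor-style pruning of a genes x conditions matrix.

    `data` is a list of rows (one per gene); min_genes and min_conds must be
    >= 2 so that correlations stay defined. Returns the kept gene indices and
    the kept condition indices.
    """
    genes = list(range(len(data)))
    conds = list(range(len(data[0])))
    while True:
        # Gene profiles across the remaining conditions, and
        # condition profiles across the remaining genes.
        gene_vecs = [[data[g][c] for c in conds] for g in genes]
        cond_vecs = [[data[g][c] for g in genes] for c in conds]
        gene_scores = mean_corr_scores(gene_vecs)
        cond_scores = mean_corr_scores(cond_vecs)

        worst_g = min(range(len(genes)), key=gene_scores.__getitem__)
        worst_c = min(range(len(conds)), key=cond_scores.__getitem__)

        # A gene or condition is eligible for removal only if its score is
        # below tau and the size floor still allows shrinking that dimension.
        candidates = []
        if len(genes) > min_genes and gene_scores[worst_g] < tau:
            candidates.append((gene_scores[worst_g], "gene", worst_g))
        if len(conds) > min_conds and cond_scores[worst_c] < tau:
            candidates.append((cond_scores[worst_c], "cond", worst_c))
        if not candidates:
            break  # converged, or cannot shrink any further

        # Remove the single least-correlated gene or condition.
        _, kind, idx = min(candidates)
        if kind == "gene":
            del genes[idx]
        else:
            del conds[idx]
    return genes, conds


data = [
    [1, 2, 3],
    [2, 3, 4],
    [0, 1, 2],
    [3, 2, 1],  # gene 3 trends opposite to the others
]
print(bicor_sketch(data))  # ([0, 1, 2], [0, 1, 2])
```

Signed rather than absolute correlation is used here so that a gene trending opposite to the rest is pruned, in keeping with the goal of finding genes with similar trends; the thesis's actual scoring rule may differ.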

Keywords

biclustering, correlation, gene expression

Parallel Abstract


Biclustering has become an important analytical tool in recent statistical practice, particularly when the interest lies in grouping genes under certain experimental conditions. The goal of biclustering analysis is to identify sets of genes that share similar expression patterns across subsets of samples. Previously developed approaches were mostly extensions of clustering methods and thus focused more on the similarity between genes across all experimental conditions. Here we propose BiCor, a biclustering algorithm based on two correlation matrices: one between gene expression patterns and one between conditions. Each of these matrices is visited iteratively to remove the least relevant gene or condition. Once a pre-specified convergence criterion is met, the resulting smaller rectangular submatrix contains expression levels that are similar at both the gene and the condition level. We further define the true discovery rate (TDR) and the discovered true rate (DTR) to assess the performance of the proposed algorithm. Simulation studies and applications to real data were conducted to evaluate BiCor and compare it with other existing algorithms.
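TDR and DTR are named but not defined in the abstract. A natural reading, assumed here, compares a detected bicluster D with the simulated true bicluster T at the level of (gene, condition) cells: TDR = |D ∩ T| / |D|, the share of detected cells that are real, and DTR = |D ∩ T| / |T|, the share of real cells that were detected, analogous to precision and recall. The helper names `cells` and `tdr_dtr` are hypothetical, and the thesis may define the rates differently, for example per gene rather than per cell.

```python
def cells(genes, conds):
    """Set of (gene, condition) cells covered by a bicluster."""
    return {(g, c) for g in genes for c in conds}


def tdr_dtr(found_genes, found_conds, true_genes, true_conds):
    """Assumed cell-level definitions (not confirmed by the thesis):

    TDR = |found & true| / |found|   share of detected cells that are real
    DTR = |found & true| / |true|    share of real cells that were detected
    """
    found = cells(found_genes, found_conds)
    truth = cells(true_genes, true_conds)
    hits = len(found & truth)
    tdr = hits / len(found) if found else 0.0
    dtr = hits / len(truth) if truth else 0.0
    return tdr, dtr


# Simulated truth: genes 0-3 under conditions 0-1 (8 cells).
# Detected: genes 0, 1, 2, 5 under conditions 0-2 (12 cells), 6 of them true.
print(tdr_dtr([0, 1, 2, 5], [0, 1, 2], [0, 1, 2, 3], [0, 1]))  # (0.5, 0.75)
```

Under this reading a method can score a high DTR simply by reporting an oversized bicluster, which is why the two rates are reported together.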

Parallel Keywords

Bicluster, correlation, gene expression

