Multiple Ensemble Clustering Based on Different Similarity Measures for the Cluster Analysis of Gene Expression Data

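The goal of clustering microarray data is to identify genes with similar functions from their expression across different experimental conditions. Both the choice of similarity measure and the choice of clustering method can lead to different clustering results. In this study, we use three correlation coefficients (Pearson, Kendall and Spearman) together with the Euclidean distance. Under each measure, we run the hierarchical clustering tree (HCT), K-means, partitioning around medoids (PAM), consensus clustering and ensemble clustering. We then integrate these clustering results into a final partition of the data, with the aim of obtaining a more stable result. We illustrate and discuss the proposed method using one simulated data set and one microarray gene expression data set.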
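The abstract names four ways of comparing two gene expression profiles, and the choice among them affects the clusters. The Python below is a minimal, stdlib-only sketch of those four measures. Three parts of it are our assumptions, not details from the paper:

- Each correlation r is converted to a dissimilarity 1 - r.
- Kendall's coefficient uses the tie-adjusted tau-b variant.
- The function names (`pearson`, `average_ranks`, `spearman`, `kendall_tau_b`, `dissimilarity`) are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for a constant profile


def average_ranks(v):
    """Ranks 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-adjusted variant) over all pairs of conditions."""
    conc = disc = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both profiles: excluded from tau-b
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            conc += 1
        else:
            disc += 1
    return (conc - disc) / math.sqrt((conc + disc + ties_x) * (conc + disc + ties_y))


def dissimilarity(x, y, measure):
    """Distance between two profiles (assumption: correlation r maps to 1 - r)."""
    if measure == "euclidean":
        return math.dist(x, y)
    corr = {"pearson": pearson, "spearman": spearman, "kendall": kendall_tau_b}[measure]
    return 1.0 - corr(x, y)


g1 = [1.0, 2.0, 3.0, 4.0]
g2 = [2.0, 4.0, 6.0, 8.0]  # same shape as g1, twice the magnitude
g3 = [4.0, 3.0, 2.0, 1.0]  # g1 reversed
for m in ("pearson", "spearman", "kendall", "euclidean"):
    print(m, dissimilarity(g1, g2, m), dissimilarity(g1, g3, m))
```

The three correlation-based distances disagree sharply with the Euclidean distance on these profiles:

- The correlation-based distances treat g1 and g2 as identical (distance 0) and g1 and g3 as maximally opposed (distance 2).
- The Euclidean distance places g3 (distance √20) closer to g1 than g2 (distance √30).

Disagreements of this kind are why clusterings built on different measures can diverge.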

Keywords

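cluster analysis; correlation coefficient; ensemble clustering; similarity measure; hierarchical clustering; K-means; partitioning around medoids; consensus clustering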

English Abstract

Unsupervised clustering methods have been widely applied to gene expression data to identify biologically relevant groups of genes. Different clustering algorithms combined with different similarity measures usually produce quite different gene clusters. To reduce this dependence, we propose a clustering method that integrates several clustering algorithms over three similarity measures. The proposed method, which we call multiple ensemble clustering, averages the consensus results from hierarchical clustering, K-means and partitioning around medoids. Each of these algorithms is run with the Pearson correlation, Kendall's tau and Spearman's rank correlation. We illustrate the method on one simulated data set and one real data set. The validity indices indicate that multiple ensemble clustering provides a much more stable clustering result.
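The averaging step described above can be sketched as follows. This is a minimal sketch under two assumptions:

- A "consensus result" is read as a co-association matrix, whose entry (i, j) is the fraction of partitions that put genes i and j in the same cluster.
- The final partition is formed by joining genes whose averaged consensus exceeds 0.5 and taking connected components.

The paper may use a different final step, such as hierarchical clustering on 1 - consensus. The helper names and the example labels are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Consensus matrix: fraction of partitions placing items i and j together.

    Each partition is a list of cluster labels, one per item. Labels are only
    compared within a partition, so label 0 in one run need not match label 0
    in another.
    """
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices."""
    n, k = len(matrices[0]), len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / k for j in range(n)]
            for i in range(n)]


def threshold_partition(consensus, threshold=0.5):
    """Join items whose consensus exceeds `threshold`; return component labels."""
    n = len(consensus)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in combinations(range(n), 2):
        if consensus[i][j] > threshold:
            parent[find(i)] = find(j)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical labels for five genes: for each similarity measure, the
# partitions produced by HCT, K-means and PAM (in that order).
runs_by_measure = {
    "pearson":  [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 1, 1, 1]],
    "spearman": [[0, 0, 1, 1, 2], [0, 0, 1, 1, 2], [2, 2, 0, 0, 1]],
    "kendall":  [[0, 0, 0, 1, 2], [0, 0, 1, 1, 2], [1, 1, 0, 0, 2]],
}
per_measure = [co_association(runs) for runs in runs_by_measure.values()]
overall = average_matrices(per_measure)
final_labels = threshold_partition(overall)
print(final_labels)  # [0, 0, 1, 1, 2]
```

In this toy example, individual runs disagree about genes 2 to 4. After averaging, only pairs co-clustered in most runs survive the 0.5 cut. The resulting partition is less sensitive to any single algorithm or measure, which is the stability the abstract describes.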

English Keywords

clustering; consensus clustering; ensemble clustering; gene expression; hierarchical clustering tree; K-means; partitioning around medoids; similarity measures
