
Developing a Clustering Algorithm for Time Series Using a Modeling Approach

Using Dynamic Template Based Clustering to Analyze Time Series Microarray Data

Advisor: 莊曜宇

Abstract


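Microarrays are a method for measuring gene expression at large scale. Microarray experimental designs fall into two main classes: one compares gene expression under two different conditions, and the other observes how gene expression changes over time. Several clustering methods have been developed to analyze the resulting data. Clustering can find arrays or genes with similar expression, and in this way identify samples that may be of the same type or genes that influence one another. Traditional clustering methods were not designed for time series, so their results often miss information. In recent years a number of groups have designed clustering methods for time series, but these methods usually apply only in certain situations, for example only to data with many time points or only to data with few time points. This study proposes using the Gap statistic algorithm to build candidate temporal-trend models from the information in the data itself, and then clustering with these models. A binomial test is used to detect the more important clusters in the clustering result. The method's performance is tested on simulated data and on data from published microarray experiments, and compared with a published time series clustering algorithm.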

Parallel Abstract


Microarrays are a high-throughput technology for investigating gene expression. Microarray experiments follow two major designs: case-control studies and time series studies. Clustering methods have been developed to analyze microarray data; they help discover similar samples or correlated genes from the expression profiles of samples or genes. Traditional clustering methods are not designed for time series and therefore tend to miss information or misclassify profiles. Although several clustering methods for time series exist, they are not suitable for all conditions. We propose a new time series clustering method, Gap statistic and Template based clustering (GT-clustering), for analyzing time series microarray data under all conditions, whether the time series is long or short. GT-clustering uses the Gap statistic to design the templates used for clustering. In addition, a binomial test is applied to identify significant clusters. The algorithm is tested on simulated data and published data, and its results are compared with those of a published algorithm.
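The abstract does not state how the binomial test is set up, so the following is only a minimal sketch of one plausible reading, not the thesis's procedure. It assumes a null model in which each gene falls into any of the K templates with equal probability 1/K, and it flags a cluster as significant when its size is unexpectedly large under that model. The function names, the uniform null, and the Bonferroni correction are all illustrative assumptions.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def significant_clusters(sizes, alpha=0.05):
    """Return (cluster index, p-value) for clusters larger than chance predicts.

    Assumptions (not from the source): under the null, each gene is assigned
    to one of the K clusters uniformly at random, and the significance level
    is Bonferroni-corrected for the K tests.
    """
    n = sum(sizes)            # total number of genes
    k_clusters = len(sizes)   # number of templates / clusters
    p_null = 1.0 / k_clusters
    threshold = alpha / k_clusters
    flagged = []
    for idx, size in enumerate(sizes):
        p_value = binomial_tail(size, n, p_null)
        if p_value < threshold:
            flagged.append((idx, p_value))
    return flagged


if __name__ == "__main__":
    # Hypothetical cluster sizes: one large cluster and five small ones.
    for idx, p_value in significant_clusters([50, 10, 10, 10, 10, 10]):
        print(f"cluster {idx}: p = {p_value:.3g}")
```

A one-sided test is used here because only over-populated clusters are of interest under this reading; a different null distribution over templates would change `p_null` per cluster but leave the rest of the sketch unchanged.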

