
Improved Affinity Propagation by Spline Interpolation on Time-Series Gene Expression Clustering

Original title: 利用Spline內插技術改進時序基因資料的AP分群方法研究

Advisor: 王家祥

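Abstract (translated from the Chinese)


In recent years, breakthroughs in DNA microarray technology have allowed biologists to observe the expression of large numbers of genes simultaneously in biological experiments. Extracting meaningful expression patterns from time-series data has therefore become a key step in understanding biological systems. Automatically grouping genes of known and unknown function with clustering techniques may make it possible to infer the function of the unknown genes. However, because biological experiments involve considerable uncertainty and noise, analyzing time-series gene expression data is not easy. Early clustering algorithms such as k-means, self-organizing maps, and hierarchical clustering ignore the high correlation between successive time points in a time series. By comparison, algorithms based on probabilistic models, such as dynamic Bayesian networks (DBN) and hidden Markov models (HMM), are better suited to time series but are computationally inefficient. In addition, biological experiments are sampled at long time intervals, so the data may be undersampled. In this thesis, we propose an unsupervised clustering algorithm that combines spline interpolation with Affinity Propagation (AP). The proposed method examines the expression relationships between genes in each time interval and reduces the influence of noise and outliers. In an analysis of a real yeast time-series gene expression dataset, our method achieves significant clustering accuracy and does not require prior knowledge of the number of clusters or the cluster centers. This work offers a direction for future development in clustering gene expression time-series data.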

Parallel Abstract (English)


DNA microarray technology has been widely used in life science research for many years. It allows scientists to monitor the expression levels of many genes simultaneously during biological processes. Analyzing the resulting massive time-series data is important for exploring the complex dynamics of biological systems. However, analyzing time-series gene expression data is difficult because noise levels and measurement uncertainties are high. Early clustering methods such as k-means, self-organizing maps, and hierarchical clustering disregard the temporal dependency between successive time points. Probabilistic model-based methods, such as dynamic Bayesian networks (DBN) and hidden Markov models (HMM), are better suited to time series but are computationally inefficient. In addition, real gene datasets suffer from undersampling because the intervals between the time points at which expression data are harvested are long. In this thesis, an unsupervised clustering algorithm that combines spline interpolation and Affinity Propagation (AP) is proposed. The proposed method first uses interpolation to eliminate the influence of undersampling and then, through interval selection, investigates the relationships between genes across distinct time points. We demonstrate that our method achieves significant accuracy on real gene expression time-series datasets without a priori knowledge such as the number of clusters or the exemplars. Our study provides a way of clustering gene expression time-series data for future biological investigations.
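
The two building blocks named in the abstract, spline interpolation and Affinity Propagation, can be illustrated with a minimal sketch. This is not the thesis's algorithm: the abstract does not specify the spline type, the similarity measure, the interval-selection step, or any parameters. The sketch assumes a natural cubic spline, negative squared Euclidean distance as similarity, the median similarity as the preference, and a damping factor of 0.5. The function names and the six toy expression profiles are hypothetical.

```python
import bisect
import statistics

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)  # second derivatives at the knots; both ends stay 0
    if n >= 2:
        diag = [2.0 * (h[j] + h[j + 1]) for j in range(n - 1)]
        rhs = [6.0 * ((ys[j + 2] - ys[j + 1]) / h[j + 1] - (ys[j + 1] - ys[j]) / h[j])
               for j in range(n - 1)]
        for j in range(1, n - 1):  # tridiagonal solve: forward elimination
            w = h[j] / diag[j - 1]
            diag[j] -= w * h[j]
            rhs[j] -= w * rhs[j - 1]
        for j in range(n - 2, -1, -1):  # back substitution
            m[j + 1] = (rhs[j] - h[j + 1] * m[j + 2]) / diag[j]

    def evaluate(x):
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)
    return evaluate

def upsample(times, values, factor):
    """Evaluate the spline at `factor` evenly spaced points per sampling interval."""
    spline = natural_cubic_spline(times, values)
    dense = [times[i] + k * (times[i + 1] - times[i]) / factor
             for i in range(len(times) - 1) for k in range(factor)]
    dense.append(times[-1])
    return [spline(t) for t in dense]

def affinity_propagation(S, damping=0.5, iterations=200):
    """Run damped message passing on S (preferences on the diagonal).

    Returns, for each item, the index of the exemplar it is assigned to.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            runner_up = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (runner_up if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1.0 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            pos[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(pos)
            for i in range(n):
                new = total - pos[i]
                if i != k:
                    new = min(0.0, new)
                A[i][k] = damping * A[i][k] + (1.0 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0.0]
    if not exemplars:  # fallback if no item qualifies as an exemplar
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k
    return labels

# Toy data (made up): three rising and three falling profiles at four time points.
times = [0.0, 1.0, 2.0, 3.0]
rising, falling = [0.0, 1.0, 2.5, 3.0], [3.0, 2.5, 1.0, 0.0]
genes = ([[v + d for v in rising] for d in (0.0, 0.2, 0.5)]
         + [[v + d for v in falling] for d in (0.0, 0.3, 0.5)])
dense = [upsample(times, g, factor=4) for g in genes]
n = len(dense)
S = [[-sum((a - b) ** 2 for a, b in zip(dense[i], dense[j])) for j in range(n)]
     for i in range(n)]
preference = statistics.median(S[i][j] for i in range(n) for j in range(n) if i != j)
for i in range(n):
    S[i][i] = preference
labels = affinity_propagation(S)
print(labels)
```

Neither the number of clusters nor the exemplars are passed in. Both emerge from the message passing, which is the property the abstract highlights. In practice a library implementation such as scikit-learn's `AffinityPropagation` combined with SciPy's `CubicSpline` would likely be preferable to this hand-rolled version.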

