Identifying Significant Transcriptional Modules in Gene Expression Data Using Genetic Algorithms

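Abstract

A transcriptional module is a set of genes that are co-regulated under certain experimental conditions. Genes in the same transcriptional module can often be assumed to share the same function and to be regulated by the same transcription factors, which bind to a promoter sequence upstream of each gene. Identifying significant transcriptional modules may therefore help biologists reconstruct the complete genetic networks of many organisms and understand in detail the complex mechanisms at work inside living organisms. At present, measuring gene expression with microarrays, a high-throughput technique, is the most effective and intuitive way to reach the goal of constructing genetic networks. However, detecting transcriptional modules by analyzing microarray gene expression data remains a very complicated problem, even though (bi)clustering techniques have been proposed. This thesis first formulates significant transcriptional modules in terms of a newly proposed model and redefines what makes a transcriptional module good. It then proposes a new biclustering method that treats the identification of significant transcriptional modules as an optimization problem and solves it with genetic algorithms, to avoid becoming trapped in local optima as general heuristic methods do. Next, a special case of the model is evaluated to determine whether the fitness function is effective and correct. Finally, the significant transcriptional modules derived from two gene expression data sets, one from human (Homo sapiens) and one from yeast (Saccharomyces cerevisiae), are validated again in silico. The experimental results show that the proposed method performs very well at identifying significant transcriptional modules and is better than general heuristic methods at finding gene groups with the same functional annotations.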

Keywords

clustering; microarray; genetic algorithms; transcriptional module; gene expression data

Parallel Abstract

A transcriptional module is a set of genes that are co-regulated under particular experimental conditions. Genes in the same transcriptional module are assumed to share the same function and to be regulated by common transcription factors that bind to a promoter sequence in the upstream region. Identifying significant transcriptional modules may help biologists reconstruct the whole genetic networks of many organisms and understand complex biological mechanisms in detail. At present, microarray technology, a high-throughput method for measuring gene expression, is an effective and intuitive way to achieve this goal. However, detecting transcriptional modules in large-scale microarray gene expression data remains a complicated problem, even though several (bi)clustering approaches have been proposed. In this thesis, significant transcriptional modules are first formulated to fit a novel model, and the goodness of a transcriptional module is clarified by new definitions. A new biclustering approach is then devised that treats the identification of significant transcriptional modules in gene expression data as an optimization problem and applies genetic algorithms to solve it, so as to avoid becoming trapped in local optima as other heuristic approaches do. A special case of the proposed model is evaluated first to demonstrate the effectiveness and correctness of the fitness function. Finally, two large-scale gene expression data sets, from Homo sapiens and Saccharomyces cerevisiae, are tested, and the derived significant transcriptional modules are evaluated again in silico. The experimental results show that the proposed approach is excellent at identifying significant transcriptional modules and is superior to heuristics at detecting gene groups with more similar functional annotations.
Owing to these results, the proposed approach is believed to be worth applying to more advanced biological problems, such as phenotype classification in cancer research, functional prediction, and genetic network reconstruction.
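To make the idea of biclustering as an optimization problem solved by a genetic algorithm more concrete, the following is a minimal sketch and not the thesis's actual model. The abstract does not specify the fitness function, so the sketch substitutes a commonly used stand-in: the mean squared residue of Cheng and Church, combined with bicluster size. All function names (`mean_squared_residue`, `decode`, `fitness`, `evolve`) and all parameters (`delta`, `pop_size`, `generations`, `p_mut`, `seed`) are hypothetical and chosen only for illustration.

```python
import random


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix rows x cols (0 means perfectly additive)."""
    row_mean = {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / len(rows) for j in cols}
    all_mean = sum(row_mean.values()) / len(rows)
    total = sum((data[i][j] - row_mean[i] - col_mean[j] + all_mean) ** 2
                for i in rows for j in cols)
    return total / (len(rows) * len(cols))


def decode(chrom, n_genes):
    """Split a bit string into selected gene indices and selected condition indices."""
    rows = [i for i in range(n_genes) if chrom[i]]
    cols = [j for j in range(len(chrom) - n_genes) if chrom[n_genes + j]]
    return rows, cols


def fitness(chrom, data, delta=0.1):
    """Assumed fitness: bicluster volume, discounted as the residue grows past delta."""
    rows, cols = decode(chrom, len(data))
    if len(rows) < 2 or len(cols) < 2:
        return 0.0  # degenerate biclusters carry no co-regulation signal
    msr = mean_squared_residue(data, rows, cols)
    return len(rows) * len(cols) / (1.0 + msr / delta)


def evolve(data, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Plain generational GA: elitism, 3-way tournaments, uniform crossover, bit flips."""
    rng = random.Random(seed)
    length = len(data) + len(data[0])
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, data), reverse=True)
        next_pop = [ranked[0][:], ranked[1][:]]  # keep the two best unchanged
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 3), key=lambda c: fitness(c, data))
            b = max(rng.sample(pop, 3), key=lambda c: fitness(c, data))
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda c: fitness(c, data))
    return decode(best, len(data)), fitness(best, data)
```

A call such as `(genes, conditions), score = evolve(expression_matrix)`, where `expression_matrix` is a list of per-gene rows of expression values, would return the indices of one candidate module and its score under this assumed fitness. The population-based search with mutation is what lets such a method escape the local optima that a greedy heuristic can get stuck in, although it offers no guarantee of finding the global optimum.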

Parallel Keywords

transcriptional module; genetic algorithms; microarray; clustering; gene expression data

