Extracting Transcription Factor Binding Sites from Unaligned Gene Sequences with Statistical Models

Advisor: 呂忠津

Abstract


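Protein synthesis is the most important step in an organism's physiological processes. Research has shown that deoxyribonucleic acid (DNA) sequences, after transcription and translation, yield the protein products that these processes require. Transcription and translation are regulated by specific transcription factor binding sites, which determine whether a DNA sequence produces its corresponding protein product. Binding sites therefore play a key role in regulating an organism's physiology.

Finding the transcription factor binding sites of different species is an important research direction in bioinformatics. Recent technical advances make it possible to identify gene sequences regulated by a transcription factor through genome-wide location analysis, which hybridizes DNA microarrays against the corresponding genomic positions. Unfortunately, this experimental method yields only an approximate regulatory region and cannot pinpoint the actual binding sites. We therefore aim to locate the true binding sites accurately by statistical means.

In this thesis, we wrote a program that finds the binding sites of a specific transcription factor in the gene sequences obtained from genome-wide location analysis. The method uses statistical significance under a binomial probability model to select the most significant sequence patterns as initial search positions, and then combines dependency graphs, their expanded Bayesian networks, and Gibbs sampling to search iteratively for the most probable binding sites. We then collected known transcription factor binding site data and compared our results with those of other methods. Among the methods compared, our program performed better.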

Parallel Abstract


Transcription factor binding sites (motifs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-array) has been used to identify potential regulatory sequences, but the procedure can only map the probable protein-DNA interaction loci at a resolution of 1-2 kilobases. To find the exact binding motifs, a computational method is needed to examine the ChIP-array binding sequences and search for possible motifs representing the transcription factor binding sites. In this thesis, we design a program that finds accurate motif sites in the yeast genome using dependency graphs and their expanded Bayesian networks. The program incorporates a binomial probability model to build significant initial motif sets. Finally, we compare our results with those obtained from well-known programs and show that our program outperforms them in consistency with known specificities.
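
The abstract does not give the details of the binomial model, so the following is only a minimal sketch of one common way such a seeding step can work, not the thesis's actual implementation. It assumes a uniform background base distribution, treats windows within a sequence as independent, and counts each k-mer at most once per sequence. The names `binomial_tail`, `rank_kmers`, and `UNIFORM` are hypothetical.

```python
from collections import Counter
from math import comb, prod

# Assumption: uniform background. A real analysis would estimate base
# frequencies from the genome or from the input sequences.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_kmers(sequences, width, background=UNIFORM):
    """Rank k-mers by how unlikely it is to see them in so many sequences."""
    hits = Counter()
    total_windows = 0
    for seq in sequences:
        windows = [seq[i:i + width] for i in range(len(seq) - width + 1)]
        total_windows += len(windows)
        # Count each k-mer once per sequence; skip windows containing
        # symbols that are not in the background model (for example "N").
        hits.update({w for w in windows if all(b in background for b in w)})

    n = len(sequences)
    mean_windows = total_windows / n
    scored = []
    for kmer, k in hits.items():
        # Probability that one window matches the k-mer exactly.
        p_window = prod(background[b] for b in kmer)
        # Approximate probability that a sequence contains the k-mer at
        # least once, treating its windows as independent.
        p_seq = 1 - (1 - p_window) ** mean_windows
        scored.append((kmer, binomial_tail(n, k, p_seq)))
    # Smallest tail probability first: the most significant candidates.
    return sorted(scored, key=lambda item: item[1])
```

Under these assumptions, the top-ranked k-mers would serve as the initial motif set that a later sampling stage refines. With a list of ChIP-array sequences in `seqs`, `rank_kmers(seqs, 6)[:10]` returns the ten most over-represented 6-mers with their tail probabilities.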

