
A Systematical Approach for Discovering Gene Regulatory Binding Motifs in Silico

Predicting Gene Regulatory Sequences with a Systematic Approach

Advisor: 唐傳義

Abstract


The identification of regulatory elements recognized by transcription factors and chromatin remodeling factors is essential to studying the regulation of gene expression. When no auxiliary data, such as orthologous sequences or expression profiles, are used, the accuracy of most motif discovery tools is strongly influenced by motif degeneracy and the length of the input sequences. Since suitable auxiliary data may not always be available, more work is needed to improve how well tools identify transcription elements in metazoans. A non-alignment-based algorithm, MotifSeeker, is proposed to enhance the accuracy of discovering degenerate motifs. MotifSeeker exploits the property that the variable sites of transcription elements are usually position-specific, which reduces its exposure to noise and thereby improves the efficiency and accuracy of motif identification. Using data fusion, the ranking process integrates two measures of motif significance into a single, more robust significance measure. Tests on synthetic data reveal that the accuracy of MotifSeeker is less sensitive to motif degeneracy and to the length of the input sequences. Furthermore, MotifSeeker has been tested on a well-known benchmark, yielding a correlation coefficient of 0.262, which compares favorably with those of other tools. The applicability of MotifSeeker to biological data is further demonstrated on regulons of S. cerevisiae and on liver-specific genes with experimentally verified regulatory elements.

To investigate the transcriptional reprogramming between backup paralogs, we use a systematic approach to find clusters of co-regulated genes. We also apply high-throughput genome-wide ChIP-chip data and MotifSeeker to identify transcription regulators shared by both members of a backup gene pair. The results show that transcriptional reprogramming is one of the duplicate-associated genetic buffering mechanisms, but other mechanisms beyond the transcriptional level appear to exist.
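The abstract reports a benchmark correlation coefficient of 0.262 without defining it. As a rough illustration only, the sketch below assumes the nucleotide-level correlation coefficient often used in motif discovery benchmarks, computed from per-nucleotide true/false positives and negatives. The function name `nucleotide_cc` and the 0/1 label encoding are hypothetical and are not taken from the thesis.

```python
import math


def nucleotide_cc(known, predicted):
    """Correlation coefficient between two per-nucleotide 0/1 label vectors.

    known[i] is 1 if nucleotide i lies inside an annotated binding site,
    predicted[i] is 1 if a tool predicted a site covering nucleotide i.
    Assumption: this is the nucleotide-level coefficient; the thesis
    abstract does not state which definition it uses.
    """
    if len(known) != len(predicted):
        raise ValueError("label vectors must have equal length")

    pairs = list(zip(known, predicted))
    tp = sum(1 for k, p in pairs if k and p)          # site, predicted
    tn = sum(1 for k, p in pairs if not k and not p)  # background, not predicted
    fp = sum(1 for k, p in pairs if not k and p)      # background, predicted
    fn = sum(1 for k, p in pairs if k and not p)      # site, missed

    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Undefined when one class is empty; report 0.0 by convention here.
        return 0.0
    return (tp * tn - fn * fp) / denom


# Toy example: an annotated site at positions 2-4, a prediction at 1-3.
known = [0, 0, 1, 1, 1, 0, 0, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 0]
print(round(nucleotide_cc(known, predicted), 3))
```

A coefficient of 1 means the predicted nucleotides coincide exactly with the annotated ones, and values near 0 mean the prediction is no better than chance, which gives some context for reading a figure such as 0.262.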

Parallel Abstract


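Finding and identifying gene regulatory sequences remains an important topic in the study of gene expression. Many regulatory sequences, such as transcription factor binding sites and methylation sites, have been found to play important roles in gene expression. For most current methods, the recognition rate is strongly affected by the total length of the input sequences and by the degree of degeneracy of the regulatory sequence. This study exploits the property that specific positions within a regulatory sequence must be conserved (positional specificity) to design an efficient algorithm, combined with post-processing based on orthologous co-regulation (multiple species, multiple genes) and tissue-specific genes. Applying the principle of data fusion, it develops a ranking method that fuses two characteristics of gene regulatory sequences in order to pick out the most statistically significant representatives among the many candidate regulatory sequences. In tests on simulated promoter data, the accuracy of the method is less affected by promoter sequence length and by the degree of regulatory sequence degeneracy. It also performs well in tests on groups of co-regulated yeast genes. With post-processing that uses auxiliary data, it also finds several experimentally verified gene regulatory sequences in human liver-specific cells.

From a systems biology perspective, this study also proposes a method for finding groups of co-regulated genes that combines protein interaction data with gene expression data, and applies the regulatory sequence identification method to explore transcriptional regulation between backup genes. The results show that transcriptional reprogramming is indeed one possible mechanism for backup genes, but regulation at other levels very likely exists as well.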
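The data fusion ranking mentioned above combines two significance measures into one ordering, but the abstract names neither the measures nor the fusion rule. The sketch below uses simple rank-sum fusion as a stand-in to show the general idea. The function names, the candidate motifs, and the scores are all hypothetical.

```python
def rank_positions(scores):
    """Return each candidate's 1-based rank, highest score first.

    Ties are broken alphabetically so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda motif: (-scores[motif], motif))
    return {motif: i + 1 for i, motif in enumerate(ordered)}


def fuse_rankings(measure_a, measure_b):
    """Order candidates by the sum of their ranks under two measures.

    Assumption: rank-sum fusion is used here for illustration only;
    the thesis may use a different fusion rule.
    """
    if measure_a.keys() != measure_b.keys():
        raise ValueError("both measures must score the same candidates")
    rank_a = rank_positions(measure_a)
    rank_b = rank_positions(measure_b)
    return sorted(rank_a, key=lambda motif: (rank_a[motif] + rank_b[motif], motif))


# Hypothetical candidates scored by two made-up significance measures.
measure_a = {"TGACTCA": 9.1, "CACGTG": 7.4, "AAAAAA": 8.0}
measure_b = {"TGACTCA": 3.2, "CACGTG": 4.0, "AAAAAA": 0.5}
print(fuse_rankings(measure_a, measure_b))
```

Fusing ranks rather than raw scores avoids having to put the two measures on a common scale, and a candidate that scores well on only one measure, such as the low-complexity `AAAAAA` here, drops down the fused list.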

