Thesis

Predicting target sequences of DNA-binding proteins based on primary structure

Advisor: Yen-Jen Oyang (歐陽彥正)
Co-advisor: Chien-Yu Chen (陳倩瑜)

Abstract


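Proteins that bind specific DNA sequences play important roles in gene regulation. Predicting the target sequences of these DNA-binding proteins computationally, or designing biological experiments to find them, helps us understand how gene regulation proceeds and explain how sequence variation in the genome disrupts normal gene expression. The position frequency matrix (PFM) is the model most commonly used to describe these target sequences, yet for most species only a small fraction of transcription factors have had such models determined by biological experiments so far. Because biological experiments are often expensive in both funding and labor, accurately predicting PFMs by computational methods, and thereby accelerating progress in this field, has long been a research topic of great interest to bioinformaticians. This thesis addresses the problem by proposing a new method that predicts the target sequences of DNA-binding proteins using protein-DNA complex structures and a knowledgebase recording the binding preferences between different amino acids and nucleotides. Given a protein sequence, the method first selects an appropriate template complex structure and then predicts the PFM from that template together with the knowledgebase.

The new method is evaluated on two datasets. Compared with other methods that use tertiary structure, it achieves the same level of prediction performance. Compared with another method that is likewise based on sequence information but is trained on known PFMs, it performs slightly worse. Because each of the existing sequence-based prediction methods has its own limitations, the proposed method can still support related research by predicting target sequences for protein sequences whose homologues have known protein-DNA complex structures, and these predictions should help such research move forward.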
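A position frequency matrix counts, for each position of a set of aligned binding sites, how often each nucleotide occurs there. The minimal sketch below builds one from a few made-up sites and then normalizes each column to probabilities with a pseudocount. The example sites, the function names, and the pseudocount of 0.25 are illustrative assumptions, not data or settings from this thesis.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def position_frequency_matrix(sites):
    """Count nucleotide occurrences at each position of equal-length aligned sites.

    Returns one dict per position, mapping each nucleotide to its count.
    """
    if not sites:
        raise ValueError("need at least one site")
    sites = [s.upper() for s in sites]
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    pfm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)  # nucleotide counts in column i
        pfm.append({nt: counts.get(nt, 0) for nt in NUCLEOTIDES})
    return pfm


def to_probabilities(pfm, pseudocount=0.25):
    """Normalize each column to probabilities, adding a pseudocount per nucleotide."""
    ppm = []
    for column in pfm:
        total = sum(column.values()) + pseudocount * len(NUCLEOTIDES)
        ppm.append({nt: (column[nt] + pseudocount) / total for nt in NUCLEOTIDES})
    return ppm


if __name__ == "__main__":
    # Hypothetical aligned binding sites, for illustration only.
    sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
    pfm = position_frequency_matrix(sites)
    for i, col in enumerate(to_probabilities(pfm)):
        print(i, {nt: round(p, 3) for nt, p in col.items()})
```

The pseudocount keeps unobserved nucleotides from receiving probability zero, which matters when only a handful of sites are available.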

Keywords

DNA-binding protein

English Abstract


Proteins that bind specific DNA sequences play important roles in regulating gene expression. Identifying the target sequences of a DNA-binding protein helps us understand how genes are regulated in cells and explain how genetic variation disrupts normal gene expression. Position frequency matrices (PFMs) are among the most widely used models for representing such target sequences. However, for most species only a small fraction of transcription factors (TFs) have experimentally determined PFMs to date. Because biological experiments are time-consuming and costly, computational methods with satisfactory accuracy are strongly desired to speed up progress. Here, a new method is proposed to predict the quantitative specificities of protein-DNA interactions based on existing protein-DNA complex structures and a knowledgebase recording the contact preferences between amino acids and nucleotides. Given a query protein sequence, a protein-DNA complex structure of a homologous protein is selected as a template, and the PFM is predicted from this template combined with the knowledgebase.

The proposed method is evaluated on two datasets and compared with existing computational methods. The results show that it predicts as well as the compared structure-based methods. However, compared with a sequence-based method trained on experimentally determined PFMs, the proposed method performs slightly worse. Nevertheless, the proposed method remains valuable, because different predictors usually have their own advantages and limitations. In summary, a DNA-binding protein's binding preference can be predicted from its primary structure using the complex structures of its homologues. This facilitates future studies, because the target sequences of proteins without a solved structure can now be predicted.
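The abstract describes the prediction step only at a high level. Purely as an illustration of how a template complex and a contact-preference knowledgebase could be combined, the sketch below scores each nucleotide at each DNA position by summing preference values for the query residues that align to contacting template residues, then turns each position's scores into probabilities with a softmax. The contact list, the alignment mapping, the preference table, the function name `predict_position_matrix`, and the softmax step with its `beta` parameter are all hypothetical assumptions for this example, not the thesis's actual scoring scheme.

```python
import math

NUCLEOTIDES = "ACGT"


def predict_position_matrix(query_seq, alignment, contacts, preference,
                            dna_length, beta=1.0):
    """Hypothetical sketch: predict a per-position nucleotide probability matrix.

    query_seq:  query protein sequence (one-letter amino-acid codes)
    alignment:  dict mapping template residue index -> query residue index
    contacts:   list of (template residue index, DNA position) pairs taken
                from a template protein-DNA complex
    preference: dict mapping amino acid -> {nucleotide: contact score};
                a stand-in for a knowledgebase (higher means more favored)
    dna_length: number of DNA positions in the template binding site
    beta:       scaling factor applied before the softmax
    """
    # Accumulate a score for every nucleotide at every DNA position.
    scores = [{nt: 0.0 for nt in NUCLEOTIDES} for _ in range(dna_length)]
    for template_res, dna_pos in contacts:
        query_res = alignment.get(template_res)
        if query_res is None:
            continue  # template residue has no aligned query residue
        aa = query_seq[query_res]
        for nt in NUCLEOTIDES:
            scores[dna_pos][nt] += preference.get(aa, {}).get(nt, 0.0)

    # Softmax each column so its probabilities sum to one.
    matrix = []
    for column in scores:
        weights = {nt: math.exp(beta * s) for nt, s in column.items()}
        total = sum(weights.values())
        matrix.append({nt: w / total for nt, w in weights.items()})
    return matrix


if __name__ == "__main__":
    # Toy inputs, all hypothetical.
    query = "MRSNK"
    alignment = {10: 1, 12: 3}               # template residue -> query residue
    contacts = [(10, 0), (12, 1), (15, 2)]   # residue 15 is unaligned
    preference = {
        "R": {"A": 0.0, "C": 0.0, "G": 2.0, "T": 0.0},
        "N": {"A": 1.5, "C": 0.0, "G": 0.0, "T": 0.0},
    }
    result = predict_position_matrix(query, alignment, contacts, preference, 3)
    for pos, column in enumerate(result):
        print(pos, {nt: round(p, 3) for nt, p in column.items()})
```

In this toy run, position 0 favors G because the aligned arginine prefers G in the made-up table, position 1 favors A, and position 2 stays uniform because its only contacting template residue has no aligned query residue. The softmax is just one convenient way to guarantee that each column is a valid probability distribution.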

English Keywords

DNA-binding protein

