  • Thesis

Sequence-Based Construction of Protein-DNA Interaction Models for Small DNA-Binding Domains

Modeling protein-DNA interactions from sequences for small DNA-binding domains

Advisor: 陳倩瑜

Abstract


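Protein-DNA interactions occur in many fundamental biochemical processes, such as the regulation of gene expression and DNA repair. Co-crystal structures of proteins and DNA, that is, complex structures, show how the two interact, but obtaining protein crystals requires expensive and time-consuming experiments, so this knowledge is very limited. On the other hand, advances in gene and protein sequencing have produced a large amount of primary-structure information: among proteins known to bind DNA, sequence data outnumber complex structures by a factor of tens. This study therefore aims to construct protein-DNA interaction models with sequence and structure analysis tools, that is, to model the interaction from the sequences of the protein and the DNA. We divide the problem into two sub-topics: first, systematically predicting a protein's tertiary structure from its sequence; second, constructing the protein-DNA interaction model from the predicted protein structure. The results show that the de novo structure prediction software Rosetta can accurately predict protein tertiary structure. Specifically, after Rosetta generates a large number of structural models, near-native structures can be selected using the correlation coefficient between the sequence-based predicted RSA (relative solvent accessibility) and the RSA of each structural model, combined with a statistics-based energy function. In addition, when the only available templates have low sequence similarity to the query, template-based modeling may fail to produce a prediction. De novo structure prediction, by contrast, yields a prediction for every protein sequence, with accuracy no worse than that of template-based modeling, so it is the better choice when the template and the query protein share low sequence similarity. Finally, when a predicted protein structure is used to model the interaction with DNA through structure alignment or a docking algorithm, the more closely the protein model resembles the true structure, the more accurate the resulting model; interaction models built by structure alignment are more accurate than those built by docking. In summary, when no complex structure is available, the pipeline proposed in this study can construct a protein-DNA interaction model from sequence information, which should be of great help in predicting protein-DNA binding.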

Parallel Abstract


Protein-DNA interactions play an important role in many fundamental biochemical activities, such as gene regulation and DNA repair. Researchers can learn how proteins and DNA interact by examining available co-crystallized structures. However, such knowledge is scarce because experimentally determining atom-level structures of protein-DNA complexes is expensive and time-consuming. In contrast, owing to recent advances in whole-genome sequencing technology, far more sequences of known DNA-binding proteins are available than tertiary structures of protein-DNA complexes. Therefore, this study aims to construct protein-DNA interaction models by integrating a number of in silico analyses based on sequences and predicted structures, i.e., to create the interaction models from the sequences of the proteins and DNA. The problem is divided into two sub-topics, both concerning tertiary structures: predicting protein tertiary structure in a systematic way, and constructing predicted protein-DNA complexes. We use Rosetta to generate ten thousand decoys and select close-to-native protein structures from them. The protein-DNA complexes are then predicted either by the docking method HADDOCK or by the template-based method DBD2BS. Our results demonstrate that, for small DNA-binding domains, protein structure can be predicted by de novo structure prediction. Specifically, after Rosetta generates a large number of decoys, close-to-native structures can be selected by combining the correlation coefficient between the sequence-based predicted relative solvent accessibility (RSA) and each decoy's RSA with a knowledge-based energy score. In addition, structural models created by de novo structure prediction outperform those from template-based modeling when only distant templates are available.
De novo structure prediction produces a result for every query protein, whereas template-based methods can predict only proteins that have templates similar to the query. When both approaches deliver predictions, the modeling quality is similar. Furthermore, when close-to-native protein structures are available, protein-DNA interaction models constructed by structure alignment are more accurate than those predicted by docking tools. In summary, this study concludes that it is possible to construct an interaction model of a protein and DNA even in the absence of a co-crystallized structure.
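
The decoy-selection idea described in the abstract (correlating sequence-based predicted RSA with each decoy's RSA, then combining that with a knowledge-based energy score) can be illustrated with a minimal Python sketch. The abstract does not say how the two scores are combined, so the rank-sum rule below is an assumption made for illustration only. The function names, the tuple layout of a decoy, and the toy numbers are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (var_x * var_y) ** 0.5


def rank(values, reverse=False):
    """0-based rank of each value (0 = best); ties keep input order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def select_decoys(predicted_rsa, decoys, top_n=1):
    """Pick decoy names by a rank-sum of RSA correlation and energy.

    decoys: list of (name, per_residue_rsa, energy) tuples (hypothetical layout).
    ASSUMPTION: the rank-sum combination is illustrative, not the thesis's rule.
    """
    corrs = [pearson(predicted_rsa, rsa) for _, rsa, _ in decoys]
    energies = [energy for _, _, energy in decoys]
    corr_rank = rank(corrs, reverse=True)  # higher correlation is better
    energy_rank = rank(energies)           # lower energy is better
    combined = [c + e for c, e in zip(corr_rank, energy_rank)]
    best = sorted(range(len(decoys)), key=lambda i: combined[i])
    return [decoys[i][0] for i in best[:top_n]]


# Toy data: four residues, three decoys (values invented for illustration).
predicted = [0.1, 0.8, 0.5, 0.2]
toy_decoys = [
    ("decoy_a", [0.2, 0.7, 0.6, 0.1], -120.0),
    ("decoy_b", [0.9, 0.1, 0.3, 0.8], -150.0),
    ("decoy_c", [0.1, 0.9, 0.4, 0.3], -140.0),
]
print(select_decoys(predicted, toy_decoys))
```

In this toy case `decoy_c` wins because it has the best RSA agreement and the second-lowest energy, while `decoy_b` has the lowest energy but an RSA profile that is anti-correlated with the prediction. A real pipeline would obtain per-residue RSA from a structure analysis tool and the energy from a scoring function, neither of which is modeled here.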


Cited By


Chao, C. H. (2011). 建立以機器學習演算法為基礎之評分函數預測蛋白質與DNA結合之親和力 [Building a machine-learning-based scoring function to predict protein-DNA binding affinity] [master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2011.02382

Further Reading