Predicting Protein-DNA Binding Trajectories Using the Relative Spatial Geometry of Protein Side Chains and DNA Bases

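Many proteins carry out their functions by recognizing specific or non-specific DNA sequence segments. Because of this importance, computational methods are often used to identify amino acid residues that may bind DNA. Today, many co-crystallized protein-DNA complexes are available from the Protein Data Bank (PDB), which helps us understand how these proteins recognize specific nucleotide sequences. This information is of great help to biologists in predicting where particular proteins bind within DNA sequences, for example in locating transcription factor binding sites. In the future, these data will help us understand gene regulation and gene regulatory networks more accurately. Although the many protein-DNA complexes in current structural databases clearly show the geometric relationship between protein and DNA upon binding, directly predicting the binding mechanism of a protein and DNA without complex structure data is a very difficult task. This thesis aims to propose an algorithm that predicts the DNA binding position and orientation given a known protein structure. First, we use a sequence pattern mining tool (MAGIIC-PRO) to find conserved regions among sequences related to the given protein sequence, and thereby explore the functional regions of the protein. Once the functional regions are found, we select the surface residues and apply a clustering algorithm to this subset to pick out the residue clusters most likely to bind DNA. Principal component analysis (PCA) is then applied to the coordinates of these atoms to predict the direction of the DNA grooves. Experimental results show that the proposed method successfully predicts the DNA groove direction near the selected functional regions. Moreover, a scoring function with a radial basis function kernel successfully predicts the positions in space where bases are most likely to occur. We believe the output of the proposed method will effectively support further studies of protein-DNA interactions, such as protein-DNA docking simulation and transcription factor binding site prediction.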
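As a rough illustration of the PCA step described above, the sketch below computes the principal axes of a set of atom coordinates with NumPy. It is not the thesis implementation: the function names `principal_axes` and `angle_between_axes`, the synthetic coordinates, and the choice to read the first principal axis as the groove-direction estimate are all assumptions made for the example.

```python
import numpy as np


def principal_axes(coords):
    """Return (centroid, variances, axes) for an (N, 3) array of coordinates.

    axes[:, 0] is the direction of largest variance. Hypothetical helper,
    not taken from the thesis.
    """
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    centered = coords - centroid
    # Sample covariance matrix of the centered coordinates (3 x 3).
    cov = centered.T @ centered / (len(coords) - 1)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return centroid, eigvals[order], eigvecs[:, order]


def angle_between_axes(u, v):
    """Angle in degrees between two undirected axes, in the range 0 to 90."""
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos, 0.0, 1.0)))


# Synthetic "atoms" scattered along a known axis (assumed toy data).
rng = np.random.default_rng(0)
true_axis = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
t = rng.uniform(-10.0, 10.0, size=200)
atoms = np.outer(t, true_axis) + rng.normal(scale=0.5, size=(200, 3))

centroid, variances, axes = principal_axes(atoms)
# Assumption: the first principal axis serves as the direction estimate.
error_deg = angle_between_axes(axes[:, 0], true_axis)
print(f"angular error: {error_deg:.2f} degrees")
```

The angular error is measured between undirected axes because a principal axis has no preferred sign, so an estimate and its negation describe the same direction.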

Keywords

DNA binding sites; binding orientation; structure-based prediction; protein-DNA interactions

Parallel Abstract

DNA-binding proteins carry out their functions through specific or non-specific protein-DNA recognition. Identifying DNA-binding residues with computational tools facilitates predicting or validating protein functions at a high-throughput rate. The protein-DNA complexes available in the Protein Data Bank (PDB) further reveal how a DNA-binding protein recognizes its partners. Such information greatly helps biologists determine or predict binding elements in DNA sequences, such as transcription factor binding sites (TFBSs). In this way, accurate genome-scale regulatory networks can be constructed more efficiently in the near future. Because understanding the mechanism of protein-DNA interactions without crystal structures of the complexes remains a challenging task, this thesis proposes an algorithm to predict the binding position and direction of DNA given a known protein structure. First, potential DNA-binding regions of a query protein are predicted by a sequential pattern mining program, MAGIIC-PRO, which identifies functional regions of a protein by discovering concurrent conserved regions among its related protein sequences. After the functional regions are predicted, we extract the residues on the protein surface and use a hierarchical clustering algorithm to derive potential DNA-binding units, that is, compact conserved regions with high DNA-binding propensity. Principal component analysis (PCA) is then applied to the collected atoms to predict the orientation of the DNA grooves. To derive the positions where DNA bases are likely to be present, we propose a knowledge-based learning procedure that constructs a predictive model of the geometric propensity between protein side chains and DNA bases. The experiments conducted in the thesis show that the orientation of the DNA grooves around the selected conserved regions can be predicted with satisfactory error. Furthermore, with a scoring function that uses a radial basis function (RBF) as its kernel, we build spatial distributions of the positions where DNA bases are likely to be present. The computational outputs are expected to provide useful information for subsequent analyses such as protein-DNA docking and TFBS prediction.
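The abstract does not give the form of the RBF-based scoring function, so the following is only a minimal sketch of one common choice: a weighted sum of Gaussian radial basis functions centered on reference positions. The function name `rbf_score`, the bandwidth `sigma`, the uniform weights, and the toy reference positions are all assumptions for illustration and are not taken from the thesis.

```python
import numpy as np


def rbf_score(query_points, centers, weights=None, sigma=1.5):
    """Score each query point as a weighted sum of Gaussian RBFs.

    score(x) = sum_i w_i * exp(-||x - c_i||^2 / (2 * sigma^2))

    Hypothetical helper: `centers` stands in for reference base positions,
    and `sigma` is an assumed bandwidth in the same units as the coordinates.
    """
    query_points = np.atleast_2d(np.asarray(query_points, dtype=float))
    centers = np.atleast_2d(np.asarray(centers, dtype=float))
    if weights is None:
        weights = np.ones(len(centers))
    # Pairwise squared distances, shape (n_queries, n_centers).
    diff = query_points[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))
    return kernel @ np.asarray(weights, dtype=float)


# Toy reference positions (assumed data, not from any real complex).
centers = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0]])

# Evaluate the score on a coarse planar grid and pick the best grid point.
axis = np.linspace(-2.0, 3.0, 11)
grid = np.array([[x, y, 0.0] for x in axis for y in axis])
scores = rbf_score(grid, centers)
best_point = grid[np.argmax(scores)]
print("highest-scoring grid point:", best_point)
```

Evaluating such a score over a grid around the protein surface yields a spatial distribution of the kind the abstract describes. In practice the weights and bandwidth would have to be learned or tuned from known complexes.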

Parallel Keywords

DNA-binding sites; binding orientation; structure-based prediction; protein-DNA interactions

