  • Thesis


Prediction of DNA Binding Transcription Factor segments under Specified structure

Advisor: 歐陽彥正


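This thesis examines how to accurately predict whether a secondary-structure segment of a protein binds DNA. Segment features are built from features already known to be meaningful in protein-DNA binding, and the thesis shows how the position weight matrices of secondary-structure segments of different lengths can be converted into a length-independent feature table. Two datasets were collected, each with a different biological significance. We discuss which factors cause the performance gap between the two datasets and propose a two-stage method for the dataset with the weaker performance, with the aim of narrowing that gap as far as possible. We also show which type of secondary structure is easiest to classify as DNA-binding or not, and discuss why. On the two datasets, our method achieves, for helix-type secondary structures, 75% coverage, 80% precision, and 92% specificity, and 65% coverage, 85% precision, and 98% specificity, respectively.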
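The abstract does not say how the position weight matrices are made length-independent. As an illustration only, the sketch below uses one simple and common choice, averaging each matrix column over the positions of the segment, so that a segment of any length maps to a vector with one value per amino-acid column. The function name `pool_pssm` and the averaging scheme are assumptions, not the thesis's method.

```python
from typing import List


def pool_pssm(pssm: List[List[float]]) -> List[float]:
    """Average a (segment_length x n_columns) matrix over its rows.

    Hypothetical helper: column-wise averaging is an assumed pooling
    scheme, chosen only to show how variable-length segments can be
    mapped to fixed-length feature vectors.
    """
    if not pssm:
        raise ValueError("segment must contain at least one position")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all rows must have the same number of columns")
    n_rows = len(pssm)
    # One averaged score per column, independent of segment length.
    return [sum(row[j] for row in pssm) / n_rows for j in range(n_cols)]


# Two toy segments of different lengths (made-up scores, 20 columns each).
short_segment = [[1.0] * 20 for _ in range(5)]
long_segment = [[2.0] * 20 for _ in range(12)]

# Both feature vectors have 20 entries regardless of segment length.
features = [pool_pssm(short_segment), pool_pssm(long_segment)]
```

Averaging discards positional order within the segment; schemes that keep some of it (for example, pooling over a fixed number of bins along the segment) follow the same pattern.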


This thesis describes the design of a predictor that identifies the secondary structures in a transcription factor (TF) that are involved in interaction with DNA. The predictor is optimized in particular for alpha-helix structures, because of their prevalence. The predictor is built on a support vector machine (SVM), and the study focuses on the features used for prediction. Two datasets were used in the experiments: the first was derived from TF-DNA complexes deposited in the Protein Data Bank (PDB), and the second from TF sequences deposited in SWISS-PROT. For identifying alpha-helix structures involved in interaction with DNA, the proposed predictor achieved a sensitivity of 75%, a precision of 80%, and a specificity of 92% on the first dataset, and a sensitivity of 65%, a precision of 85%, and a specificity of 98% on the second.
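For reference, the three reported measures follow the standard definitions based on confusion-matrix counts, where a positive is a segment that binds DNA. The sketch below is a minimal illustration; the function name `binding_metrics` and the counts in the example are made up and are not taken from the thesis.

```python
from typing import Dict


def binding_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, precision, and specificity from raw counts.

    tp/fn: binding segments predicted as binding / non-binding.
    fp/tn: non-binding segments predicted as binding / non-binding.
    """
    if tp + fn == 0 or tp + fp == 0 or tn + fp == 0:
        raise ValueError("each metric needs a non-zero denominator")
    return {
        # Fraction of true binding segments that were recovered
        # (also called recall or coverage).
        "sensitivity": tp / (tp + fn),
        # Fraction of predicted binding segments that truly bind.
        "precision": tp / (tp + fp),
        # Fraction of non-binding segments correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Illustrative counts only (not results from the thesis).
example = binding_metrics(tp=30, fp=10, tn=50, fn=10)
```

Because non-binding segments usually outnumber binding ones, specificity can stay high even when precision is moderate, which is why the three measures are reported together.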


Keywords: SVM; secondary structure; DNA; prediction; binding segments


