Title

預測轉錄因子上與去氧核醣酸結之區段

Translated Titles

Prediction of DNA-Binding Transcription Factor Segments under Specified Structure

DOI

10.6342/NTU.2008.02436

Authors

徐振傑

Key Words

support vector machine (SVM) ; secondary structure ; DNA ; prediction ; binding segments

PublicationName

Degree Theses of the Graduate Institute of Computer Science and Information Engineering, National Taiwan University

Volume or Term/Year and Month of Publication

2008

Academic Degree Category

Master's

Advisor

歐陽彥正

Content Language

Traditional Chinese

Chinese Abstract

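This thesis examines how to accurately predict whether a secondary structure segment of a protein binds to DNA. The features of each secondary structure segment are built from features already known to be meaningful for protein-DNA binding, and the thesis shows how to convert the position weight matrices of secondary structure segments of different lengths into a length-independent feature table. Two datasets were collected, each with a different biological meaning. We discuss which factors cause the performance gap between the two datasets and propose a two-stage method for the dataset with the weaker performance, with the aim of narrowing that gap as much as possible. We also show which type of secondary structure is easiest to classify as DNA-binding or non-binding, and discuss why. For helix-type secondary structures, our method achieves a sensitivity (coverage) of 75%, precision of 80%, and specificity of 92% on one dataset, and a sensitivity of 65%, precision of 85%, and specificity of 98% on the other.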
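
The abstract mentions converting the position weight matrices of segments of different lengths into a length-independent feature table, but does not say here how the conversion is done. The following is only a minimal sketch of one common way to do this, assuming column-wise pooling (mean and maximum per matrix column); the function name `pssm_to_fixed_features` and the toy three-column matrices are hypothetical and are not taken from the thesis. A real matrix would typically have 20 columns, one per amino acid.

```python
from typing import List, Sequence


def pssm_to_fixed_features(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Pool a (segment_length x n_columns) matrix into 2 * n_columns features.

    Assumption: column-wise mean and maximum are used as the pooling
    functions. This is an illustrative choice, not the thesis's method.
    """
    if not pssm:
        raise ValueError("segment must contain at least one residue")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all rows must have the same number of columns")

    # Transpose so that each tuple holds one column across all positions.
    columns = list(zip(*pssm))
    means = [sum(col) / len(col) for col in columns]
    maxima = [max(col) for col in columns]
    return means + maxima


# Two toy segments of different lengths (2 and 4 residues, 3 columns each).
short_segment = [[1.0, -2.0, 0.0],
                 [3.0, 0.0, -1.0]]
long_segment = [[0.0, 1.0, 2.0],
                [2.0, 1.0, 0.0],
                [4.0, 1.0, -2.0],
                [2.0, 1.0, 4.0]]

short_features = pssm_to_fixed_features(short_segment)
long_features = pssm_to_fixed_features(long_segment)
```

Both segments map to vectors of the same length, so they could be fed to a classifier that expects fixed-size input, which is the property the abstract describes.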

English Abstract

This thesis describes the design of a predictor that identifies the secondary structures in a transcription factor (TF) that are involved in interaction with DNA. The predictor has been optimized in particular for alpha-helix structures, because of their prevalence. The predictor is built on a support vector machine (SVM), and the study focuses on the features used for prediction. Two datasets were used in the experiments: the first was derived from the TF-DNA complexes deposited in the Protein Data Bank (PDB), and the second from the TF sequences deposited in SWISS-PROT. For identifying the alpha-helix structures involved in interaction with DNA, the proposed predictor achieved a sensitivity of 75%, precision of 80%, and specificity of 92% on the first dataset, and a sensitivity of 65%, precision of 85%, and specificity of 98% on the second.
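
For readers unfamiliar with the three reported measures, the sketch below shows how sensitivity, precision, and specificity follow from the counts of a binary confusion matrix. The counts are hypothetical, chosen only so that the ratios match the percentages quoted for the first dataset; they are not data from the thesis, and the helper name `binary_metrics` is likewise an assumption.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute sensitivity, precision, and specificity from raw counts.

    tp/fn: binding segments predicted as binding / non-binding.
    fp/tn: non-binding segments predicted as binding / non-binding.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true binders recovered
        "precision": tp / (tp + fp),    # fraction of predicted binders that are real
        "specificity": tn / (tn + fp),  # fraction of non-binders correctly rejected
    }


# Hypothetical counts (NOT from the thesis), picked to give 75% / 80% / 92%.
metrics = binary_metrics(tp=120, fp=30, tn=345, fn=40)
```

Because precision depends on the ratio of binding to non-binding segments while specificity does not, a predictor can show high specificity and only moderate sensitivity at the same time, as in the figures reported for the second dataset.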

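Topic Category Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Graduate Institute of Computer Science and Information Engineering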
Reference
  1. Yan, C., et al., Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinformatics, 2006. 7: p. 262.
  2. Ferrer-Costa, C., et al., HTHquery: a method for detecting DNA-binding proteins with a helix-turn-helix structural motif. Bioinformatics, 2005. 21(18): p. 3679-80.
  3. Ofran, Y., V. Mysore, and B. Rost, Prediction of DNA-binding residues from sequence. Bioinformatics, 2007. 23(13): p. i347-53.
  4. Ahmad, S. and A. Sarai, PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 2005. 6: p. 33.
  5. Jones, S., et al., Using electrostatic potentials to predict DNA-binding sites on DNA-binding proteins. Nucleic Acids Res, 2003. 31(24): p. 7189-98.
  6. Finn, R.D., et al., Pfam: clans, web tools and services. Nucleic Acids Res, 2006. 34(Database issue): p. D247-51.
  7. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 1997. 25(17): p. 3389-402.
  8. Jones, D.T., Protein secondary structure prediction based on position-specific scoring matrices. J Mol Biol, 1999. 292(2).
  9. Chang, C. and C. Lin, LIBSVM: a library for support vector machines. 2001.
  10. Thompson, J.D., D.G. Higgins, and T.J. Gibson, CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res, 1994. 22: p. 4673-4680.
  11. Berman, H.M., et al., The Protein Data Bank. Nucleic Acids Res, 2000. 28(1): p. 235-42.
  12. Wu, C.H., et al., The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res, 2006. 34(Database issue): p. D187-91.
  13. Luscombe, N.M., et al., An overview of the structures of protein-DNA complexes. Genome Biol, 2000. 1(1): p. REVIEWS001.
  14. Liu, J., et al., Intrinsic disorder in transcription factors. Biochemistry, 2006. 45(22): p. 6873-88.
  15. Bairoch, A., Serendipity in bioinformatics, the tribulations of a Swiss bioinformatician through exciting times! Bioinformatics, 2000. 16: p. 48-64.