
Protein Secondary Structure Prediction Based on Linear Transformation and Normalization Methods

Advisor: Chuan Yi Tang (唐傳義)

Abstract


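Protein secondary structure prediction has been studied extensively for nearly fifty years, and a variety of methods have been proposed. Machine learning is one feasible approach, with accuracy reaching 70%. Among the tools built on machine learning, PSIPRED, PHD, and PROF are the best known; they classify secondary structure into three states: helix, strand, and coil.

Many machine learning methods have been proposed since, but their accuracy is mostly close to or lower than that of PSIPRED, and designing a new machine learning method takes considerable effort. Given this, refining existing methods or combining the results of different methods is a good alternative.

RAP is a post-processing method that uses linear transformation and normalization to refine existing results, so it can be applied to a variety of protein secondary structure prediction tools. We selected 181 protein sequences from CASP and 31402 protein sequences from PDB, the latter separated into 69534 protein chains, and used these two data sets to measure the performance of RAP. In our experiments, PHD, PROF, and PSIPRED provide the helix, strand, and coil scores, and RAP uses these scores to refine each of the three methods individually. Compared with the results of these three methods, RAP detects more secondary structure segments. In addition, combining the results of these three methods with the RAP results yields higher accuracy; conversely, combining them with other methods lowers accuracy.

RAP is currently available at http://ensembl.cs.nthu.edu.tw/RAP/.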

Parallel Abstract


Protein secondary structure prediction has been studied extensively for almost 50 years, and machine learning is one of the feasible approaches, with more than 70% accuracy. PSIPRED, PHD, and PROF are well-known machine learning tools based on three-state prediction: helix, strand, and coil. Various other machine learning prediction tools have since been proposed. However, such tools take considerable effort to develop, and their accuracy is close to or even lower than that of PSIPRED. Given this concern, refining or combining the outputs of existing methods is an alternative way to make improvements. RAP is a post-processing method that uses linear transformation and normalization to refine the scores of a three-state prediction. Hence, RAP can be applied to any protein secondary structure prediction tool that uses three-state prediction. RAP was tested on a CASP data set with 181 targets and on a large-scale data set with 69534 chains separated from 31402 proteins in PDB. In the experiments, PHD, PROF, and PSIPRED were used to produce three-state prediction scores for each target protein; RAP then predicted secondary structures by refining those scores. RAP detected more secondary structure segments than PHD, PROF, and PSIPRED did. Moreover, combining the methods' predictions with RAP achieved higher accuracy than combining them without RAP. RAP is freely available at http://ensembl.cs.nthu.edu.tw/RAP/.
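
The abstract does not give the form of the linear transformation or the normalization step, so the following Python sketch only illustrates the general idea of post-processing three-state scores. The function names, the weight matrix, the bias vector, and the clip-then-rescale normalization are all assumptions made for illustration and are not taken from RAP itself.

```python
from typing import List, Sequence

STATES = "HEC"  # helix, strand, coil (three-state alphabet)

# Hypothetical parameters: RAP's actual coefficients are not given in the abstract.
EXAMPLE_WEIGHTS = [
    [2.0, 0.0, 0.0],  # boost the helix score
    [0.0, 1.0, 0.0],  # leave the strand score unchanged
    [0.0, 0.0, 1.0],  # leave the coil score unchanged
]
EXAMPLE_BIAS = [0.0, 0.0, 0.0]


def refine_scores(
    scores: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Apply y = W x + b to one residue's scores, then rescale to sum to 1."""
    transformed = [
        sum(w * s for w, s in zip(row, scores)) + b
        for row, b in zip(weights, bias)
    ]
    # Assumed normalization: clip negatives to zero, then divide by the total.
    clipped = [max(t, 0.0) for t in transformed]
    total = sum(clipped)
    if total == 0.0:
        # No state has positive support; fall back to a uniform distribution.
        return [1.0 / len(clipped)] * len(clipped)
    return [c / total for c in clipped]


def predict_states(
    per_residue_scores: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> str:
    """Label each residue with the state that has the highest refined score."""
    labels = []
    for scores in per_residue_scores:
        refined = refine_scores(scores, weights, bias)
        labels.append(STATES[refined.index(max(refined))])
    return "".join(labels)


if __name__ == "__main__":
    # Made-up (helix, strand, coil) scores for three residues.
    raw = [(0.4, 0.1, 0.5), (0.1, 0.7, 0.2), (0.2, 0.1, 0.7)]
    print(predict_states(raw, EXAMPLE_WEIGHTS, EXAMPLE_BIAS))
```

Because a step of this kind only consumes per-residue helix, strand, and coil scores, it could in principle sit behind any three-state predictor, which is the property the abstract emphasizes.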

