Thesis

Prediction of enzyme catalytic sites by sequential pattern mining

Advisor: 歐陽彥正 (Yen-Jen Oyang)

Abstract


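Large-scale, non-manual annotation of protein functions or sequence signatures remains a major challenge in the post-genomic era. In this thesis, we design a method that uses protein sequence signatures to predict the catalytic sites of enzyme sequences.

Our method generates protein sequence signatures by motif mining. Each signature contains several blocks of important residues, also called conserved segments. These conserved segments frequently co-occur in homologous sequences and have been carefully preserved during evolution, which indicates that they are functionally important.

Biological experiments show that the catalytic residues of an enzyme are usually scattered across different regions of the protein sequence. To predict catalytic residues comprehensively, the generated signatures must therefore also span different regions of the sequence.

We collected catalytic-residue information from the Catalytic Site Atlas (CSA) database to evaluate the proposed method. The results show that our method identifies catalytic sites and catalytic residues better than the patterns in the PROSITE database.

The method has been implemented as the E1DS web server (http://e1ds.csbb.ntu.edu.tw/). E1DS currently holds 5421 sequence signatures, which together cover 932 four-digit EC numbers. On average, for catalytic site prediction, E1DS achieves a correct rate of 35.5% and a success rate of 49.6%. PROSITE achieves 18.9% and 33.7%, respectively, so E1DS outperforms PROSITE on both measures. For catalytic residue prediction, E1DS has a sensitivity of 30.0%, higher than that of PROSITE (16.2%). However, its specificity (96.7%) is lower than that of PROSITE (98.6%).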
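The signatures described above consist of a few short conserved blocks separated by long, weakly constrained stretches. As a rough illustration only, the Python sketch below represents such a signature as a tuple of blocks plus (min, max) gap lengths. It then matches the signature against a sequence with a regular expression.

Several parts of this sketch are hypothetical: the `Signature` and `Block` classes, the regex encoding, and the `key_offsets` field (which marks putative catalytic positions inside a block). They are not the actual data format or matching procedure of E1DS or WildSpan.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A conserved block in regex syntax ('.' = any residue).

    Patterns are assumed fixed-length and free of capture groups.
    key_offsets (hypothetical) marks positions inside the block that are
    treated as putative catalytic residues.
    """
    pattern: str
    key_offsets: tuple[int, ...] = ()


@dataclass(frozen=True)
class Signature:
    """Hypothetical 1-D signature: blocks separated by (min, max) gaps."""
    blocks: tuple[Block, ...]
    gaps: tuple[tuple[int, int], ...]

    def __post_init__(self) -> None:
        if len(self.gaps) != len(self.blocks) - 1:
            raise ValueError("need exactly one gap between consecutive blocks")

    def to_regex(self) -> re.Pattern[str]:
        # Each block becomes a capture group; each gap becomes .{min,max}.
        parts = [f"({self.blocks[0].pattern})"]
        for (lo, hi), block in zip(self.gaps, self.blocks[1:]):
            parts.append(f".{{{lo},{hi}}}({block.pattern})")
        return re.compile("".join(parts))


def match_blocks(sig: Signature, seq: str) -> list[int] | None:
    """0-based start of each block in the leftmost match, or None."""
    m = sig.to_regex().search(seq)
    if m is None:
        return None
    return [m.start(i + 1) for i in range(len(sig.blocks))]


def predicted_residues(sig: Signature, seq: str) -> list[int]:
    """Sorted positions flagged by key_offsets in the matched blocks."""
    starts = match_blocks(sig, seq)
    if starts is None:
        return []
    return sorted(
        start + off
        for start, block in zip(starts, sig.blocks)
        for off in block.key_offsets
    )
```

As an example, take a two-block signature `H.D` then `G.S` with a gap of 3 to 10 residues, and match it against `AAHSDKKKKKGASGEEEE`. The blocks land at positions 2 and 10, and the predicted residues are 2, 4 and 12. A regex is used here only because it makes the block-gap-block structure easy to read. `search` reports just the leftmost match per sequence, not every possible alignment.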

English Abstract


Large-scale automatic annotation of protein sequences remains challenging in the post-genomic era. This thesis aims to predict the catalytic sites of enzyme sequences based on a repository of protein signatures. The signatures employed are derived with a motif-based method. The blocks of a signature, also called conserved regions, are composed of key residues shared among homologues. These blocks are conserved during evolution because of their importance to protein function.

Biological experiments reveal that an enzyme's catalytic site is usually made up of residues that lie far apart in the sequence. To predict catalytic sites comprehensively, the signatures employed must therefore contain residues that are widely scattered along the sequence. To this end, we employ WildSpan, a recently developed pattern mining algorithm, to generate enzyme sequence signatures. WildSpan is designed to discover sequence motifs that span a large number of unimportant positions.

To measure the performance of our method, we collected the annotated catalytic sites of 831 enzymes from the Catalytic Site Atlas (CSA). The results show that our method identifies catalytic sites and catalytic residues more effectively than patterns derived from the PROSITE database.

The proposed method has been implemented as a web server named E1DS (http://e1ds.csbb.ntu.edu.tw/). E1DS currently contains 5421 sequence signatures that together cover 932 four-digit EC numbers. On average, for catalytic site prediction, E1DS achieves a ‘correct’ rate of 35.5% and a ‘success’ rate of 49.6%, whereas PROSITE patterns achieve 18.9% and 33.7%, respectively. For catalytic residue prediction, the sensitivity of E1DS is 30.0%, better than that of PROSITE (16.2%). However, its specificity (96.7%) is slightly lower than that of PROSITE (98.6%).
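The abstract reports residue-level sensitivity and specificity without spelling out how they are computed. The Python sketch below uses the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It assumes that every position not annotated as catalytic counts as a negative. Both that assumption and the helper name `residue_metrics` are illustrative, not taken from the thesis.

```python
from __future__ import annotations

from collections.abc import Iterable


def residue_metrics(
    predicted: Iterable[int], annotated: Iterable[int], seq_len: int
) -> tuple[float, float]:
    """Residue-level (sensitivity, specificity) for one sequence.

    Every position in range(seq_len) that is not annotated as catalytic is
    treated as a negative (an assumption; the abstract does not say how
    negatives were defined).
    """
    pred, true = set(predicted), set(annotated)
    if any(not 0 <= p < seq_len for p in pred | true):
        raise ValueError("residue position outside the sequence")
    tp = len(pred & true)  # catalytic residues correctly predicted
    fp = len(pred - true)  # predicted but not annotated
    fn = len(true - pred)  # annotated but missed
    tn = seq_len - tp - fp - fn  # every remaining position
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

Consider a 100-residue sequence with three annotated catalytic residues and two predictions, one of them correct. Sensitivity is then 1/3 and specificity is 96/97 (about 99%). Catalytic residues are a small fraction of any sequence, so true negatives dominate the specificity denominator. That is consistent with both E1DS and PROSITE scoring above 96% specificity while differing far more in sensitivity. The abstract does not say whether its averages are taken per enzyme or pooled over all residues; this sketch computes per-sequence values only.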


Cited by


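林培茵 (2011). Cloning and expression of a cellulase gene from the halophilic bacterium Bacillus licheniformis NTU-01 [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2011.00121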
