偵測連結雜訊以改善蛋白質序列分群品質

Improving quality of protein sequence clustering by noisy relationship detection

Advisor: 陳倩瑜

Abstract


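Protein sequence clustering algorithms play an important role in the analysis and prediction of protein function. In protein sequence clustering, algorithms that exploit the transitivity of homology perform well at finding remotely related sequences, but they also introduce a problem: some multi-domain proteins act as noise for these algorithms, so the clustering results fall short of expectations. If such noisy multi-domain proteins can be detected and the noise they introduce during clustering reduced, the accuracy of clustering algorithms should improve. This thesis compares previously proposed methods for detecting multi-domain proteins and evaluates whether they help reduce clustering noise. It further proposes a mechanism that feeds clustering results back to detect noise; preliminary results show that the information found helps remove noisy relationships from the data. We also find that the previously proposed biconnected-node method provides useful information for predicting multi-functional and multi-domain proteins, but it shows no significant correlation with improved clustering quality, which merits further study.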

Parallel Abstract


Protein sequence clustering plays an important role in analyzing and predicting the functions and structures of proteins. By exploiting the transitivity of homology, state-of-the-art protein sequence clustering algorithms can detect remote homologues, but the same property turns some multi-domain proteins into noise and degrades the quality of the clustering results. It is therefore believed that detecting multi-domain proteins and blocking their transitivity during clustering will improve overall performance. This thesis studies two previously published methods for detecting multi-domain proteins and tests whether the detected proteins actually help reduce clustering noise. We further propose a mechanism for detecting noisy relationships based on cluster hierarchies. The experimental results show that the information found by our approach helps improve the quality of protein hierarchies. The proteins identified by the previously published methods correlate more strongly with multi-domain or multi-functional proteins than those identified by our approach; nevertheless, this thesis concludes that detecting multi-domain proteins does not appreciably improve clustering accuracy.
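The noise problem described above can be illustrated with a small sketch. It is not the thesis's implementation: the protein names, the edge list, and all function names are hypothetical, clusters are approximated as connected components of a similarity graph (the simplest form of transitive clustering), and a brute-force cut-node test stands in for a proper biconnected-component algorithm. The first abstract mentions a biconnected-node method; the sketch only shows why such nodes are natural candidates for multi-domain proteins.

```python
from collections import defaultdict

# Hypothetical similarity edges: proteins a1..a3 share one domain,
# b1..b3 share another, and "m" is a two-domain protein similar to both.
EDGES = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
    ("m", "a1"), ("m", "a2"), ("m", "b1"), ("m", "b2"),
]


def build_graph(edges):
    """Build an undirected adjacency map from pairwise similarity hits."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph, skip=frozenset()):
    """Transitive clustering: every chain of hits ends up in one cluster.

    Nodes listed in `skip` are treated as removed from the graph.
    """
    seen, clusters = set(skip), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            members.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(members)
    return clusters


def articulation_points(graph):
    """Brute-force cut nodes: removing one increases the cluster count.

    Fine for a toy graph; large graphs would need a linear-time algorithm.
    """
    base = len(connected_components(graph))
    return {
        node for node in graph
        if len(connected_components(graph, skip={node})) > base
    }


graph = build_graph(EDGES)
print("clusters with m:", connected_components(graph))
print("cut nodes:", articulation_points(graph))
print("clusters without m:", connected_components(graph, skip={"m"}))
```

In this toy graph the two families merge into a single cluster only because of the two-domain protein, and blocking that one node separates them again. As the abstract notes, detecting such proteins did not translate into clearly better clustering accuracy on real data, so the sketch shows the motivation rather than a recommended fix.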

