Analysis of Protein Secondary Structure and Residue Environment Classes

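This study investigates how physicochemical environment parameters of protein residues, such as secondary structure and solvent-accessible surface area, can be used to predict the fold class of a protein's three-dimensional structure. We compute environment score matrices for proteins in the SCOP database and build the corresponding structural profiles. Secondary structure data are selected from the DSSP database and grouped by SCOP class to form the input sets. A program developed for this work aligns protein sequences against protein structures described by their residue environment parameters. Residues are sampled by solvent-accessible surface area (buried (B), partially buried (P), and exposed (E)) and by secondary structure (α-helix, β-sheet, and coil), giving nine environment classes: each of B, P, and E combined with each of the three secondary structure states. When computing the structural profile for each SCOP class, we apply three simplifications: (1) only monomeric proteins are used, (2) proteins with disulfide bonds are excluded, and (3) only sequences with less than 25% similarity to one another are kept. The 25% cutoff filters out highly similar sequences so that their contributions to the score matrix are not counted repeatedly. The profiles are then used to predict the fold class of a protein sequence; with this method the prediction accuracy exceeds 95% (when the average score is < 0.5). Finally, we examine whether the method can be applied to three-dimensional structure prediction for protein sequences with less than 25% similarity.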
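The nine environment classes and the score matrix described above can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the thesis: the relative-accessibility cutoffs (`BURIED_MAX`, `PARTIAL_MAX`), the reduction of DSSP codes to three states (`DSSP_TO_STATE`), the class labels such as `"Ba"`, the pseudocount, and the log-odds form ln(P(residue | environment) / P(residue)), which is a common choice for this kind of scoring but may differ from what the thesis actually used.

```python
import math
from collections import Counter

# Hypothetical cutoffs on relative solvent accessibility (not from the thesis).
BURIED_MAX = 0.10
PARTIAL_MAX = 0.40

# Assumed reduction of DSSP codes to three states: helix (a), strand (b), coil (c).
DSSP_TO_STATE = {"H": "a", "G": "a", "I": "a", "E": "b", "B": "b"}


def environment_class(rel_accessibility, dssp_code):
    """Return one of nine labels, e.g. 'Ba' = buried residue in a helix."""
    if rel_accessibility < BURIED_MAX:
        burial = "B"
    elif rel_accessibility < PARTIAL_MAX:
        burial = "P"
    else:
        burial = "E"
    # Any DSSP code not listed above is treated as coil.
    return burial + DSSP_TO_STATE.get(dssp_code, "c")


def score_matrix(observations, pseudocount=1.0):
    """Build log-odds scores from (amino_acid, environment_class) pairs.

    Returns a dict mapping (amino_acid, environment_class) to
    ln(P(amino_acid | environment) / P(amino_acid)).
    """
    obs = list(observations)
    pair_counts = Counter(obs)
    aa_counts = Counter(aa for aa, _ in obs)
    env_counts = Counter(env for _, env in obs)
    total = len(obs)
    n_aa = len(aa_counts)

    scores = {}
    for aa, aa_count in aa_counts.items():
        p_aa = aa_count / total
        for env, env_count in env_counts.items():
            # The pseudocount keeps unseen pairs from producing log(0).
            p_aa_given_env = (pair_counts[(aa, env)] + pseudocount) / (
                env_count + pseudocount * n_aa
            )
            scores[(aa, env)] = math.log(p_aa_given_env / p_aa)
    return scores
```

A positive score means the residue type is seen in that environment more often than its overall frequency would suggest; a negative score means it is seen less often.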

Keywords

protein structure; environment score matrix; structural profile; sequence-structure alignment; protein fold class prediction

Parallel Abstract

In this thesis, I investigate how physicochemical environment information about amino acid residues, such as protein secondary structure and residue solvent accessibility, can improve the prediction of protein classes. Score matrices were computed for several classes of known protein sequences (all-α, all-β, α/β and α+β, according to the SCOP classification). Sequences are taken from a protein secondary structure database, for example DSSP. From these, one can construct a 3D structure profile for each entry in the PDB database. The profiles are used to score a query protein sequence for compatibility with the known classes. To demonstrate that the 3D structure profile method can detect sequences compatible with a known class, the query sequence is aligned with the residue environments of a known protein structure using a simple sequence alignment algorithm. The study indicates that the method assigns protein classes with greater than 95% accuracy (average score < 0.5). Furthermore, the structure profile approach is able to detect distant sequences well below the twilight zone (less than 25% sequence similarity).
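The abstract does not specify the alignment algorithm, so the following Python sketch only illustrates the general idea of scoring a query sequence against the residue environments of a known structure. It assumes an ungapped sliding alignment, a hypothetical function name `best_ungapped_score`, and a score table keyed by (amino acid, environment class) pairs; none of these are confirmed by the thesis.

```python
def best_ungapped_score(query, environments, scores, unknown=0.0):
    """Slide the query along a structure's environment classes without gaps.

    query        -- amino acid sequence, e.g. "LLK"
    environments -- environment class per structure position, e.g. ["Ec", "Ba"]
    scores       -- dict mapping (amino_acid, environment_class) to a score
    unknown      -- score used for pairs missing from the table (assumption)

    Returns (best average score per residue, offset where it occurs).
    """
    n, m = len(query), len(environments)
    if n == 0 or n > m:
        raise ValueError("query must be non-empty and no longer than the structure")

    best_avg, best_offset = float("-inf"), -1
    for offset in range(m - n + 1):
        total = sum(
            scores.get((aa, environments[offset + i]), unknown)
            for i, aa in enumerate(query)
        )
        avg = total / n
        if avg > best_avg:
            best_avg, best_offset = avg, offset
    return best_avg, best_offset


# Toy score table, for illustration only.
toy_scores = {
    ("L", "Ba"): 1.0,
    ("L", "Ec"): -1.0,
    ("K", "Ba"): -1.0,
    ("K", "Ec"): 1.0,
}
result = best_ungapped_score("LLK", ["Ec", "Ba", "Ba", "Ec"], toy_scores)
```

Running the same query against the profiles of each class and comparing the resulting scores is one way such a scheme could be turned into a class assignment; a gapped dynamic-programming alignment would be the natural next step if insertions and deletions need to be modeled.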

Parallel Keywords

protein structures; environment score matrix; structure profiles; sequence and structure alignment; protein class prediction

