Amino acid Fragment encoding for structural preserving clustering

(Original title: 蛋白質區域保留結構片段之分群編碼研究)

Advisor: 陳中明

Abstract


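At the beginning of the 21st century, the sequencing of the human genome was completed and large volumes of gene sequence data became available. This allowed traditional medicine to adopt the new perspective of bioinformatics, in which computational analysis of the data makes biomedical research more accurate and safer. Gene-level research, however, is of limited practical use for medical treatment: what actually takes part in biological processes is usually the proteins that genes express. A protein performs its specific function by virtue of a particular structural conformation, and structure and function are generally believed to be closely related. Bioinformatics researchers therefore often group proteins with similar structures and study together those structures that are inferred to share similar functions.

From a biological point of view, many protein structures share certain conserved structures. Related studies indicate that these conserved structures usually carry specific functions, which also explains why different proteins can have the same or similar functions. Taking conserved structures as our viewpoint, we aim to find specific recurring conserved structures in a protein structure database and to summarize the differences and substitution relationships among them. Going a step further, we encode these recurring structures and apply the resulting codes to protein chains in place of the conventional amino acid sequence. We call the new sequence a local structure code sequence. It gives the sequence of a protein chain more structural meaning and strengthens the link between structure and sequence.

In this thesis, we took 1738 protein chains from the protein sub-database PDB-REPRDB at a 15% sequence similarity level and cut them into fragments of equal length, four residues (4-mer) each, to build a fragment database of quadripeptides. Using several geometric features of the quadripeptides, we divided them into 30 clusters with fuzzy c-means clustering. The initial clustering was then refined by an optimization step to give the final result, and each of the 30 clusters was assigned a code, forming the local structure codes (alphabet codes). To verify the local structure code approach and the clustering result, we carried out two case studies. Case 1 takes two protein chains with low sequence similarity but similar structures. Case 2 takes 16 protein chains from SCOP families, SCOP being a well-known structural classification database. Both cases examine the relationship between local structure codes and structural similarity. The results show that local structure code sequences can indeed represent structural similarity relationships effectively and correctly.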
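The fragmentation step can be illustrated with a minimal Python sketch. It assumes each chain is given as a list of 3D coordinates, one per residue (for example, Cα atoms). The abstract does not say which geometric features were used, so the pairwise-distance descriptor below is only an assumption, and the names `fragments` and `distance_features` are hypothetical. The default `step=1` produces overlapping windows; a different step can be passed if non-overlapping cuts are wanted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def fragments(coords: Sequence[Point], k: int = 4, step: int = 1) -> List[Tuple[Point, ...]]:
    """Return windows of k consecutive residues, moving `step` residues at a time."""
    return [tuple(coords[i:i + k]) for i in range(0, len(coords) - k + 1, step)]


def distance_features(fragment: Sequence[Point]) -> List[float]:
    """Assumed descriptor: all pairwise distances inside one fragment.

    For a 4-residue fragment this yields 6 numbers, ordered
    (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
    """
    return [
        math.dist(fragment[i], fragment[j])
        for i in range(len(fragment))
        for j in range(i + 1, len(fragment))
    ]


# Toy chain: six residues on a straight line, 3.8 units apart (illustrative only).
chain = [(3.8 * i, 0.0, 0.0) for i in range(6)]
frags = fragments(chain)                       # three overlapping 4-mers
features = [distance_features(f) for f in frags]
```

Each feature vector produced this way is invariant to rotation and translation of the fragment, which is one reason internal distances are a common choice for comparing local backbone shapes.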

Parallel Abstract


At the beginning of the 21st century, the Human Genome Project (HGP) completed the sequencing of the human genome. The huge amount of genomic sequence data has revolutionized conventional medical science by bringing in the viewpoint of bioinformatics, and analyzing these data with the aid of computer science has greatly improved the safety and correctness of medical research. However, research at the genomic level is potentially less practical for clinical applications than research at the protein level, because what actually participates in biological processes is mainly proteins. It is commonly believed that protein structures are highly correlated with protein functions: proteins with similar functions usually have similar structures. Biologists therefore often cluster similar structures together and infer a function from them.

Many protein structures share specific conserved structures, and many studies have shown that these conserved structures exhibit particular functions. Using the concept of conserved structures, we aim to find repeated conserved structures in a protein structure database and to analyze the substitution relations among them. Furthermore, we encode these repeated conserved structures. The new codes carry more structural information than the amino acid codes. We name them alphabet codes; they naturally connect sequence to structure.

In this study, we picked 1738 protein chains from the protein structure database PDB-REPRDB. All protein chains were decomposed into 4-mer fragments in an overlapping fashion, each of which is called a “quadripeptide”. Using the geometrical properties of these quadripeptides, we clustered them into 30 clusters with the fuzzy c-means clustering algorithm, refined the resulting clusters, and encoded the clusters as 30 different alphabet codes.
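The clustering and encoding step can be sketched as follows. This is a textbook fuzzy c-means loop in pure Python, not the thesis implementation: the fuzzifier `m = 2`, the random initialization, the stopping tolerance, and the 30-symbol alphabet `CODE_SYMBOLS` are all assumptions, and the refinement step mentioned in the abstract is omitted because its details are not given. The toy data uses two clusters instead of 30.

```python
import math
import random
import string
from typing import List, Sequence, Tuple

# Hypothetical 30-symbol alphabet; the symbols actually used in the thesis are not stated.
CODE_SYMBOLS = string.ascii_uppercase + "abcd"


def fuzzy_c_means(points: Sequence[Sequence[float]], c: int, m: float = 2.0,
                  iters: int = 100, tol: float = 1e-6, seed: int = 0
                  ) -> Tuple[List[Tuple[float, ...]], List[List[float]]]:
    """Standard fuzzy c-means: returns (cluster centers, membership matrix)."""
    rng = random.Random(seed)
    pts = [tuple(p) for p in points]
    n, dim = len(pts), len(pts[0])
    # Random initial memberships; every row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers: List[Tuple[float, ...]] = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * pts[i][d] for i in range(n)) / total for d in range(dim)))
        # Membership update: u_ij is proportional to d_ij ** (-2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dists = [math.dist(pts[i], centers[j]) for j in range(c)]
            if min(dists) == 0.0:
                # Point coincides with one or more centers: split membership among them.
                zeros = [j for j, d in enumerate(dists) if d == 0.0]
                new_row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


def encode(memberships: Sequence[Sequence[float]], symbols: str = CODE_SYMBOLS) -> str:
    """Give each fragment the symbol of its highest-membership cluster."""
    return "".join(symbols[max(range(len(row)), key=row.__getitem__)] for row in memberships)


# Toy feature vectors forming two well-separated groups (illustrative only).
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, memberships = fuzzy_c_means(toy, c=2)
codes = encode(memberships)
```

Which symbol a cluster receives depends on the arbitrary cluster order, so only the pattern of repeated symbols in `codes` is meaningful, not the particular letters.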
Two case studies were carried out to verify the effect of the clustering and the alphabet codes. In Case 1, we picked two protein chains that are similar in structure but different in amino acid sequence. In Case 2, we picked 16 protein chains from a family in the SCOP database. The results suggest that alphabet codes can characterize the structural similarity between two protein chains more effectively and informatively than the amino acid codes.
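One simple way to compare two alphabet code sequences, in the spirit of the case studies, is a global alignment score. The sketch below uses Needleman-Wunsch dynamic programming with a flat match/mismatch/gap scheme. The abstract does not describe how the comparison was actually done, so the scoring values and the function name `global_alignment_score` are assumptions. A scheme based on substitution relations among the codes would replace the flat match/mismatch values.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score of two code strings."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


# Identical code sequences score one point per position.
same = global_alignment_score("ABCD", "ABCD")
# One missing symbol costs a single gap: three matches minus one gap.
one_gap = global_alignment_score("ABCD", "ABD")
```

Running the same function on the amino acid sequences and on the alphabet code sequences of a pair of chains gives two scores that can be set against a structural similarity measure.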

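Cited By


陳暘文 (2006). 區域結構碼序列在蛋白質穿針引線法上的應用 [Application of local structure code sequences to protein threading] [Master's thesis, National Taiwan University]. Airiti Library. https://doi.org/10.6342/NTU.2006.00352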
