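We use a hidden Markov model (HMM) to build a model that predicts and classifies thermostable proteins from their sequence and structural features. The amino acid sequences of 21 pairs of thermostable proteins and their homologous, relatively low-temperature counterparts serve as training data, together with the hydrophilic or hydrophobic character of each residue and its folding position within the protein's tertiary structure. The folding data were obtained from the ASA View database. Folding position was analyzed in two groups: the first uses a threshold of 0.5 to decide whether a residue lies on the inside or the outside of the fold, and the second uses the mean folding value as that threshold. In each case, HMM was then used to build two models for classifying and predicting protein thermostability. A conditional-probability analysis of the data further showed that some thermostable proteins do differ from ordinary proteins at the sequence level, which can serve as a reference basis for classifying thermostable proteins. Because HMM rests on a sound mathematical and theoretical foundation, this study allows thermostable proteins to be identified and classified effectively.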
A model was constructed using a hidden Markov model (HMM) to predict and classify thermostable proteins from the characteristics of their sequences and three-dimensional structures. In this study, 21 groups of proteins, together with their three-dimensional structures, were obtained from the Protein Data Bank (PDB); each group contains a thermophilic/mesophilic pair. Some of the sequences of these proteins were used as the HMM training data. The residues of these proteins were divided into hydrophobic and hydrophilic amino acid residues. The solvent accessibility percentage of each residue was obtained from the ASA View database. Each residue was classed as solvent-exposed or solvent-buried by comparing this percentage with a threshold. Two groups were used in this study: the first takes a threshold of 0.5 as the judgment value for the exposed or buried position, and the second takes the folding average as the judgment value for a position outside or within the fold. From the collected data, a model was built using HMM to classify and predict thermostable proteins. The collected data were also analyzed using conditional probability, which showed a great difference in sequence between the thermophilic and mesophilic proteins. Based on this finding, HMM may be a good tool to serve as a reference basis for classifying thermostable proteins. Because HMM rests on a sound mathematical and theoretical foundation, thermostable proteins can be effectively distinguished and classified through this research.
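The encoding and classification steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study, and several details are assumptions: the hydrophobic residue set `HYDROPHOBIC`, the convention that a residue at or above the threshold counts as exposed, the four-symbol alphabet (`Hb`, `He`, `Pb`, `Pe`), the helper names `encode`, `log_likelihood` and `classify`, and the decision rule that assigns a protein to whichever class model gives the higher likelihood. Accessibility values are taken as fractions in [0, 1].

```python
import math

# Assumption: one common hydrophobic grouping; the study's exact set is not stated.
HYDROPHOBIC = set("AVLIMFWC")


def encode(sequence, accessibility, threshold=None):
    """Map each residue to one of four symbols: Hb, He, Pb, Pe.

    H/P marks hydrophobic or hydrophilic; b/e marks buried or exposed.
    threshold=None uses the mean accessibility of this protein (second
    grouping); a number such as 0.5 gives the fixed cutoff (first grouping).
    """
    if len(sequence) != len(accessibility):
        raise ValueError("sequence and accessibility must have equal length")
    if threshold is None:
        threshold = sum(accessibility) / len(accessibility)
    symbols = []
    for residue, acc in zip(sequence, accessibility):
        polarity = "H" if residue in HYDROPHOBIC else "P"
        # Assumption: values at or above the threshold count as exposed.
        location = "e" if acc >= threshold else "b"
        symbols.append(polarity + location)
    return symbols


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM.

    start[s], trans[r][s] and emit[s][symbol] are plain dicts of
    probabilities. Assumes a non-empty sequence with non-zero probability.
    """
    states = list(start)
    alpha = {s: start[s] * emit[s][symbols[0]] for s in states}
    log_prob = 0.0
    for symbol in symbols[1:]:
        # Rescale at every step so long sequences do not underflow.
        scale = sum(alpha.values())
        log_prob += math.log(scale)
        alpha = {
            s: emit[s][symbol]
            * sum(alpha[r] / scale * trans[r][s] for r in states)
            for s in states
        }
    return log_prob + math.log(sum(alpha.values()))


def classify(symbols, models):
    """Return the name of the model with the highest log-likelihood.

    models maps a label to a (start, trans, emit) tuple.
    """
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))
```

For a hypothetical four-residue fragment, `encode("ALKD", [0.1, 0.45, 0.9, 0.2], threshold=0.5)` marks the leucine as buried, whereas `encode("ALKD", [0.1, 0.45, 0.9, 0.2])` uses the mean (0.4125) and marks it as exposed, which shows how the two groupings can yield different symbol sequences for the same protein. Model parameters passed to `classify` would come from training on the thermophilic and mesophilic sequences; none are supplied here.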