Classification and Prediction of Thermostable Proteins Using Hidden Markov Models

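We use a hidden Markov model (HMM) to build a model that predicts and classifies thermostable proteins from protein sequence and structural features. The training data are the amino acid sequences of 21 pairs of thermostable proteins and their homologous lower-temperature (mesophilic) counterparts. The features used are the hydrophilic or hydrophobic character of each amino acid and its folding position in the protein's tertiary structure; the folding data were obtained from the ASA View database. The folding positions were analyzed in two groups: the first uses a threshold of 0.5 to decide whether a residue lies inside or outside the fold, and the second uses the mean folding value as the decision value. For each group, two models were then built with an HMM to classify and predict protein thermostability. In addition, an analysis of conditional probabilities in the data showed that some thermostable proteins and ordinary proteins do differ at the sequence level, which can serve as a reference for classifying thermostable proteins. Because HMMs have a sound mathematical foundation and theoretical framework, this study shows that thermostable proteins can be effectively identified and classified.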
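The abstract does not give the exact encoding, so the following is only a minimal sketch, assuming each residue is reduced to one of four symbols (H/P for hydrophobic/hydrophilic, E/B for exposed/buried) and that per-residue accessibility values are relative fractions in [0, 1]. It illustrates the two cut-off rules (a fixed 0.5 threshold versus the protein's mean accessibility) and a simple count-based estimate of the conditional probability of one symbol given the previous one. The hydrophobic residue set and the names `HYDROPHOBIC`, `encode`, and `conditional_probs` are hypothetical illustrations, not the paper's definitions.

```python
from collections import Counter
from statistics import mean

# Hypothetical hydrophobic set, for illustration only; the study's exact
# hydrophobic/hydrophilic split is not stated in the abstract.
HYDROPHOBIC = set("AVLIMFWC")


def encode(sequence, rsa, threshold=0.5):
    """Map each residue to one of four symbols (an assumed encoding):
    first letter H/P = hydrophobic/hydrophilic,
    second letter E/B = exposed/buried.
    `rsa` holds assumed relative accessibility values in [0, 1].
    threshold=None switches to the protein's mean accessibility as the cut.
    """
    if len(sequence) != len(rsa):
        raise ValueError("sequence and rsa must have the same length")
    cut = mean(rsa) if threshold is None else threshold
    symbols = []
    for aa, acc in zip(sequence.upper(), rsa):
        kind = "H" if aa in HYDROPHOBIC else "P"
        place = "E" if acc > cut else "B"  # above the cut counts as exposed
        symbols.append(kind + place)
    return symbols


def conditional_probs(symbols):
    """Estimate P(next symbol | current symbol) from adjacent symbol pairs."""
    pair_counts = Counter(zip(symbols, symbols[1:]))
    from_counts = Counter(symbols[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}
```

For example, `encode("AKLE", [0.1, 0.4, 0.3, 0.6])` yields `["HB", "PB", "HB", "PE"]` with the 0.5 threshold, but `["HB", "PE", "HB", "PE"]` with `threshold=None`, because the mean accessibility (0.35) reclassifies K as exposed. Comparing such conditional probability tables between thermophilic and mesophilic sequences is one simple way to look for the sequence-level differences mentioned above.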

Keywords

Hidden Markov model (HMM); Classification and prediction of protein thermostability; Thermostable proteins; Conditional probability

Parallel Abstract

A model was constructed using a hidden Markov model (HMM) to predict and classify thermostable proteins from the characteristics of their sequences and three-dimensional structures. In this study, 21 groups of proteins with known three-dimensional structures were obtained from the Protein Data Bank (PDB); each group contains a thermophilic/mesophilic pair. Some of these protein sequences were used as HMM training data. The residues were divided into hydrophobic and hydrophilic amino acids, and the solvent accessibility percentage of each residue was obtained from the ASA View database. Whether a residue is solvent-exposed or solvent-buried was decided by comparing this percentage with a threshold. Two groups were used: the first takes a threshold of 0.5 as the value separating exposed from buried positions, and the second takes the folding average as the value separating positions outside and inside the fold. From the collected data, HMM-based models were built to classify and predict thermostable proteins. Analyzing the collected data through conditional probabilities showed a marked difference in sequence between thermophilic and mesophilic proteins. Based on this finding, HMM may serve as a useful basis for classifying thermostable proteins. Because HMM rests on a sound mathematical and theoretical foundation, this research shows that thermostable proteins can be effectively distinguished and classified.
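A common way to turn two HMMs into a binary classifier, and one plausible reading of the approach described above, is to score a sequence under each model with the forward algorithm and choose the model that assigns the higher likelihood. The sketch below assumes the sequence has already been encoded into a four-symbol alphabet (hydrophobic/hydrophilic × buried/exposed). The toy models `THERMO` and `MESO` and all their parameters are hypothetical placeholders, not values trained on the study's 21 protein pairs, and training (for example with Baum-Welch) is not shown.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(obs, start, trans, emit):
    """Return log P(obs | model) using the forward algorithm in log space.
    start[i]    : probability of starting in hidden state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][s]  : probability that state i emits symbol s
    All probabilities are assumed to be strictly positive.
    """
    n = len(start)
    alpha = [math.log(start[i]) + math.log(emit[i][obs[0]]) for i in range(n)]
    for sym in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + math.log(trans[i][j]) for i in range(n)])
            + math.log(emit[j][sym])
            for j in range(n)
        ]
    return _logsumexp(alpha)


def classify(obs, thermo_model, meso_model):
    """Label a sequence by whichever HMM gives it the higher log-likelihood."""
    lt = log_forward(obs, *thermo_model)
    lm = log_forward(obs, *meso_model)
    return "thermophilic" if lt > lm else "mesophilic"


# Hypothetical toy models: (start, trans, emit). Not trained parameters.
THERMO = (
    [0.5, 0.5],
    [[0.8, 0.2], [0.2, 0.8]],
    [
        {"HB": 0.6, "HE": 0.1, "PB": 0.2, "PE": 0.1},
        {"HB": 0.1, "HE": 0.1, "PB": 0.1, "PE": 0.7},
    ],
)
MESO = (
    [0.5, 0.5],
    [[0.5, 0.5], [0.5, 0.5]],
    [
        {"HB": 0.25, "HE": 0.25, "PB": 0.25, "PE": 0.25},
        {"HB": 0.25, "HE": 0.25, "PB": 0.25, "PE": 0.25},
    ],
)

print(classify(["HB", "HB", "PE", "PE"], THERMO, MESO))  # thermophilic
```

With these toy parameters, a run of `"HE"` symbols is classified as `"mesophilic"` instead, because `THERMO` assigns that symbol a low emission probability in both states. Working in log space keeps the likelihoods of long protein sequences from underflowing to zero.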

Parallel Keywords

HMM; Classification and prediction of thermostable proteins; Conditional probability; Thermophilic and mesophilic proteins

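Cited By

黃信銘 (2007). 利用支持向量機與Hurst指數分析法進行耐熱蛋白質之分類 [Classification of thermostable proteins using support vector machines and Hurst exponent analysis] (Master's thesis, Asia University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-0807200916271931

劉沂政 (2009). 基於L測度之Choquet 積分迴歸模式與赫斯特指數之耐熱蛋白預測演算法 [A thermostable protein prediction algorithm based on an L-measure Choquet integral regression model and the Hurst exponent] (Master's thesis, Asia University). Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215464053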
