The search for fast and reliable algorithms to classify thermostable proteins has long been an active research topic. The Hurst exponent can be used to describe and analyze fractal time series, and the Support Vector Machine (SVM) generalizes well while offering high classification accuracy and speed. This master's thesis therefore proposes, for the first time, combining the Hurst exponent with the SVM algorithm to classify thermostable proteins using non-symbolic numerical features. The data set, obtained from the Protein Data Bank (PDB), contains 40 amino acid sequences in two classes: 20 thermostable proteins and their 20 homologous mesophilic counterparts. Computing the Hurst exponent of each non-symbolic sequence derived from a protein yields four Hurst exponent features (H values) per protein. These four features are used as input to an SVM classifier, and 5-fold and leave-one-out cross-validation are used to estimate the classification accuracy for the response (category) variable. The results show that this method can effectively classify thermostable proteins.
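
To make the feature-extraction step concrete, the following is a minimal sketch of how one Hurst exponent could be estimated from a numerically encoded protein sequence. It rests on two assumptions that the abstract does not confirm: the amino acids are mapped to numbers with the Kyte-Doolittle hydropathy scale (the thesis does not say which four encodings it uses), and the exponent is estimated by classical rescaled-range (R/S) analysis. The names `encode`, `rescaled_range`, and `hurst_rs` are hypothetical and chosen only for illustration.

```python
import math

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not name
# the numeric encodings behind its four features; this scale is one example.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, scale=KD):
    """Turn a symbolic amino acid sequence into a non-symbolic numeric series.

    Residues missing from the scale (for example 'X') are skipped.
    """
    return [scale[aa] for aa in sequence.upper() if aa in scale]


def rescaled_range(chunk):
    """Return R/S for one window, or None if the window has zero variance."""
    n = len(chunk)
    mean = sum(chunk) / n
    cumulative = 0.0
    lowest = highest = 0.0
    for value in chunk:
        cumulative += value - mean          # cumulative deviation from the mean
        lowest = min(lowest, cumulative)
        highest = max(highest, cumulative)
    std = math.sqrt(sum((v - mean) ** 2 for v in chunk) / n)
    if std == 0:
        return None
    return (highest - lowest) / std


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window).

    ASSUMPTION: classical R/S analysis with window sizes doubling from
    min_window up to half the series length.
    """
    n = len(series)
    log_sizes, log_rs = [], []
    window = min_window
    while window <= n // 2:
        values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            log_sizes.append(math.log(window))
            log_rs.append(math.log(sum(values) / len(values)))
        window *= 2
    if len(log_sizes) < 2:
        raise ValueError("series too short for R/S analysis")
    # Ordinary least-squares slope of log(R/S) on log(window size).
    mean_x = sum(log_sizes) / len(log_sizes)
    mean_y = sum(log_rs) / len(log_rs)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_sizes, log_rs))
    denominator = sum((x - mean_x) ** 2 for x in log_sizes)
    return numerator / denominator
```

Under these assumptions, calling `hurst_rs(encode(sequence))` on a sequence of at least 32 residues gives one H value. Repeating the call with four different numeric scales would give a four-feature vector per protein, which could then be passed to an SVM and evaluated with 5-fold or leave-one-out cross-validation.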