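Hydroxyproline is the product of proline hydroxylation. Hydroxylation is a post-translational modification of proteins and plays an important role in living organisms. Proline and lysine are the two main amino acids that undergo this reaction, and modified prolines account for the great majority of cases, so this study focuses on analyzing proline hydroxylation. Proteins were divided into three groups, each with its own prediction model: collagens, conotoxins, and other proteins. From 245 protein sequences containing experimentally verified hydroxyprolines, sequence fragments with a window size of 21, centered on proline, were extracted for experiments and analysis. The random forest classifier in WEKA was used, evaluated with 5-fold cross-validation. The web tool built in this study is named "Hyppred" and is publicly available, together with the related datasets, at http://140.138.144.144/~migon/Hyppred.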
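The window extraction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Hyppred implementation: the function name `extract_proline_windows`, the 1-based position convention, and padding positions beyond the sequence ends with `-` are hypothetical choices, since the abstract does not say how prolines within 10 residues of a terminus were handled.

```python
from typing import List, Set, Tuple

WINDOW_SIZE = 21
HALF = WINDOW_SIZE // 2  # 10 residues on each side of the central proline
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends


def extract_proline_windows(
    sequence: str, hyp_positions: Set[int]
) -> List[Tuple[int, str, int]]:
    """Return (position, window, label) for every proline in `sequence`.

    Positions are 1-based (assumed, as in UniProt feature annotations).
    label is 1 if the proline is annotated as hydroxyproline, otherwise 0.
    """
    padded = PAD * HALF + sequence + PAD * HALF
    samples = []
    for i, residue in enumerate(sequence):
        if residue != "P":
            continue
        # Residue i sits at index i + HALF in the padded string, so the
        # slice [i, i + WINDOW_SIZE) is centered on it.
        window = padded[i:i + WINDOW_SIZE]
        position = i + 1
        label = 1 if position in hyp_positions else 0
        samples.append((position, window, label))
    return samples
```

For example, `extract_proline_windows("GPPGAPGP", {3})` yields four windows (prolines at positions 2, 3, 6, and 8), each 21 characters long with the proline at index 10, and only the window for position 3 is labeled positive.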
Hydroxylation is an important post-translational modification of proteins, with key roles in collagen synthesis and detoxification. The most commonly hydroxylated residues are proline and lysine. This study focuses on identifying proline hydroxylation sites. Data were collected from the UniProt database and grouped into three categories of hydroxylated proteins: collagens, conotoxins, and other proteins. A window of 21 residues centered on each proline was used to extract features. A random forest classifier, evaluated with 5-fold cross-validation, was used to build and test the prediction model for proline hydroxylation. A website hosting the prediction tool and the data sets is available at http://140.138.144.144/~migon/Hyppred.
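The study ran a random forest with 5-fold cross-validation in WEKA. The plain-Python sketch below only illustrates the evaluation protocol: samples are split into five folds, stratified by class (an assumption, as the abstract does not state whether folds were stratified), each fold is held out once, and a model is trained on the remaining four. The names `stratified_folds`, `cross_validate`, and `majority_fit` are hypothetical; `majority_fit` is a trivial stand-in for the random forest so that the example stays self-contained.

```python
import random
from typing import Callable, List, Sequence


def stratified_folds(labels: Sequence[int], k: int = 5, seed: int = 1) -> List[List[int]]:
    """Assign sample indices to k folds, keeping the class ratio similar in each."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's indices round-robin so each fold gets about 1/k of them.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def majority_fit(X_train, y_train):
    """Hypothetical placeholder classifier: always predicts the majority class."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


def cross_validate(X, y, fit: Callable, k: int = 5, seed: int = 1) -> float:
    """Return accuracy pooled over k folds; every sample is tested exactly once."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        test_set = set(test_idx)
        train = [i for i in range(len(y)) if i not in test_set]
        # Train on the other k - 1 folds, then score the held-out fold.
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)
```

With 10 positive and 40 negative samples, each of the five folds receives 2 positives and 8 negatives, and the majority-class baseline scores 0.8, a useful floor when judging a real classifier on imbalanced hydroxylation data.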