Virus protein phosphorylation catalyzed by human kinases plays crucial regulatory roles in enhancing viral replication and in inhibiting normal host-cell functions. Because of this biological importance, it is desirable to identify virus phosphorylation sites together with the human kinases that catalyze them. However, mass spectrometry-based experiments are expensive and labor-intensive. Furthermore, previous studies that identified phosphorylation sites in viruses did not identify the responsible human kinases. We therefore propose a new method to computationally identify virus phosphorylation sites and their catalytic human kinases. Experimentally verified phosphorylation data were extracted from virPTM, a database containing 301 experimentally verified phosphorylation sites on 104 virus proteins phosphorylated by human kinases. To investigate the motifs of virus phosphorylation sites, maximal dependence decomposition (MDD) was employed to cluster the large set of phosphorylation data into subgroups containing significantly conserved motifs. In addition, human phosphorylation sites were collected, grouped according to their kinase annotations, and matched with the virus MDD clusters. A profile hidden Markov model (HMM) was then trained for each subgroup. Five-fold cross-validation of the MDD-clustered HMMs yielded average accuracies of 84.93% for serine and 78.05% for threonine, and an independent test yielded accuracies of 66.90% for serine and 80.90% for threonine. This investigation is the first to explore potential kinases for viral phosphorylation sites. The method is implemented as a web server, viralPhos, which is freely accessible at http://csb.cse.yzu.edu.tw/ViralPhos/.
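The MDD clustering step summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions. Sequence windows are equal-length strings centered on the phosphorylated residue. The consensus at a position is its most frequent residue. Dependence is measured by a Pearson chi-square statistic between "matches the consensus at position i" and the residue at position j. The stopping rule uses two hypothetical parameters (`min_size`, `min_stat`) in place of a formal significance test. The function names `consensus`, `chi_square`, and `mdd` are likewise illustrative.

```python
from collections import Counter


def consensus(seqs, i):
    """Most frequent residue at position i (first-seen residue wins ties)."""
    return Counter(s[i] for s in seqs).most_common(1)[0][0]


def chi_square(seqs, i, j):
    """Pearson chi-square statistic between C_i (does position i match its
    consensus residue?) and the residue observed at position j."""
    cons = consensus(seqs, i)
    n = len(seqs)
    rows = Counter(s[i] == cons for s in seqs)           # match / non-match counts
    cols = Counter(s[j] for s in seqs)                   # residue counts at j
    cells = Counter((s[i] == cons, s[j]) for s in seqs)  # observed contingency table
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n  # > 0: only observed categories are used
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def mdd(seqs, min_size=4, min_stat=0.0):
    """Recursively split equal-length sequence windows into subgroups.

    min_size and min_stat are hypothetical stopping parameters standing in for
    the significance and cluster-size criteria a real MDD run would use.
    """
    if len(seqs) < 2 * min_size:
        return [seqs]
    length = len(seqs[0])
    # S_i: summed dependence of every other position on C_i.
    scores = {
        i: sum(chi_square(seqs, i, j) for j in range(length) if j != i)
        for i in range(length)
    }
    if not scores:
        return [seqs]
    best = max(scores, key=scores.get)
    if scores[best] <= min_stat:
        return [seqs]  # no remaining dependence worth splitting on
    cons = consensus(seqs, best)
    matched = [s for s in seqs if s[best] == cons]
    others = [s for s in seqs if s[best] != cons]
    if len(matched) < min_size or len(others) < min_size:
        return [seqs]
    # Simplification: both branches are split further.
    return mdd(matched, min_size, min_stat) + mdd(others, min_size, min_stat)
```

For example, on a toy (invented) set of four `RRSAA` and four `DESGG` windows, `mdd(seqs, min_size=2)` splits on position 0 and returns the two groups. In the described pipeline, each resulting subgroup would then serve as training data for its own profile HMM.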