
WildPiRa: Improving the Prediction of RNA-Binding Residues in Protein Sequences Using a Support Vector Machine Classifier and Co-conserved Motifs

Advisor: 劉寶鈞

Abstract


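RNA-binding proteins (RBPs) play important roles in the control of gene expression in the post-genomic era and in other biological processes. Predicting or identifying RNA-binding residues (RBRs) from a protein sequence is therefore an important step toward understanding biological recognition. Because structural information on RNA-protein interactions remains scarce, there is a strong need to predict RBRs directly from protein sequence information. This thesis proposes a new hybrid prediction method, called WildPiRa, which combines the co-conserved motifs mined by a constrained sequential pattern mining algorithm (WildSpan) with PiRaNhA, a Support Vector Machine (SVM) classifier that is among the better current predictors of RBRs, and then merges the two sets of predictions through several different combination methods to predict RBRs from protein sequences. First, we used 117 RNA-protein complex entries to compare WildSpan and the PiRaNhA classifier individually; the average F-measures were 0.402 and 0.298, respectively, showing that WildSpan has better predictive performance than PiRaNhA. When the predictions of the two are integrated, the F-measure rises from 0.402 and 0.298 to 0.509. In summary, WildSpan alone can effectively predict RBRs from protein sequences, using only homologous sequences and without relying on structures of protein interactions, and it compares favorably with the machine-learning classifier. In particular, integrating the co-conserved motifs from WildSpan with the predictions of the classifier improves predictive performance further.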

Parallel Abstract


The identification of RNA-binding residues (RBRs) in proteins is important for understanding molecular recognition. In the absence of structures for RNA-protein complexes, it is highly desirable to predict RBRs from protein sequences alone. In this thesis, we propose a novel hybrid prediction method, WildPiRa, to tackle this problem. It combines co-conserved motifs discovered by WildSpan with the results predicted by PiRaNhA, the best SVM-based classifier for identifying RBRs in protein sequences known to us so far. WildSpan is invoked to discover concurrently conserved patterns composed of multiple motifs spanning large wildcard regions in homologous sequences, and PiRaNhA is invoked to predict RBRs from the protein sequence through a trained classifier. Finally, the two sets of results are used together to identify RBRs in protein sequences through several combination methods that we propose. We compare WildSpan, PiRaNhA, and WildPiRa on a dataset of 117 RNA-binding proteins. On average, WildSpan using all discovered co-conserved motifs achieves an F-measure of 0.402, which is better than the F-measure of 0.298 achieved by the structure-based trained classifier PiRaNhA. WildPiRa further improves the F-measure to 0.509 when both results are integrated to identify RBRs in protein sequences. In conclusion, the sequence-based WildPiRa is not only well suited to predicting RBRs in proteins whose complex structures are unknown but is also highly desirable for large-scale proteomics.
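To make the evaluation concrete, the following is a minimal sketch of how a per-residue F-measure can be computed and how two binary predictions can be merged. The abstract does not spell out the combination methods used in WildPiRa, so the union and intersection rules here are illustrative assumptions, as are the function names and the toy labels; none of them come from WildSpan or PiRaNhA.

```python
def f_measure(predicted, actual):
    """Harmonic mean of precision and recall over per-residue binary labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # predicted binding, truly binding
    fp = sum(1 for p, a in pairs if p and not a)    # predicted binding, not binding
    fn = sum(1 for p, a in pairs if not p and a)    # missed binding residue
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def combine_union(a, b):
    """Hypothetical rule: a residue is binding if either predictor says so."""
    return [int(bool(x) or bool(y)) for x, y in zip(a, b)]


def combine_intersection(a, b):
    """Hypothetical rule: a residue is binding only if both predictors agree."""
    return [int(bool(x) and bool(y)) for x, y in zip(a, b)]


# Toy labels for one 8-residue sequence (1 = RNA-binding); invented for illustration.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
motif_pred = [1, 0, 0, 0, 1, 0, 0, 0]   # stands in for a motif-based prediction
svm_pred = [0, 1, 1, 0, 0, 0, 0, 0]     # stands in for a classifier prediction

union_pred = combine_union(motif_pred, svm_pred)
both_pred = combine_intersection(motif_pred, svm_pred)

for name, pred in [("motif", motif_pred), ("svm", svm_pred),
                   ("union", union_pred), ("intersection", both_pred)]:
    print(name, round(f_measure(pred, actual), 3))
```

In this toy case the union recovers more true binding residues than either predictor alone, which is one way a combined prediction can score higher than its parts. A dataset-level figure such as those reported above would be an average of the per-protein values.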

