Prediction of microRNA Target Genes Using a Hidden Markov Model

Advisor: 莊曜宇

Abstract

MicroRNAs (miRNAs) are short non-coding RNAs, about 22 nucleotides long on average, that repress translation of their target genes in animals. Predicting animal miRNA targets is difficult because most miRNAs pair with their target mRNAs through imperfect complementarity. To improve prediction performance, we propose a new miRNA target-gene prediction algorithm. It first builds on two conventional models: a sequence-complementarity search that yields an alignment score, and a thermodynamic model that assigns a folding free energy to each miRNA-target interaction. A Hidden Markov Model (HMM), a well-known machine learning method, then serves as the final decision step. Because a standard HMM cannot take the whole sequence into account, the proposed algorithm uses a forward HMM and a backward HMM together, so that each element of a miRNA-target interaction receives information from both the element before it and the element after it.

This thesis reports the highest sensitivity, specificity, and overall accuracy obtained for different combinations of these models. The algorithm is also validated in two ways: against a set of genes predicted by previously published algorithms, and against a set of down-regulated genes identified from microarray data. In our experiments, with the complete set of models, the sensitivity, specificity, and overall accuracy are 84.25%, 96.78%, and 96.67%, respectively. The overlap with the predictions of other algorithms is 52.42%, and the overlap with the down-regulated genes from microarray data is 70.37%.
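The idea of scoring a miRNA-target interaction with one HMM read in each direction can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the three-symbol duplex alphabet (`M` for a Watson-Crick match, `W` for a G:U wobble, `X` for a mismatch), the two hidden states, all probability values, and the choice to combine the two directions by adding log-likelihoods. The function `forward_log_likelihood` is the standard scaled HMM forward algorithm.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled HMM forward algorithm; returns log P(obs) under the model."""
    n = len(start)
    log_p = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        # Rescale at every step to avoid underflow; the log of the scale
        # factors sums to the log-likelihood of the whole sequence.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p

# Hypothetical emission probabilities (illustrative values only).
EMIT = [
    {"M": 0.80, "W": 0.15, "X": 0.05},  # state 0: paired region
    {"M": 0.20, "W": 0.20, "X": 0.60},  # state 1: loop or bulge region
]
# Two separately parameterized models, assumed to have been trained on
# duplexes read left-to-right and right-to-left respectively.
FORWARD_HMM = {"start": [0.8, 0.2], "trans": [[0.9, 0.1], [0.3, 0.7]], "emit": EMIT}
BACKWARD_HMM = {"start": [0.5, 0.5], "trans": [[0.85, 0.15], [0.2, 0.8]], "emit": EMIT}

def bidirectional_score(obs):
    """Sum of log-likelihoods from the forward-reading and reverse-reading HMMs."""
    fwd = forward_log_likelihood(obs, **FORWARD_HMM)
    bwd = forward_log_likelihood(obs[::-1], **BACKWARD_HMM)
    return fwd + bwd

# Toy duplex strings, one symbol per miRNA position.
well_paired = "MMMMMMMMXXMMWMMMMMMMMM"
poorly_paired = "XXMXXXWXXXXMXXXXXWXXXX"
print(bidirectional_score(well_paired) > bidirectional_score(poorly_paired))
```

In a full pipeline of the kind the abstract describes, a score like this would be one input to the final decision alongside the alignment score and the folding free energy; how those are actually combined is not stated in the abstract.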

