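Abstract (Chinese version) The Internal Ribosome Entry Site (IRES) is an RNA sequence with a distinctive structure and function, usually located in the untranslated region (UTR) at the 5' end of the sequence; in some viruses, the IRES is instead located in the region between two genes (intergenic region, IGR). In eukaryotes, translation can normally be initiated only at the 5' end of an mRNA, whereas an IRES allows translation to initiate in the middle of the mRNA. RNA secondary structure is often conserved during evolution, so this thesis uses bioinformatics tools to analyze RNA secondary structure and compares it with known IRES structures, with the aim of providing a method for searching for potential IRES sequences. The study chains existing bioinformatics tools into an IRES sequence prediction system. The system first uses RNALfold to predict stable secondary structures of short segments within the primary sequence, then uses RNA Align to compare the predicted structures with known IRES structures. Possible IRES sequences are identified from the secondary-structure similarity, and the pknotRG program is additionally used to predict pseudoknot structures that support the system's decision. Known IRESs are currently classified into four major types (CrPV, HCV, EMCV, and PV), and the conserved secondary structures of these four types in the Rfam database were used as standard templates for alignment. After genes and known IRES sequences were analyzed, statistical calculation showed that the IRES search system's prediction accuracy exceeds 80% for all four IRES types.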
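The structure-comparison step can be illustrated with a minimal sketch. It does not reproduce RNA Align; as a deliberately crude stand-in, it treats each secondary structure as a dot-bracket string and scores similarity as one minus the normalized Levenshtein distance between the two strings. The function names and the 0.8 cutoff are hypothetical, and pseudoknots (the part pknotRG handles) are out of scope because this sketch accepts only '(', ')' and '.'.

```python
"""Crude stand-in for the structure-comparison step (NOT RNA Align's algorithm).

Assumptions (hypothetical): structures are plain dot-bracket strings without
pseudoknots, and similarity = 1 - Levenshtein distance / length of the longer string.
"""


def is_balanced(db: str) -> bool:
    """Return True if a dot-bracket string uses only '(', ')', '.' and is properly nested."""
    depth = 0
    for ch in db:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing bracket with no matching opener
                return False
        elif ch != ".":
            return False  # pseudoknot brackets such as '[' are not handled here
    return depth == 0


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute or match
        prev = cur
    return prev[-1]


def structure_similarity(pred: str, template: str) -> float:
    """Similarity in [0, 1] between a predicted structure and a template structure."""
    if not (is_balanced(pred) and is_balanced(template)):
        raise ValueError("expected balanced dot-bracket strings")
    longest = max(len(pred), len(template))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pred, template) / longest


def is_candidate(pred: str, template: str, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: flag a segment whose similarity reaches the threshold."""
    return structure_similarity(pred, template) >= threshold
```

For example, structure_similarity("((..))", "((...))") is 1 - 1/7 ≈ 0.857 (one inserted unpaired base), which passes the hypothetical cutoff. A real structural aligner scores base pairs as units, which string edit distance does not, so this is only a proxy for the idea of "similarity between two secondary structures".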
Abstract The Internal Ribosome Entry Site (IRES) is an RNA sequence with a distinctive structure and function, usually located in the 5' untranslated region (5' UTR). In some viruses, the IRES lies in the region between two genes, known as the intergenic region (IGR). In eukaryotes, translation normally initiates only at the 5' end of an mRNA, whereas an IRES allows translation to initiate internally, within the body of the mRNA. Because RNA secondary structure tends to be conserved during evolution, we used bioinformatics tools to predict RNA secondary structures and compared them with known IRES structures, with the aim of providing a method for searching for potential IRES sequences. We chained existing bioinformatics tools into an IRES sequence prediction system. First, RNALfold predicts locally stable secondary structures within short segments of the input sequence. Next, RNA Align aligns each predicted structure with known IRES structures, and the similarity between the two secondary structures is used to decide whether the segment is a potential IRES sequence; pseudoknot structures predicted by pknotRG are also used to support this decision. Known IRES sequences are classified into four types: PV, EMCV, HCV, and CrPV. Using the conserved secondary structures of all four types from the Rfam database as templates, we analyzed viral gene sequences and known viral IRES sequences. Statistical analysis showed a prediction accuracy above 80% for every IRES type, indicating that the system can effectively identify potential IRES sequences.
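The abstract reports an accuracy above 80% but does not define the metric. Assuming the usual binary-classification definition, accuracy = (TP + TN) / (TP + TN + FP + FN), the tally can be sketched as follows. The class and function names are hypothetical, and the labels in the example are invented for illustration; they are not data from this study.

```python
"""Illustrative accuracy calculation for a binary IRES / non-IRES classifier.

Assumption: "accuracy" means (TP + TN) / total; the example labels are made up.
"""
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # predicted IRES, actually IRES
    tn: int = 0  # predicted non-IRES, actually non-IRES
    fp: int = 0  # predicted IRES, actually non-IRES
    fn: int = 0  # predicted non-IRES, actually IRES

    @property
    def accuracy(self) -> float:
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    @property
    def sensitivity(self) -> float:
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


def tally(predicted: list[bool], actual: list[bool]) -> Confusion:
    """Count confusion-matrix cells from paired predicted/actual labels (True = IRES)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    c = Confusion()
    for p, a in zip(predicted, actual):
        if p and a:
            c.tp += 1
        elif p and not a:
            c.fp += 1
        elif a:
            c.fn += 1
        else:
            c.tn += 1
    return c


if __name__ == "__main__":
    # Hypothetical labels for five sequences; not results from the study.
    predicted = [True, True, False, True, False]
    actual = [True, False, False, True, True]
    c = tally(predicted, actual)
    print(c, f"accuracy={c.accuracy:.2f}", f"sensitivity={c.sensitivity:.2f}")
```

With the invented labels above, the tally is TP = 2, FP = 1, TN = 1, FN = 1, giving an accuracy of 0.60 and a sensitivity of 0.67. Reporting sensitivity alongside accuracy is a design choice of this sketch: when a test set contains mostly true IRES sequences, accuracy alone says little about how many non-IRES segments would be falsely flagged.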