
基因序列分段研究

Segmentation of DNA sequence

Advisors: 趙坤茂 (Kun-Mao Chao), 劉長遠 (Cheng-Yuan Liou)

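Abstract


Deoxyribonucleic acid (DNA) sequences contain much of the genetic code by which organisms and viruses operate. According to the central dogma of molecular biology, DNA segments play an important role in protein synthesis.

When a virus spreads rapidly around the world, the information we can obtain soonest is its DNA sequence. If we can segment that sequence in a short time, we can analyze it quickly and, beyond that, possibly find useful drugs or vaccines to treat patients.

In this thesis, we use a simple recurrent network (SRN) to predict the next nucleotide in a DNA sequence and use these predictions to segment the sequence. We run experiments on the SARS-CoV, HCoV-EMC/2012, Ebola virus, and HIV genomes. The resulting segments are consistent with those found by biologists. We also analyze mutual information using Hinton diagrams. We find that the SRN can learn the structure of a genome sequence and detect the boundaries of coding regions from its prediction errors. This suggests that the SRN can provide features useful for future biological research.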

Parallel Abstract


DNA encodes the genetic instructions used in the development and functioning of all known living organisms and many viruses. According to the central dogma, which forms the backbone of molecular biology, segments of DNA play an important role in making proteins. When a virus spreads quickly worldwide, the first information we can obtain is its DNA sequence. If we can segment that sequence in a very short time, we can analyze it quickly and, moreover, find useful drugs and vaccines for patients sooner. In this thesis, we use a simple recurrent network (SRN) to predict the next nucleotide of a DNA sequence in order to perform the segmentation task. The SARS-CoV, HCoV-EMC/2012, Ebola virus, and HIV genomes are used in our experiments. The results are consistent with the findings of biologists. We also analyze the mutual information with Hinton diagrams. The experiments indicate that the SRN can learn the genome structure and detect the boundaries of the coding regions from its prediction errors. This implies that the SRN is capable of providing features for further biological studies.
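
The idea of segmenting a sequence by next-nucleotide prediction error can be illustrated with a minimal sketch. It is not the thesis implementation: the network size, learning rate, one-step gradient update (the context layer is treated as a fixed input, in the style of an Elman network), the synthetic test sequence, and the helper `largest_error_increase` are all assumptions made for illustration.

```python
import numpy as np

ALPHABET = "ACGT"

def srn_prediction_errors(seq, hidden=16, lr=0.1, epochs=3, seed=0):
    """Train an Elman-style SRN online to predict the next nucleotide.

    Returns the per-position cross-entropy error from the last pass.
    All hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    idx = np.array([ALPHABET.index(c) for c in seq])
    X = np.eye(4)[idx]                          # one-hot inputs
    W_xh = rng.normal(0.0, 0.5, (4, hidden))    # input -> hidden
    W_hh = rng.normal(0.0, 0.1, (hidden, hidden))  # context -> hidden
    W_hy = rng.normal(0.0, 0.1, (hidden, 4))    # hidden -> output
    b_h = np.zeros(hidden)
    b_y = np.zeros(4)
    errors = np.zeros(len(seq) - 1)
    for _ in range(epochs):
        context = np.zeros(hidden)              # reset context each pass
        for t in range(len(seq) - 1):
            h = np.tanh(X[t] @ W_xh + context @ W_hh + b_h)
            logits = h @ W_hy + b_y
            shifted = logits - logits.max()     # stable log-softmax
            log_p = shifted - np.log(np.exp(shifted).sum())
            errors[t] = -log_p[idx[t + 1]]      # error before the update
            d_logits = np.exp(log_p)
            d_logits[idx[t + 1]] -= 1.0         # softmax cross-entropy gradient
            d_h = (W_hy @ d_logits) * (1.0 - h ** 2)
            W_hy -= lr * np.outer(h, d_logits)
            b_y -= lr * d_logits
            W_xh -= lr * np.outer(X[t], d_h)
            W_hh -= lr * np.outer(context, d_h) # context held fixed (one-step gradient)
            b_h -= lr * d_h
            context = h                         # copy hidden state to context
    return errors

def largest_error_increase(errors, window=50):
    """Hypothetical helper: position where the mean error of the next
    `window` steps most exceeds the mean error of the previous `window`."""
    csum = np.concatenate(([0.0], np.cumsum(errors)))
    k = np.arange(window, len(errors) - window + 1)
    before = (csum[k] - csum[k - window]) / window
    after = (csum[k + window] - csum[k]) / window
    return int(k[np.argmax(after - before)])

# Synthetic stand-in for a genome: a predictable region followed by a random one.
rng = np.random.default_rng(1)
seq = "ACGT" * 100 + "".join(rng.choice(list(ALPHABET), size=400))
errors = srn_prediction_errors(seq)
boundary = largest_error_increase(errors)
print(errors[:399].mean(), errors[399:].mean(), boundary)
```

On this toy input the prediction error should stay low inside the periodic region and rise where the random region begins, so a sharp change in error marks a candidate segment boundary. Real genomes are far less regular, so an actual analysis would need a more careful error-smoothing and boundary criterion than this sketch uses.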

