Thesis

Short Tandem Repeat Profiling in Deep Learning (用深度學習來探討短縱列重複序列)

Advisor: 周承復

Abstract

This thesis applies deep learning to the problem of identifying short tandem repeats (STRs). Since the emergence of next-generation sequencing (NGS), many methods for identifying STRs have been proposed. However, some earlier methods struggle with complex repeat regions, while others are inflexible to use. Deep learning has recently performed well in both image processing and text processing, and a DNA sequence can be interpreted either as an image or as text. This thesis therefore identifies STRs with deep learning, using a convolutional neural network (CNN) architecture and a sequence-to-sequence (Seq2Seq) architecture. The experimental results show that the method is effective. In addition, the method can flexibly learn from new data without the substantial retuning of the model that earlier methods require.
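The premise that a DNA sequence can be read as an image or as text can be made concrete with a small sketch. This is an illustration under stated assumptions, not the thesis's implementation: the function names one_hot, kmer_tokens and conv1d, the A/C/G/T row order, k = 3, and the hand-set filter are all hypothetical choices. The code encodes a read as a 4 x L one-hot matrix (the image view), splits it into overlapping k-mers (a text view that a Seq2Seq encoder could consume as tokens), and slides one filter over the matrix, which is the basic operation a CNN layer learns.

```python
from typing import List

# Hypothetical row order for the one-hot "image"; not taken from the thesis.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Image view: a 4 x L binary matrix, one row per base (A, C, G, T)."""
    seq = seq.upper()
    matrix = [[0] * len(seq) for _ in BASES]
    for j, base in enumerate(seq):
        if base in BASES:  # ambiguous bases such as N stay all-zero
            matrix[BASES.index(base)][j] = 1
    return matrix


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Text view: overlapping k-mers, treated like words in a sentence."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def conv1d(matrix: List[List[int]], kernel: List[List[int]]) -> List[int]:
    """Valid (no padding), stride-1 1D convolution summed over the 4 channels.

    As in deep-learning frameworks, this is cross-correlation:
    the kernel is not flipped.
    """
    width = len(kernel[0])
    length = len(matrix[0])
    scores = []
    for start in range(length - width + 1):
        total = 0
        for c in range(len(matrix)):
            for w in range(width):
                total += matrix[c][start + w] * kernel[c][w]
        scores.append(total)
    return scores


if __name__ == "__main__":
    read = "AGATAGATAGAT"  # three tandem copies of the motif AGAT
    image = one_hot(read)
    # Hand-set filter equal to the one-hot pattern of the motif;
    # a trained CNN would learn many such filters from data instead.
    kernel = one_hot("AGAT")
    print(kmer_tokens(read)[:4])  # → ['AGA', 'GAT', 'ATA', 'TAG']
    print(conv1d(image, kernel))  # → [4, 0, 2, 0, 4, 0, 2, 0, 4]
```

The filter response peaks every 4 positions, matching the period of the AGAT repeat. A real CNN stacks many learned filters and nonlinear layers, so this hand-set example only shows why a periodic signal becomes visible once the sequence is treated as an image.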

