This paper applies deep learning to the problem of identifying short tandem repeats (STRs). Since the emergence of next-generation sequencing (NGS), many methods for identifying STRs have been proposed. However, some of these methods have difficulty with complex repeat regions, while others are inflexible to use. Deep learning has recently performed well in both image processing and text processing, and a DNA sequence can be interpreted either as an image or as text. We therefore propose identifying STRs with deep learning, using two designs: a convolutional neural network (CNN) architecture and a sequence-to-sequence (Seq2Seq) architecture. Our experimental results show that the approach is effective. In addition, it can learn from new data flexibly, without the substantial readjustment of the model that earlier methods require.
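The idea that a DNA sequence can be read either as an image or as text can be illustrated with a minimal sketch. The abstract does not state the input encoding actually used, so everything here is an assumption for illustration: the helper names `one_hot` and `to_tokens`, the A, C, G, T column order, the handling of unknown bases, and the example read are all hypothetical. A one-hot matrix is a common image-like input for a CNN, and an integer token list is a common text-like input for a Seq2Seq model.

```python
# Hypothetical encoding helpers; the abstract does not specify the real input encoding.
BASES = "ACGT"  # assumed column order for the one-hot view


def one_hot(seq):
    """Image-like view: a len(seq) x 4 matrix with one column per base.

    Bases outside A, C, G, T (for example N) become an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def to_tokens(seq):
    """Text-like view: one integer token per base; 0 is reserved for unknown bases."""
    index = {b: i + 1 for i, b in enumerate(BASES)}
    return [index.get(base, 0) for base in seq.upper()]


# A made-up read containing a GATA repeat, used only to exercise the helpers.
read = "ACGTGATAGATAGATAN"
matrix = one_hot(read)    # one row of four 0/1 values per base
tokens = to_tokens(read)  # one integer per base
```

Under these assumptions, `matrix` could be passed to a convolutional layer as a one-channel image or a four-channel 1D signal, while `tokens` could be passed through an embedding layer into a sequence model.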