FRESCO: Frequency-based RE-Sequencing tool based on CO-clustering segmentation for short reads - a case study of micro-RNAs

Chinese title: A re-sequencing tool for short reads based on clustering segmentation and frequency computation

Advisor: 唐傳義 (Chuan Yi Tang)

Abstract


In this thesis, we propose a data-processing pipeline for Solexa reads and a re-sequencing algorithm. In the Solexa experimental protocol, an adaptor is used for RNA sequencing, and the adaptor sequence may contaminate the end of a Solexa read. First, we introduce a method for removing adaptor sequences and compare two matching modes for it: exact matching and matching with one mismatch allowed. Next, we propose a re-sequencing algorithm. Finally, we compare two data-partitioning methods within this algorithm and compare the algorithm with other re-sequencing tools. We also classify chicken embryo Solexa reads according to the Rfam RNA seed-type tree and report the results.
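The abstract compares exact and one-mismatch adaptor matching but does not give the removal procedure. A minimal sketch of the general idea, assuming the adaptor (or a prefix of it) sits at the end of the read: scan candidate start positions, compare the read's remaining bases with the beginning of the adaptor, and cut at the first position whose mismatch count is within the tolerance. The function `trim_adaptor`, its `min_overlap` parameter, and the example adaptor sequence are hypothetical choices for illustration, not taken from the thesis.

```python
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"  # example adaptor (assumed); use your kit's sequence


def trim_adaptor(read: str, adaptor: str, max_mismatches: int = 0,
                 min_overlap: int = 3) -> str:
    """Trim an adaptor, or a prefix of it, from the end of a read.

    Start positions are scanned left to right; the read is cut at the first
    position where the remaining bases match the beginning of the adaptor
    with at most `max_mismatches` mismatches. Overlaps shorter than
    `min_overlap` are not considered.
    """
    n = len(read)
    for start in range(n - min_overlap + 1):
        # Compare as many bases as the read and the adaptor both provide.
        overlap = min(n - start, len(adaptor))
        segment = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(segment, adaptor) if a != b)
        if mismatches <= max_mismatches:
            return read[:start]
    return read  # no adaptor found


# Usage: a 22-nt microRNA followed by the first 14 adaptor bases.
MIRNA = "TGAGGTAGTAGGTTGTATAGTT"
clean_read = MIRNA + ADAPTOR[:14]
# Same read with one simulated sequencing error inside the adaptor.
noisy_read = MIRNA + ADAPTOR[:7] + "A" + ADAPTOR[8:14]

print(trim_adaptor(clean_read, ADAPTOR))                    # TGAGGTAGTAGGTTGTATAGTT
print(trim_adaptor(noisy_read, ADAPTOR))                    # unchanged: exact match fails
print(trim_adaptor(noisy_read, ADAPTOR, max_mismatches=1))  # TGAGGTAGTAGGTTGTATAGTT
```

Allowing one mismatch recovers adaptors that carry a single sequencing error, at the cost of occasionally trimming genuine bases when the overlap is short; raising `min_overlap` limits that risk.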

Parallel Abstract (translated from Chinese)


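In this thesis, we present a processing pipeline for Solexa data and a re-sequencing algorithm. The Solexa sequencing protocol uses adaptor sequences, and when RNA is sequenced, there is a high chance that adaptor sequence is attached to the end of a read, contaminating it. First, this thesis introduces a method for removing adaptor sequences and compares adaptor removal with and without mismatch tolerance. Next, it presents the proposed re-sequencing algorithm. Finally, it compares two different data-partitioning methods within the algorithm and compares the algorithm with existing re-sequencing tools. In addition, we compare Solexa reads from chicken embryos with the known RNAs in Rfam and tally the classification results according to the Rfam RNA classification tree.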
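The abstracts do not spell out how FRESCO derives a sequence from grouped reads. As a rough illustration of the "frequency-based" idea only, the sketch below assumes that the reads in one group are already anchored at a common start and calls each position by majority vote over base frequencies. The co-clustering segmentation and data-partitioning steps are not modeled, and `majority_consensus` and `min_depth` are hypothetical names.

```python
from collections import Counter
from typing import List


def majority_consensus(reads: List[str], min_depth: int = 1) -> str:
    """Call one base per position by majority vote over left-anchored reads.

    Assumes every read in `reads` starts at the same position (for example,
    reads grouped from one locus). Calling stops at the first position
    covered by fewer than `min_depth` reads.
    """
    if not reads:
        return ""
    consensus = []
    for i in range(max(len(r) for r in reads)):
        # Base frequencies at position i among reads long enough to cover it.
        counts = Counter(r[i] for r in reads if i < len(r))
        if sum(counts.values()) < min_depth:
            break
        # most_common orders ties by first occurrence, so the result is deterministic.
        consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATGGTT",  # one read with a base error at position 18
    "TGAGGTAGTAGGTTGTATAG",    # a shorter read
]
print(majority_consensus(reads))               # TGAGGTAGTAGGTTGTATAGTT
print(majority_consensus(reads, min_depth=4))  # TGAGGTAGTAGGTTGTATAG
```

Majority voting corrects isolated errors only when a position is covered by several reads; it does nothing for reads that were grouped incorrectly in the first place.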

References


Altschul,S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Ferragina,P. et al. (2005) Indexing compressed text. J. ACM, 52, 552–581.
Glazov,E.A. et al. (2008) A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res., 18, 957.
Griffiths-Jones,S. et al. (2005) Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res., 33, D121–D124.
Hillier,L.W. et al. (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature, 432, 695–716.
