An External Memory Approach to Computing the Maximal Repeats Across Classes of DNA Sequences

This work presents an external memory approach for extracting maximal repeats from whole genome sequences, along with the statistics of these repeats across classes, where the definition of a class depends on the statistics to be computed. The method is a heuristic that combines a bucket-sort-like procedure with a technique taken from Chinese term extraction. The bucket sort orders the suffixes of DNA sequences stored in files, and the term extraction step scans the sorted suffixes to extract maximal repeats while computing their statistics. These cross-class statistics may be useful for sequence classification and species identification.
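The two stages described above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the term extraction procedure, so the scan below substitutes the standard check over runs of adjacent sorted suffixes (a shared prefix is reported when its occurrences are preceded by more than one distinct character). The function names, the `min_len` and `prefix_len` parameters, and the use of one label per sequence as its class are all assumptions made for illustration, and the sequences themselves are kept in memory for brevity; only the suffix references go through bucket files.

```python
import os
import tempfile
from collections import Counter


def bucket_suffixes(seqs, prefix_len, workdir):
    """Write each suffix reference (sequence index, offset) to a bucket file
    named after the first prefix_len characters of that suffix."""
    handles = {}
    for sid, (_, seq) in enumerate(seqs):
        for pos in range(len(seq)):
            key = seq[pos:pos + prefix_len]
            if key not in handles:
                handles[key] = open(os.path.join(workdir, key + ".bucket"), "w")
            handles[key].write(f"{sid}\t{pos}\n")
    for handle in handles.values():
        handle.close()
    return sorted(handles)  # bucket keys in lexicographic order


def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def maximal_repeats(seqs, min_len=3, prefix_len=2):
    """seqs is a list of (class_label, sequence) pairs (hypothetical format).
    Returns {repeat: {class_label: occurrence_count}}."""
    # Every occurrence of a repeat of length >= prefix_len lands in one bucket,
    # so buckets can be processed independently.
    assert min_len >= prefix_len
    found = {}
    with tempfile.TemporaryDirectory() as workdir:
        for key in bucket_suffixes(seqs, prefix_len, workdir):
            with open(os.path.join(workdir, key + ".bucket")) as fh:
                refs = [tuple(map(int, line.split())) for line in fh]
            # Sort one bucket at a time by the suffix each reference points to.
            refs.sort(key=lambda r: seqs[r[0]][1][r[1]:])
            text = [seqs[s][1][p:] for s, p in refs]
            n = len(refs)
            lcp = [0] + [common_prefix_len(text[i - 1], text[i]) for i in range(1, n)]
            stack = [(0, 0)]  # (shared prefix length, index of first suffix in run)
            for i in range(1, n + 1):
                cur = lcp[i] if i < n else -1
                left = i - 1
                while stack and stack[-1][0] > cur:
                    depth, left = stack.pop()
                    if depth < min_len:
                        continue
                    group = refs[left:i]  # all suffixes sharing this prefix
                    # Preceding characters; a sequence start counts as unique.
                    lefts = {seqs[s][1][p - 1] if p else ("start", s) for s, p in group}
                    if len(lefts) > 1:  # cannot be extended to the left
                        counts = Counter(seqs[s][0] for s, _ in group)
                        found[text[left][:depth]] = dict(counts)
                if not stack or stack[-1][0] < cur:
                    stack.append((cur, left))
    return found


if __name__ == "__main__":
    print(maximal_repeats([("x", "GACGT"), ("y", "TACGA")]))
```

In this toy input, `ACG` occurs once in each class and is preceded and followed by different characters in the two sequences, so it is the only repeat of length at least 3 that the sketch reports. A genome-scale version would also need to read the sequences from disk and sort each bucket without materializing full suffix strings.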

Keywords

maximal repeat; external memory; genomic comparison

Cited by

劉宣榮 (2010). 中華民國專利之關鍵字歷史資料查詢系統 [A keyword history query system for Republic of China patents] [master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465542

Budiansyah, A. (2010). Text Trend Analysis via Significant Term A Based on Indonesia News [master's thesis, Asia University]. Airiti Library. https://www.airitilibrary.com/Article/Detail?DocID=U0118-1511201215465544
