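According to the ten leading causes of death published by Taiwan's Department of Health, malignant tumors have topped the list for decades, accounting for nearly 30% of deaths. Now that the sequencing of the human genome is largely complete, using information technology to extract cancer-related information from this massive body of data has become especially important. Biologists have shown that transcription factors control the expression of other genes and are also key to whether mutations occur in the promoter region of a gene sequence. Most of the well-known DNA regulatory sequences share a consensus sequence, so motif finding has long been a research topic of great interest to biologists. Many motif prediction methods have been proposed, and in all of them the choice of motif length is arguably what determines whether a motif is predicted correctly.

This thesis proposes using the Fast Fourier Transform (FFT) to predict likely motif lengths, together with database software to manage the sequence data. The aim is to process data more efficiently and to handle larger volumes of gene sequences, and thereby to improve both the efficiency and the accuracy of motif region prediction. In the experiments, we designed a series of simulated sequence data sets to test whether the method predicts motifs correctly, and we used known oncogenes to validate its effectiveness in practice. The results show that the proposed method overcomes a long-standing limitation of motif pattern prediction software, in which the candidate motif length could only be set by trial and error or by rules of thumb.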
Biologists have shown that transcription factors control gene expression and also influence mutation in gene sequences. Most well-known DNA regulatory sequences share a consensus sequence; motif finding has therefore long been of interest to biologists. Many methods for predicting motif patterns have been published, and among all the factors involved, motif length is critical to the prediction. This thesis predicts possible motif lengths using the Fast Fourier Transform (FFT). Both simulated sequences and known oncogene sequences are used to verify the effectiveness of the proposed method. The results show that the method significantly improves the performance and accuracy of motif prediction algorithms.
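The abstract does not spell out how the FFT is applied, so the following is only a minimal sketch of one common way to look for a characteristic length in a DNA sequence, not the thesis's actual procedure. It assumes each base is encoded as a 0/1 indicator signal, sums the four power spectra, and reads the strongest non-zero frequency as a candidate period. The names `power_spectrum` and `dominant_period` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

BASES = "ACGT"


def power_spectrum(seq):
    """Sum of the power spectra of the four base-indicator signals (assumed encoding)."""
    seq = seq.upper()
    total = np.zeros(len(seq) // 2 + 1)
    for base in BASES:
        # 1.0 where this base occurs, 0.0 elsewhere.
        indicator = np.array([1.0 if ch == base else 0.0 for ch in seq])
        # Remove the mean so the zero-frequency term does not dominate.
        indicator -= indicator.mean()
        total += np.abs(np.fft.rfft(indicator)) ** 2
    return total


def dominant_period(seq):
    """Return the period (in bases) of the strongest non-zero frequency."""
    if len(seq) < 4:
        raise ValueError("sequence too short for a meaningful spectrum")
    spectrum = power_spectrum(seq)
    # Skip index 0 (the constant component); +1 restores the true frequency index.
    k = int(np.argmax(spectrum[1:])) + 1
    return len(seq) / k


if __name__ == "__main__":
    # Toy input: an 8-base unit repeated 16 times, purely for illustration.
    print(dominant_period("AAAACCCC" * 16))
```

On this toy input the strongest peak corresponds to a period of 8 bases, the length of the repeated unit. Real promoter sequences are far noisier, so in practice several of the largest peaks would more plausibly be passed to a motif finder as candidate lengths rather than trusting a single maximum.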