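Theory in cell and molecular biology has established that transcription factors (TFs) control the expression of other genes. TFs are also regarded as key to whether mutations occur in the promoter region of a gene sequence. Most current research in this area, however, focuses on predicting motifs in promoter regions, using algorithms such as MEME, Genetic Algorithm (GA), and Gibbs Sampler to predict candidate transcription factor binding motifs. In this thesis, we predict cancer-related genes using real transcription factor binding sites, which is more accurate than using ordinary predicted motifs. We collected the published, experimentally verified transcription factor binding sites in human gene sequences from a number of websites and databases on transcription factors and their binding sites. We then statistically analyzed the promoter regions of gene sequences from known cancer tissues to identify the transcription factors most likely to influence mutation. Because both the binding-site data and the promoter data come from well-known online resources, new data can be imported into the database with the engine we developed to obtain updated results. We hope these results can serve as markers for recognizing whether an arbitrary gene sequence may contain cancer-related genes, which would greatly improve the speed and accuracy of finding cancer-related genes in medicine.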
Transcription factors (TFs) regulate the expression of other genes. They are also regarded as key to whether mutations occur in promoter regions. Current research on TFs mainly focuses on predicting motifs with algorithms such as Multiple EM for Motif Elicitation (MEME), Genetic Algorithm (GA), and Gibbs Sampler. In this thesis, we propose a new approach to predicting possible cancer-related genes based on transcription factor binding sites (TFBS). Experimentally verified TFBS located in promoter regions and known cancer-related genes were collected from the TFSEARCH and CHIP websites, respectively. We then select the TFBS associated with gene mutations and analyze their occurrence frequencies to investigate the relationship between TFBS and possible cancer-related genes. We also discuss the two-factor case, which analyzes the relationship between pairs of TFBS and possible cancer-related genes. Our results indicate that the TFBS-based approach is a reliable method for recognizing possible cancer-related genes.
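The abstract describes counting how often each TFBS, and each pair of TFBS (the two-factor case), occurs in the promoter regions of cancer-related genes. The following Python sketch illustrates that counting idea under simplifying assumptions. Each TFBS is represented by a single exact site string, and both DNA strands are scanned. Candidates are ranked by the difference between their frequency in cancer-gene promoters and in a background promoter set. The function names, the difference score, and the toy sequences are hypothetical. They are not the thesis's actual procedure, statistic, or data, and the sketch does not reproduce the TFSEARCH search itself.

```python
from collections import Counter
from itertools import combinations


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def sites_present(promoter: str, tfbs: dict[str, str]) -> set[str]:
    """Names of TFBS whose (hypothetical) exact site string occurs on either strand."""
    fwd = promoter.upper()
    rev = revcomp(fwd)
    return {name for name, site in tfbs.items() if site in fwd or site in rev}


def tfbs_frequencies(promoters: list[str], tfbs: dict[str, str]) -> dict[str, float]:
    """One-factor case: fraction of promoters that contain each TFBS."""
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    counts = Counter()
    for p in promoters:
        counts.update(sites_present(p, tfbs))  # each TFBS counted once per promoter
    return {name: counts[name] / len(promoters) for name in tfbs}


def pair_frequencies(promoters: list[str], tfbs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Two-factor case: fraction of promoters that contain both TFBS of each pair."""
    if not promoters:
        raise ValueError("need at least one promoter sequence")
    counts = Counter()
    for p in promoters:
        present = sorted(sites_present(p, tfbs))
        counts.update(combinations(present, 2))  # sorted pairs match the keys below
    return {pair: counts[pair] / len(promoters) for pair in combinations(sorted(tfbs), 2)}


def rank_by_difference(fg: dict, bg: dict) -> list:
    """Hypothetical score: rank keys by (cancer frequency - background frequency)."""
    return sorted(fg, key=lambda k: fg[k] - bg.get(k, 0.0), reverse=True)


if __name__ == "__main__":
    # Made-up site strings and promoters, for illustration only.
    TFBS = {"SP1": "GGGCGG", "TATA": "TATAAA", "CAAT": "CCAAT"}
    cancer = ["TTGGGCGGTATAAAC", "CCGCCCATATAAA", "ACCAATGGGCGG"]
    background = ["TATAAAGGG", "ACGTACGT", "CCAATTT"]

    fg, bg = tfbs_frequencies(cancer, TFBS), tfbs_frequencies(background, TFBS)
    print(rank_by_difference(fg, bg))
    print(rank_by_difference(pair_frequencies(cancer, TFBS),
                             pair_frequencies(background, TFBS)))
```

Counting each TFBS at most once per promoter keeps the frequency interpretable as "fraction of genes whose promoter carries the site". A real analysis would replace exact string matching with scored matrix matches and the difference score with a proper statistical test.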