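A single nucleotide polymorphism (SNP) is a position in a DNA sequence at which one nucleotide is replaced by another. Because SNP data show limited variation and are highly abundant, they are well suited as markers of human disease traits. If a transcription factor binding site (TFBS) in a promoter region carries an SNP, the SNP may disrupt normal transcription and lead to abnormal gene function. The vast majority of cancer cells originate from gene mutations, so this property can be used to identify possible cancer-related genes. In addition, SNPs located in the non-coding regions of a gene may regulate its expression level. This thesis therefore identifies possible cancer-related genes by searching for SNPs in the transcription factor binding sites of gene promoter regions. Our target genes are 161 known cancer-related genes, and the promoter sequence chosen for each gene spans from the transcription start site to 2000 bases upstream. We found that 25 cancer-related genes have SNPs in their transcription factor binding sites, which indicates that a considerable proportion of cancer-related genes carry SNPs in a TFBS. In the future, testing whether a gene has an SNP in a TFBS could therefore be used to predict whether that gene is a possible cancer-related gene.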
A single nucleotide polymorphism (SNP) is the substitution of one nucleotide for another at a single position in a DNA sequence. Because SNPs are abundant and show limited variation, they serve well as genetic markers of human disease traits. If a transcription factor binding site (TFBS) in a promoter contains an SNP, transcription may be affected and the gene may function abnormally. Most cancer cells originate from gene mutations, so possible cancer-related genes can be recognized using this characteristic. The purpose of this thesis is to predict possible cancer-related genes by examining whether a gene has an SNP in a TFBS. Our target genes are 167 known cancer-related genes, and each promoter sequence spans -2000 bp to -1 bp relative to the transcription start site (TSS). We find that 25 of these cancer-related genes have an SNP in a TFBS, which shows that a considerable proportion of cancer-related genes carry SNPs in a TFBS.
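The screening idea described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis: the names `Gene`, `promoter_interval`, `snps_in_tfbs`, and `candidate_genes` are hypothetical, coordinates are assumed to be 1-based with the TSS at position +1 (so there is no position 0), and the genes, binding sites, and SNP positions are made-up toy values. In practice the TFBS and SNP coordinates would come from external annotation sources, which are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    tss: int     # 1-based genomic coordinate of the transcription start site
    strand: str  # "+" or "-"


def promoter_interval(gene, upstream=2000):
    """Return the 1-based inclusive (start, end) of the -upstream..-1 region."""
    if gene.strand == "+":
        # Upstream of a plus-strand gene means smaller coordinates.
        return gene.tss - upstream, gene.tss - 1
    # Upstream of a minus-strand gene means larger coordinates.
    return gene.tss + 1, gene.tss + upstream


def snps_in_tfbs(gene, tfbs_sites, snp_positions, upstream=2000):
    """Return sorted SNP positions inside a TFBS that lies within the promoter."""
    p_start, p_end = promoter_interval(gene, upstream)
    hits = set()
    for site_start, site_end in tfbs_sites:
        # Assumption: only binding sites fully inside the promoter are counted.
        if site_start < p_start or site_end > p_end:
            continue
        hits.update(p for p in snp_positions if site_start <= p <= site_end)
    return sorted(hits)


def candidate_genes(genes, tfbs_by_gene, snps_by_gene):
    """Return names of genes with at least one SNP inside a promoter TFBS."""
    return [
        g.name
        for g in genes
        if snps_in_tfbs(g, tfbs_by_gene.get(g.name, []), snps_by_gene.get(g.name, []))
    ]


# Toy data, invented purely for illustration.
genes = [Gene("GENE_A", 10000, "+"), Gene("GENE_B", 5000, "-")]
tfbs_by_gene = {"GENE_A": [(9500, 9510)], "GENE_B": [(5100, 5110)]}
snps_by_gene = {"GENE_A": [9505, 9800, 10020], "GENE_B": [5200]}

print(candidate_genes(genes, tfbs_by_gene, snps_by_gene))  # prints ['GENE_A']
```

In this toy run, GENE_A is flagged because the SNP at 9505 falls inside its binding site at 9500 to 9510, while GENE_B is not flagged because its only SNP lies outside any binding site. Whether a site that only partly overlaps the promoter should count is a design choice; the sketch takes the stricter option.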