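A single nucleotide polymorphism (SNP) is a type of point mutation found in human gene sequences. This kind of point mutation has several distinctive properties. With current technology, SNPs are easy to obtain, which lowers the difficulty of research. However, because of experimental cost, analyzing every known SNP is difficult. Previous experiments have confirmed that a small number of representative SNPs are sufficient to capture the remaining point mutations. This observation gave rise to tag SNP selection. SNPs are useful for association analysis and disease gene mapping in biology. We propose a greedy algorithm that selects SNPs that clearly distinguish the differences among samples. Our experimental results show that the algorithm is both practical and reasonable.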
Well-organized disease gene mapping that uses single nucleotide polymorphisms (SNPs) as genetic markers improves our knowledge and understanding of genetic diseases. An SNP is a point mutation with a low mutation rate. About 10 to 30 million such single-nucleotide variants exist in the human genome. Johnson et al. reported that SNPs within a gene tend to be linked. This association makes many SNPs redundant, so genotyping all of them incurs unnecessary cost. The tag SNP selection problem is therefore to find a subset of SNPs that is sufficient to represent a given gene. Tag SNP selection is a necessary and practical problem for reducing genotyping cost. In this study, we developed a program, built from the tag SNP selection perspective, that provides the constructive basis required by disease gene mapping. Our objective is to find susceptibility loci capable of explaining population differences with a minimal number of SNPs. We tested our greedy algorithm on real data. Our experimental results showed that the program is able to find a minimal number of SNPs. Moreover, the quality metric we propose has biological meaning. In other words, the algorithm we propose for the tag SNP selection problem is not only acceptable to computer scientists but also helpful to biologists.
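To make the idea of greedy tag SNP selection concrete, the following is a minimal illustrative sketch, not the algorithm or quality metric of this study, which the abstract does not specify. It assumes that each sample is given as an equal-length string of biallelic alleles coded `0`/`1`, and that the goal is to pick SNP positions until every pair of distinct samples differs at some selected position (a set-cover style heuristic). The function name `greedy_tag_snps` and the toy data are hypothetical.

```python
from itertools import combinations


def greedy_tag_snps(haplotypes):
    """Greedily pick SNP indices that separate every pair of distinct samples.

    Assumption: `haplotypes` is a list of equal-length strings, one per
    sample, with one character per SNP (for example '0' or '1').
    """
    if not haplotypes:
        return []
    n_snps = len(haplotypes[0])

    # Pairs of samples that differ somewhere and are not yet separated.
    unresolved = {
        (i, j)
        for i, j in combinations(range(len(haplotypes)), 2)
        if haplotypes[i] != haplotypes[j]
    }

    selected = []
    while unresolved:
        # Number of still-unresolved pairs that SNP `s` would separate.
        def gain(s):
            return sum(
                1 for i, j in unresolved
                if haplotypes[i][s] != haplotypes[j][s]
            )

        # Take the unselected SNP with the largest gain (ties: lowest index).
        candidates = [s for s in range(n_snps) if s not in selected]
        best = max(candidates, key=gain)
        selected.append(best)

        # Keep only the pairs that the chosen SNP fails to separate.
        unresolved = {
            (i, j) for i, j in unresolved
            if haplotypes[i][best] == haplotypes[j][best]
        }
    return sorted(selected)


# Toy data: SNPs 0 and 1 are perfectly linked, as are SNPs 2 and 3.
samples = ["0000", "0011", "1100", "1111"]
print(greedy_tag_snps(samples))  # [0, 2]
```

In the toy data, one SNP from each linked pair is enough to tell all four samples apart, which mirrors the redundancy argument above. A greedy choice of this kind does not guarantee the smallest possible set in general; it is only a heuristic.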