Non-synonymous cSNPs, which occur in protein-coding regions, can alter the amino acid sequence and thus the function of the gene product. Protein variants generated by these non-synonymous cSNPs are thought to contribute to the diversity of the human population and to cause inherited human diseases. Identifying these non-synonymous cSNPs (functional variants) is therefore essential for functional genomic studies. In addition to direct genome sequencing, dbEST has been used for SNP discovery because it holds millions of ESTs. Previous methods for SNP discovery from dbEST required many ESTs per cluster, which greatly reduces the effectiveness of dbEST for detecting low-frequency variants. Such approaches are useful for identifying SNPs as polymorphic markers, but our aim is to identify sequence variations that are either disease-associated mutations or functional variants. Alternative bioinformatic tools are therefore needed to extract these low-frequency functional variants. We previously established the CGI bioinformatic tools for comparative genomic studies aimed at identifying novel human genes. Here we have used them to mine functional variants from ESTs, especially low-frequency non-synonymous cSNPs. Using 9848 human reference proteins as starting alignment scaffolds and 2.2 million human ESTs, we have identified more than 50,000 potential functional variants. In addition, our approach can be used to validate and correct the amino acid sequences of human reference proteins.
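
The distinction between synonymous and non-synonymous cSNPs can be illustrated with a minimal sketch. It is not the CGI tools' implementation, which is not described here; it only assumes the standard genetic code and a hypothetical helper, `classify_substitution`, that compares a reference codon with a variant codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with T
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, var_codon: str) -> str:
    """Classify a codon change (hypothetical helper, for illustration only).

    Returns "synonymous" if the encoded amino acid is unchanged,
    "nonsense" if the variant codon is a stop codon, and
    "non-synonymous" otherwise.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    var_aa = CODON_TABLE[var_codon.upper()]
    if ref_aa == var_aa:
        return "synonymous"
    if var_aa == "*":
        return "nonsense"
    return "non-synonymous"


if __name__ == "__main__":
    # GAG (Glu) -> GTG (Val) changes the encoded amino acid.
    print(classify_substitution("GAG", "GTG"))
    # GAA (Glu) -> GAG (Glu) leaves the protein unchanged.
    print(classify_substitution("GAA", "GAG"))
```

Only changes in the first category alter the protein sequence, which is why they are the ones of interest as candidate functional variants.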