
開發比較對照組和控制組之單核苷酸多型性的生物資訊系統

Establish a bioinformatic system to evaluate the differences in single nucleotide polymorphisms for case-control studies

Advisor: 黃怡婷
Co-advisor: 洪舜郁

Abstract


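The human genome map was sequenced in 2003, opening a new era for the life sciences and spurring interest in the relationship between diseases and genes. With rapid technological progress, genotyping single nucleotide polymorphisms (SNPs) has become steadily cheaper, and research on the association between SNPs and disease is now in full swing. This motivated the present study to develop a system that presents the association analyses for all SNPs in a single file, marks significant SNPs with color, and provides plots of the analyses so that researchers can quickly find the significant SNPs. Existing published software, such as Hapstats, SNPAnalyzer 2.0, SNPStats and SNPassoc, requires input in which all SNP genotypes are arranged by SNP number with one record per person, and reports its results as one table per SNP. When two or more statistical methods are used to test genotype and allele associations on a data set of several thousand SNPs or more, interpreting the results becomes very inefficient. This thesis designs a system that is more convenient for users. Its input is split into a sample file and a genotyping result file. Its output lists all required statistics (such as row percentages and odds) and statistical tests (chi-square test, likelihood ratio test, Cochran-Armitage trend test and Fisher's exact test) in a single file, with significant results shown in color, and it also generates significance plots to help users select the significant genes.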

Parallel Abstract


The human genome was completely sequenced in 2003, which has increased the desire to understand the relationship between genes and diseases. With the rapid development of technology, the cost of identifying single nucleotide polymorphisms (SNPs) has dropped dramatically, lowering the barrier to studying the relationship between SNPs and diseases. Since a particular study often involves hundreds or thousands of SNPs, a system that reads and outputs data in a systematic way can speed up the identification of useful SNPs or genes. Several software packages, including Hapstats, SNPAnalyzer 2.0, SNPStats and SNPassoc, have been developed to help biologists analyze the association between diseases and SNPs. However, the input and output of these packages are not very user-friendly. This thesis designs a system for case-control studies that allows easier input, delivers a more complete analysis and has a user-friendly interface. The input of the system consists of two files: sample data and genotype data. The output contains two types of association (genotype and allele), four tests (chi-square test, likelihood ratio test, Cochran-Armitage test and Fisher's exact test) and plots of the corresponding significance.
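To make the listed tests concrete, the following is a minimal sketch of how three of them could be computed for one SNP from case and control genotype counts. It is not the thesis's implementation. The function name `genotype_association`, the genotype order (AA, Aa, aa), the additive weights (0, 1, 2) and the choice of `a` as the allele whose odds ratio is reported are all assumptions made for illustration. Fisher's exact test is omitted to keep the sketch short.

```python
import math

def genotype_association(case, control, weights=(0, 1, 2)):
    """Association statistics for one SNP (illustrative sketch).

    case, control: counts of the (AA, Aa, aa) genotypes (assumed order).
    weights: assumed additive scores for the Cochran-Armitage trend test.
    """
    table = [list(case), list(control)]
    row_tot = [sum(r) for r in table]            # cases, controls
    col_tot = [c + d for c, d in zip(case, control)]
    n = sum(row_tot)

    # Pearson chi-square and likelihood ratio (G) statistics on the 2x3 table.
    pearson, g_stat = 0.0, 0.0
    for i in range(2):
        for j in range(3):
            expected = row_tot[i] * col_tot[j] / n
            if expected == 0:
                continue                         # empty genotype column
            obs = table[i][j]
            pearson += (obs - expected) ** 2 / expected
            if obs > 0:
                g_stat += 2 * obs * math.log(obs / expected)
    g_stat = max(g_stat, 0.0)                    # guard against rounding

    # Cochran-Armitage trend statistic (1 degree of freedom).
    r, s = row_tot
    sum_wr = sum(w * c for w, c in zip(weights, case))
    sum_wn = sum(w * c for w, c in zip(weights, col_tot))
    sum_w2n = sum(w * w * c for w, c in zip(weights, col_tot))
    denom = r * s * (n * sum_w2n - sum_wn ** 2)
    trend = n * (n * sum_wr - r * sum_wn) ** 2 / denom if denom else 0.0

    # Allele counts and the odds ratio of allele a (cases vs. controls).
    case_A, case_a = 2 * case[0] + case[1], 2 * case[2] + case[1]
    ctrl_A, ctrl_a = 2 * control[0] + control[1], 2 * control[2] + control[1]
    or_denom = case_A * ctrl_a
    odds_ratio = (case_a * ctrl_A) / or_denom if or_denom else math.inf

    # Closed-form chi-square tail probabilities: df=2 assumes all three
    # genotype columns are non-empty; the trend test has df=1.
    return {
        "pearson_chi2": pearson,
        "pearson_p": math.exp(-pearson / 2),
        "lrt_g": g_stat,
        "lrt_p": math.exp(-g_stat / 2),
        "trend_chi2": trend,
        "trend_p": math.erfc(math.sqrt(trend / 2)),
        "allele_odds_ratio": odds_ratio,
    }

result = genotype_association(case=(30, 50, 20), control=(50, 40, 10))
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Running a function like this over every SNP and collecting the returned dictionaries as rows of one table would give the kind of single-file summary the abstract describes, with a p-value threshold deciding which cells to highlight.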
