全基因體組關聯性分析 (Genome-Wide Association Study; GWAS) 的研究結果時常可見連鎖失衡區塊圖像不完整的現象。本研究欲探討連鎖失衡區塊圖像不完整的成因,是否和參試樣本不同次族群間起源不同的核苷酸變異有關。研究使用 15,919 個具有物理圖譜位置的基因座在 48 個玉米自交系中皆為同型結合基因型且沒有缺值的資料進行分析,此資料取自 MaizeSNP50 基因晶片的分析結果。首先將「參試樣本不同次族群間起源不同的核苷酸變異」定義為族群特有的分子標記位點。參試的玉米自交系不論由種原的來源記錄、STRUCTURE 軟體分析、或是 PCoA 分析方法都確定可以區分為溫帶馬齒種、溫帶甜質種、與熱帶種共三個次族群。藉由 STRUCTURE 軟體分群分析計算所得的 Q 值,建立次族群代表品系,並篩選出次族群特有的分子標記。其次「造成連鎖失衡區塊圖像不完整的分子標記」的篩選。將具完整圖像的連鎖失衡區塊定義為三個以上相鄰分子標記之間任一成對分子標記間相關係數 R² 大於 0.8 的染色體區間。造成連鎖失衡區塊圖像不完整的分子標記即為連鎖失衡區塊內與兩側相鄰分子標記的 R² 皆小於 0.8 的分子標記。上述資料分析結果顯示,使用 24 個代表品系可篩選出 6,696 個「族群特有的分子標記」,其中有 195 個分子標記是被定義為「造成連鎖失衡區塊圖像不完整的分子標記」。為了檢測「造成連鎖失衡區塊圖像不完整的分子標記」與「族群特有的分子標記」是否具有關聯性,使用 bootstrap 的概念,重覆進行 10,000 次取樣,每次取樣為自 15,919 個分子標記隨機抽取 6,696 個分子標記,並檢視不同數目的「造成連鎖失衡區塊圖像不完整分子標記」被隨機取得的機率。檢測結果顯示要隨機取得至少 195 個「造成連鎖失衡區塊圖像不完整」分子標記的機率僅為 0.6%。此結果顯示「造成連鎖失衡區塊圖像不完整的分子標記」與「族群特有的分子標記」是具有相關性的。
In genome-wide association studies (GWAS), linkage disequilibrium (LD) blocks with incomplete patterns are often observed. This study explores whether the incomplete patterns of LD blocks are related to nucleotide variations that originated in different subpopulations of the sample. Genotypes of 48 maize inbred lines were obtained with the MaizeSNP50 BeadChip and used to address this question. A total of 15,919 single nucleotide polymorphism (SNP) loci with known physical positions on the maize reference sequence were selected because their genotypes were homozygous in all 48 inbred lines and had no missing values. Nucleotide variations originating in different subpopulations were defined as "subpopulation-specific SNP loci". The 48 maize inbred lines can be classified into three subpopulations: temperate dent, temperate sweet, and tropical. This classification was consistent across the STRUCTURE analysis, the principal coordinate analysis (PCoA), and the origin records of the inbred lines. Representative lines of each subpopulation were selected based on the three Q values computed by STRUCTURE, which indicate the proportion of a line's genetic composition attributed to each subpopulation. The subpopulation-specific SNP loci were then defined as those showing DNA polymorphism solely in one particular subpopulation. Using the same genotype dataset, "SNP loci causing incomplete patterns of LD blocks" were selected independently. An LD block with a complete pattern was defined as a chromosomal region containing three or more adjacent SNP loci in which the squared correlation coefficient (R²) between every pair of loci was greater than 0.8. A "SNP locus causing an incomplete pattern of an LD block" was then defined as a locus within an LD block whose R² with the adjacent loci on both sides was less than 0.8.
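The marker definition above can be illustrated with a minimal sketch. It assumes homozygous biallelic genotypes coded as 0/1 (rows are inbred lines, columns are loci in physical order), and it approximates "within an LD block" by requiring that the two neighbours of a locus are themselves in high LD with each other. The function names and the toy matrix are hypothetical and are not taken from the study.

```python
import numpy as np

def r2(a, b):
    """Squared Pearson correlation between two 0/1 allele vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0  # monomorphic locus: correlation undefined, treated as no LD
    return float(np.corrcoef(a, b)[0, 1] ** 2)

def block_breaking_markers(geno, threshold=0.8):
    """Indices of loci whose r2 with both neighbours is below the threshold
    while the two neighbours have r2 above the threshold with each other
    (an assumed stand-in for 'located inside an LD block')."""
    flagged = []
    for i in range(1, geno.shape[1] - 1):
        left, mid, right = geno[:, i - 1], geno[:, i], geno[:, i + 1]
        if (r2(left, mid) < threshold
                and r2(mid, right) < threshold
                and r2(left, right) > threshold):
            flagged.append(i)
    return flagged

# Toy data: 6 lines x 4 loci; locus 1 disagrees with its perfectly linked neighbours.
geno = np.array([
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
])
flagged = block_breaking_markers(geno)
print(flagged)
```

In this toy matrix only the second locus (index 1) meets the three conditions. A full implementation would first delimit the blocks from all pairwise R² values, which this sketch does not attempt.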
The analyses identified 24 representative lines for the three subpopulations and a total of 6,696 "subpopulation-specific SNP loci". Among these, 195 were also identified as "SNP loci causing incomplete patterns of LD blocks". To test whether the "subpopulation-specific SNP loci" and the "SNP loci causing incomplete patterns of LD blocks" are associated, 10,000 bootstrap-style resamplings were performed to build the probability mass function of the number of "SNP loci causing incomplete patterns of LD blocks" obtained by chance. In each resampling, 6,696 SNP loci were randomly drawn from the 15,919 SNP loci, and the number of drawn loci that were "SNP loci causing incomplete patterns of LD blocks" was recorded. The cumulative probability of randomly obtaining at least 195 such loci was only 0.6%. This result indicates that the "subpopulation-specific SNP loci" and the "SNP loci causing incomplete patterns of LD blocks" are associated.
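The resampling test described above can be sketched as follows. This is a minimal illustration under stated assumptions: each draw is taken without replacement (the abstract only says loci were drawn at random), the function name `resampling_p_value` is hypothetical, and the toy numbers in the usage lines are placeholders, since the total number of flagged loci among the 15,919 is not stated here.

```python
import numpy as np

def resampling_p_value(n_total, n_draw, flagged_idx, observed,
                       n_iter=10_000, seed=0):
    """Estimate P(count >= observed), where count is the number of flagged
    loci among n_draw loci drawn at random from n_total loci."""
    rng = np.random.default_rng(seed)
    is_flagged = np.zeros(n_total, dtype=bool)
    is_flagged[flagged_idx] = True
    counts = np.empty(n_iter, dtype=int)
    for k in range(n_iter):
        # Assumption: loci are drawn without replacement in each resampling.
        drawn = rng.choice(n_total, size=n_draw, replace=False)
        counts[k] = is_flagged[drawn].sum()
    # Empirical upper-tail probability from the simulated count distribution.
    return float(np.mean(counts >= observed)), counts

# Toy usage with placeholder numbers (not the study's data):
# 1,000 loci, 50 of them flagged, 400 drawn per resampling.
p, counts = resampling_p_value(n_total=1000, n_draw=400,
                               flagged_idx=np.arange(50), observed=30)
print(p)
```

With the study's settings, the call would use `n_total=15919`, `n_draw=6696`, and `observed=195`, together with the actual indices of the flagged loci. Under sampling without replacement the count follows a hypergeometric distribution, so the simulated tail probability could also be cross-checked analytically.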