
作物全基因組選拔統計方法之探討

A Study on Statistical Methods for Genomic Selection

Advisor: 廖振鐸

Abstract


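Genomic selection (GS) is increasingly popular in crop breeding. Unlike conventional crop breeding, which selects on phenotypic values, GS selects on genomic estimated breeding values (GEBVs). The main idea of GS is to capture quantitative trait loci (QTL) using dense DNA markers across the whole genome. The DNA markers most commonly used to predict GEBVs in crop breeding are single nucleotide polymorphism (SNP) markers. GEBVs for individuals in the test population are predicted with a statistical model built from the training population. Two kinds of statistical models are commonly used in GS. The first is the whole-genome regression model, which estimates all marker effects and then obtains GEBVs from the fitted regression model. Estimating all the unknown regression coefficients is challenging, however, because the number of marker effects is usually much larger than the number of observed phenotypic values. Many statistical methods have been proposed to address this problem, and they fall into two classes: shrinkage estimation, namely ridge regression and LASSO (least absolute shrinkage and selection operator) regression, and Bayesian estimation, namely the three methods Bayes A, Bayes B, and Bayes C. The second is the linear mixed effects model, in which marker effects are treated as random effects and GEBVs are estimated from the BLUPs (best linear unbiased predictors) of the marker effects; rrBLUP belongs to this class. Several factors may affect the prediction accuracy of GS, including the estimation method for marker effects, the heritability of the quantitative trait, the density of SNP markers, and the true marker effects. In this thesis, we conduct simulation studies to evaluate how these factors affect the prediction accuracy of GS.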
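As a reading aid, the two model families named above can be written in one common notation. The symbols here (y for the n training phenotypes, Z for the n × p SNP marker matrix, β for the p marker effects, λ for the penalty) are assumptions introduced for illustration and are not taken from the thesis itself.

```latex
\begin{align}
  % Whole-genome regression: p marker effects, n phenotypes, typically p >> n
  y &= \mathbf{1}\mu + Z\beta + e \\
  % Ridge regression shrinks with a squared (L2) penalty
  (\hat\mu, \hat\beta)_{\text{ridge}} &= \arg\min_{\mu,\beta}\;
      \lVert y - \mathbf{1}\mu - Z\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2 \\
  % LASSO shrinks with an absolute-value (L1) penalty and can set effects to zero
  (\hat\mu, \hat\beta)_{\text{LASSO}} &= \arg\min_{\mu,\beta}\;
      \lVert y - \mathbf{1}\mu - Z\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 \\
  % Linear mixed model: marker effects are random with a common variance
  \beta &\sim N(0, \sigma_\beta^2 I), \qquad e \sim N(0, \sigma_e^2 I) \\
  % GEBVs of test individuals with marker matrix Z_test
  \widehat{\text{GEBV}} &= Z_{\text{test}}\,\hat\beta
\end{align}
```

For given variance components, the BLUP of β in the mixed model coincides with the ridge solution at λ = σ_e² / σ_β², which is why ridge regression and rrBLUP are often discussed together.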

Parallel Abstract


Genomic selection (GS) is becoming increasingly popular in plant breeding programs. Unlike conventional plant breeding, GS is based on genomic estimated breeding values (GEBVs) rather than on phenotypic values. The main idea of GS is to capture quantitative trait loci (QTL) using dense DNA markers over the whole genome. The most common DNA markers used for the prediction of GEBVs in plant breeding are single nucleotide polymorphisms (SNPs). GEBVs for the individuals in a test population are usually predicted from a statistical model fitted to the observed phenotypic values of a training population and their genome-wide SNP markers. Two kinds of statistical models are commonly used in GS. The first is a whole-genome regression model, which is used to estimate all the marker effects; GEBVs are then obtained from the fitted values of the regression model. However, it is challenging to estimate all the unknown regression coefficients, because the number of marker effects is usually much larger than the number of observed phenotypic values. Numerous statistical methods have been proposed to tackle this large-p-with-small-n problem, including shrinkage estimation, such as ridge regression and LASSO, and Bayesian estimation, such as Bayes A, Bayes B, and Bayes C. The second is a linear mixed effects model, in which the marker effects are treated as random effects and a normal variance component is used to explain their variation. GEBVs are then estimated through the BLUPs (best linear unbiased predictors) of the marker effects; rrBLUP follows this approach. Several factors may affect the prediction accuracy of GS, including the estimation method for marker effects, the heritability of the quantitative trait, the density of SNP markers, and the true marker effects. In this thesis, we conduct simulation studies to evaluate how these factors affect the prediction accuracy of GS. In addition, a rice genome data set is used for illustration.
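The kind of simulation described above can be sketched in a few lines. The following is a minimal illustration, not the thesis's actual simulation design: the genotype coding (-1/0/1), the sample sizes, the number of QTL, the fixed penalty `lam`, and all function names are assumptions made for this example. It fits ridge regression through the n × n dual form, which is convenient when markers far outnumber phenotypes, and measures prediction accuracy as the correlation between predicted GEBVs and simulated true genetic values in the test population.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_marker_effects(Z, y, lam):
    """Ridge estimates via the dual form b = Z'(ZZ' + lam*I)^(-1)(y - mean).

    The intercept is taken as the training mean, a simplification.
    """
    n, p = len(Z), len(Z[0])
    mu = sum(y) / n
    K = [[sum(a * b for a, b in zip(Z[i], Z[j])) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [yi - mu for yi in y])
    beta = [sum(Z[i][j] * alpha[i] for i in range(n)) for j in range(p)]
    return mu, beta


def predict(Z, mu, beta):
    """GEBV-style predictions: intercept plus sum of marker effects."""
    return [mu + sum(z * b for z, b in zip(row, beta)) for row in Z]


def pearson(a, b):
    """Pearson correlation, used here as the prediction accuracy."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n, p, n_qtl, h2, rng):
    """Simulate genotypes, true genetic values g, and phenotypes y.

    Only n_qtl of the p markers have non-zero effects; the residual
    variance is scaled so that the heritability is about h2.
    """
    Z = [[rng.choice((-1, 0, 1)) for _ in range(p)] for _ in range(n)]
    effects = [0.0] * p
    for j in rng.sample(range(p), n_qtl):
        effects[j] = rng.gauss(0.0, 1.0)
    g = [sum(z * b for z, b in zip(row, effects)) for row in Z]
    mg = sum(g) / n
    var_g = sum((gi - mg) ** 2 for gi in g) / (n - 1)
    sd_e = math.sqrt(var_g * (1.0 - h2) / h2)
    y = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return Z, g, y


if __name__ == "__main__":
    rng = random.Random(2024)
    n_train, n_test = 60, 40
    for h2 in (0.2, 0.5, 0.8):  # vary one factor: heritability
        Z, g, y = simulate(n_train + n_test, 200, 20, h2, rng)
        mu, beta = ridge_marker_effects(Z[:n_train], y[:n_train], lam=50.0)
        gebv = predict(Z[n_train:], mu, beta)
        print(f"h2={h2}: accuracy={pearson(gebv, g[n_train:]):.3f}")
```

The same loop can be repeated over marker density (`p`) or the number of QTL (`n_qtl`) to examine the other factors. In practice, the R packages listed in the references (rrBLUP, glmnet, BGLR) would replace the hand-written solver, and rrBLUP estimates the variance components rather than fixing the penalty in advance.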

References


De los Campos, G., & Perez Rodriguez, P. (2016). Package “BGLR”: Bayesian Generalized Linear Regression.
Desta, Z. A., & Ortiz, R. (2014). Genomic selection: genome-wide prediction in plant improvement. Trends in Plant Science, 19(9), 592–601.
Endelman, J. B. (2011). Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. The Plant Genome Journal, 4(3), 250.
Friedman, J., Hastie, T., Tibshirani, R., Simon, N., Narasimhan, B., & Qian, J. (2018). Package “glmnet”: Lasso and Elastic-Net Regularized Generalized Linear Models.
Jannink, J.-L., Lorenz, A. J., & Iwata, H. (2010). Genomic selection in plant breeding: from theory to practice. Briefings in Functional Genomics, 9(2), 166–177.
