
A Comparison of Two Criteria for Training Set Optimization in Genomic Selection

Advisor: Chen-Tuo Liao (廖振鐸)

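Abstract


Although next-generation sequencing (NGS) has helped reduce the cost of genotyping, the cost of phenotyping remains a major challenge in breeding. Genomic selection addresses this by choosing a specific training set that maximizes prediction accuracy, which lowers the phenotyping cost of that training set while still allowing a prediction model to be built. In genomic selection, a better training set yields a model that predicts the quantitative traits of the testing set more accurately. To select from the candidate set the optimal training set for each testing set, this thesis uses a genetic algorithm (GA) with r-score and mspe-score, in turn, as the objective function. Once the training set chosen by the genetic algorithm has been phenotyped, it can be used to estimate the genomic estimated breeding values (GEBVs) of the individuals in the testing set. The two objective functions, r-score and mspe-score, are derived respectively from Pearson's correlation and the mean squared prediction error between the phenotypic values and the GEBVs of the testing set. Two datasets, Tropical rice and rice44k, serve as examples, and the accuracy of the prediction models is evaluated with the commonly used Pearson's correlation and root-mean-square error. Because the rice individuals in the rice44k data fall into six subpopulations, the modeling must account for subpopulation effects in addition to comparing the cases in which the testing set is known and unknown.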
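The search step described above can be sketched roughly as follows. This is a minimal illustration of a genetic algorithm for picking a fixed-size subset of a candidate set, not the thesis's implementation: the function name `ga_select_training_set`, its parameters, the crossover and mutation scheme, and the toy objective are all assumptions made for this example. In particular, the toy objective is only a stand-in; the r-score and mspe-score formulas are not given in the abstract and are not reproduced here.

```python
import random


def ga_select_training_set(n_candidates, n_train, objective, n_iter=200,
                           pop_size=30, n_elite=5, mutation_rate=0.1, seed=0):
    """Search for a size-n_train subset of range(n_candidates) that maximizes
    `objective`. Assumes n_train < n_candidates and 2 <= n_elite < pop_size."""
    rng = random.Random(seed)
    candidates = list(range(n_candidates))
    # Initial population: random subsets, each stored as a sorted tuple of indices.
    population = [tuple(sorted(rng.sample(candidates, n_train)))
                  for _ in range(pop_size)]

    for _ in range(n_iter):
        # Rank solutions by the objective (best first) and keep the elite unchanged.
        population.sort(key=objective, reverse=True)
        elite = population[:n_elite]
        children = []
        while len(children) < pop_size - n_elite:
            # Crossover: draw the child from the union of two elite parents.
            p1, p2 = rng.sample(elite, 2)
            pool = list(set(p1) | set(p2))
            child = set(rng.sample(pool, n_train))
            # Mutation: occasionally swap a member for a candidate outside the subset.
            for idx in list(child):
                if rng.random() < mutation_rate:
                    outside = [c for c in candidates if c not in child]
                    child.remove(idx)
                    child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = elite + children

    return max(population, key=objective)


# Toy stand-in objective (NOT r-score or mspe-score): each candidate carries a
# made-up value, and a subset scores the sum of its members' values.
toy_values = [(7 * i) % 20 for i in range(20)]


def toy_objective(subset):
    return sum(toy_values[i] for i in subset)


best = ga_select_training_set(n_candidates=20, n_train=5, objective=toy_objective)
best_score = toy_objective(best)
```

To compare two criteria in this setup, one would run the same search twice, passing a different `objective` each time, and then assess the two selected training sets on the same testing set.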

Parallel Abstract


Although next-generation sequencing has made genotyping more cost-effective, the cost of phenotyping remains an obstacle in plant breeding. The choice of training set therefore plays an important role in the success of a genomic selection (GS) program: an appropriate training set can reduce the phenotyping cost while maximizing the prediction accuracy of the breeding program. In this study, two optimality criteria are proposed for training set optimization, derived respectively from Pearson's correlation and the mean squared prediction error between the phenotypic values of a testing set and their genomic estimated breeding values (GEBVs). A genetic algorithm implementing the two criteria is used to generate the desired training set. The chosen training set is phenotyped, and the resulting phenotypic values, together with the genotypic values, are used to build a GS prediction model, which is then applied to estimate the GEBVs of the testing set. Pearson's correlation and root-mean-square error (RMSE) are used as the measures for comparing the performance of the two optimality criteria. Real data analysis and simulation studies based on two rice genome datasets are carried out, and the results show that the two optimality criteria perform almost identically.
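The two evaluation measures named above can be computed as in the following minimal sketch. The function names and the numbers are made up for illustration and do not come from the thesis or its datasets.

```python
import math


def pearson_correlation(observed, predicted):
    """Pearson's correlation between observed phenotypes and predicted GEBVs."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root-mean-square error between observed phenotypes and predicted GEBVs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Made-up values: every prediction is offset from the observation by 0.5.
phenotypes = [1.0, 2.0, 3.0, 4.0]
gebvs = [1.5, 2.5, 3.5, 4.5]

r = pearson_correlation(phenotypes, gebvs)   # 1.0: the ranking is perfectly preserved
err = rmse(phenotypes, gebvs)                # 0.5: the constant offset still counts as error
```

The toy numbers show why the two measures are reported side by side: a constant bias leaves the correlation at 1 but produces a nonzero RMSE, so a model can look perfect under one measure and imperfect under the other.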

