Many polygenic risk score (PRS) methods have been proposed, but most were developed using data from European populations, so their predictive ability drops substantially when they are applied to non-European populations. To address this, cross-ethnic PRS methods have been developed. Most of them try to improve accuracy by integrating cross-population information on linkage disequilibrium (LD), minor allele frequency (MAF), the cross-population correlation of causal SNP effects, and heritability, yet their prediction performance still appears to be suboptimal. Our goal is therefore to improve PRS prediction and apply it to the Taiwanese population.
This study first uses simulated data, generated under different settings such as allele frequency and the extent of linkage disequilibrium, to compare the factors that affect PRS prediction accuracy, with the aim of optimizing cross-ethnic PRS. We then use the Taiwan Biobank (TWB) and the UK Biobank (UKB), with height as the quantitative phenotype and glaucoma as the categorical phenotype, and apply the C+T, LDpred2, Lassosum, and PRS_adj methods for cross-ethnic prediction. For the height model we also add parental information. Finally, we use the PRSUP framework proposed by Duy Pham et al., together with summary statistics from UKB and BBJ, to try to adjust the PRS and obtain better predictions.
In the simulated data, PRS methods that account for LD improve prediction performance, and the improvement is more pronounced when more SNPs are included. In the height prediction model, adding parental height improves prediction performance in both the TWB and UKB data, for single-population as well as cross-population prediction: R² increases by 0.002 to 0.01, with the best R² reaching 0.9420 in TWB and 0.9830 in UKB.
For both height and glaucoma, the best-performing methods are LDpred2 and Lassosum, both of which account for LD. We also evaluated PRS_adj, the cross-ethnic PRS method proposed by Hao et al. In single-population prediction it performs almost identically to C+T, and in cross-ethnic prediction of height it does not outperform C+T. For glaucoma it gives a clear improvement, but only when a large number of SNPs is included. Finally, adjusting the PRS parameters with PRSUP did not improve prediction performance, whether external summary statistics or our own GWAS summary statistics were used.
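The two quantities this abstract reports, a PRS and the gain in R² from adding parental height, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the study's pipeline: the genotypes, effect sizes, and phenotypes are simulated toy data, the PRS is a plain weighted sum of allele counts (no clumping, thresholding, or LD adjustment), and R² is the in-sample value from an ordinary least squares fit.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


rng = np.random.default_rng(0)
n, m = 2000, 200  # individuals and SNPs (hypothetical sizes)

# Toy genotypes: independent SNPs with allele counts 0/1/2 (no LD is simulated).
maf = rng.uniform(0.05, 0.5, size=m)
G = rng.binomial(2, maf, size=(n, m))

# Toy effect sizes, plus noisy estimates standing in for GWAS summary statistics.
beta_true = rng.normal(0.0, 0.05, size=m)
beta_hat = beta_true + rng.normal(0.0, 0.02, size=m)

# Toy phenotype: genetic component + shared family component + noise.
genetic = G @ beta_true
family = rng.normal(0.0, 1.0, size=n)
height = genetic + 0.5 * family + rng.normal(0.0, 0.5, size=n)

# Toy parental covariate that shares the genetic and family components.
parental_height = 0.5 * genetic + family + rng.normal(0.0, 0.5, size=n)

# PRS: weighted sum of allele counts using the estimated effect sizes.
prs = G @ beta_hat

r2_prs = r_squared(prs.reshape(-1, 1), height)
r2_full = r_squared(np.column_stack([prs, parental_height]), height)

print(f"R^2, PRS only:              {r2_prs:.4f}")
print(f"R^2, PRS + parental height: {r2_full:.4f}")
print(f"Increase in R^2:            {r2_full - r2_prs:.4f}")
```

Because the second model nests the first, its in-sample R² cannot be lower. A fair comparison of real models would evaluate both on held-out individuals.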