以牛頓法建立具顯性及缺失標識資料之遺傳連鎖圖

Linkage map construction with dominant and missing markers by Newton-Raphson method

Advisor: 劉清

Abstract


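This study focuses on constructing linkage maps from dominant and missing genetic markers. An accurate linkage map is a key factor in the results of mapping and analysing quantitative trait loci (QTL). Suppose the linkage map is unknown and markers are spread throughout the genome. The markers belonging to the same linkage group must first be grouped together. The maximum likelihood method is then used to estimate the recombination frequencies between all markers in the linkage group simultaneously. This yields the most likely marker order and the genetic distances between markers, and hence the linkage map.

When the marker genotypes in a linkage group are fully informative, estimating the recombination frequencies of several markers simultaneously with the multilocus likelihood gives the same result as estimating each one independently by two-point analysis. When marker genotype information is wholly or partly missing, however, the recombination frequencies can only be estimated simultaneously from the multilocus likelihood, using the information from all markers in the linkage group. The likelihood is then complicated and has no closed-form solution. Approximate solutions must therefore be obtained iteratively with numerical methods such as the Newton-Raphson method or the EM (Expectation Maximization) algorithm. Because the EM algorithm does not use the second derivatives of the likelihood, it converges more slowly and does not directly yield the asymptotic covariance matrix of the maximum likelihood estimates. This thesis uses the Newton-Raphson method to estimate the recombination frequencies between all markers simultaneously. The method gives not only the maximum likelihood estimates but also their asymptotic covariance matrix, which makes it possible to assess the reliability of inferences based on those estimates.

Backcross (BC) and F2 progeny marker data were first simulated. The Newton-Raphson method was then applied iteratively to obtain the maximum likelihood estimates of the recombination frequencies between markers. Haldane's mapping function converted the estimated recombination frequencies into genetic distances, which were compared with the distances originally assigned. The two were very close, and the asymptotic covariance matrix showed that the standard errors were very small. In addition, the maximum likelihood estimates obtained by the Newton-Raphson method and by the EM algorithm were almost exactly the same.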
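The abstract's central claim is that missing markers make the multilocus likelihood non-separable, so it has no closed form. Newton-Raphson still maximizes it and also returns the asymptotic covariance matrix. The smallest case showing this is a backcross with three ordered markers A-B-C where B is untyped in some individuals. For those individuals only the flanking pair is seen, with P(A-C parental) = (1 - r1)(1 - r2) + r1 r2, and that term couples r1 and r2.

The sketch below rests on several assumptions that are not from the thesis: codominant markers, a known marker order, no crossover interference, and hypothetical counts. The function names `loglik`, `score_and_hessian` and `newton_raphson` are likewise illustrative. Dominant markers and F2 data are not handled.

```python
import math


def loglik(r1, r2, n00, n10, n01, n11, m0, m1):
    """Backcross log-likelihood for ordered markers A-B-C (hypothetical setup).

    n00, n10, n01, n11: fully typed individuals, indexed by recombination in
    interval A-B (first digit) and B-C (second digit).
    m0, m1: individuals with B missing whose flanking markers A and C are
    parental (m0) or recombinant (m1). No crossover interference is assumed.
    """
    p0 = (1 - r1) * (1 - r2) + r1 * r2  # P(A-C parental) when B is unobserved
    return ((n00 + n01) * math.log(1 - r1) + (n10 + n11) * math.log(r1)
            + (n00 + n10) * math.log(1 - r2) + (n01 + n11) * math.log(r2)
            + m0 * math.log(p0) + m1 * math.log(1 - p0))


def score_and_hessian(r1, r2, n00, n10, n01, n11, m0, m1):
    """Analytic gradient and Hessian (h11, h12, h22) of loglik in (r1, r2)."""
    a1, b1 = n10 + n11, n00 + n01  # recombinant / parental in A-B
    a2, b2 = n01 + n11, n00 + n10  # recombinant / parental in B-C
    p0 = (1 - r1) * (1 - r2) + r1 * r2
    p1 = 1 - p0
    d = m1 / p1 - m0 / p0          # shared term from the B-missing individuals
    s = m1 / p1**2 + m0 / p0**2
    g = (a1 / r1 - b1 / (1 - r1) + (1 - 2 * r2) * d,
         a2 / r2 - b2 / (1 - r2) + (1 - 2 * r1) * d)
    h11 = -a1 / r1**2 - b1 / (1 - r1)**2 - (1 - 2 * r2)**2 * s
    h22 = -a2 / r2**2 - b2 / (1 - r2)**2 - (1 - 2 * r1)**2 * s
    h12 = -2 * d - (1 - 2 * r1) * (1 - 2 * r2) * s
    return g, (h11, h12, h22)


def newton_raphson(counts, tol=1e-10, max_iter=100):
    """Return ML estimates (r1, r2) and their asymptotic covariance matrix."""
    n00, n10, n01, n11 = counts[:4]
    n_full = n00 + n10 + n01 + n11
    # Start from two-point estimates on fully typed individuals, kept in (0, 0.5).
    r = (min(max((n10 + n11) / n_full, 0.01), 0.49),
         min(max((n01 + n11) / n_full, 0.01), 0.49))
    for _ in range(max_iter):
        (g1, g2), (h11, h12, h22) = score_and_hessian(*r, *counts)
        det = h11 * h22 - h12 * h12
        # Newton step -H^{-1} g, with the 2x2 inverse written out.
        step = (-(h22 * g1 - h12 * g2) / det, -(h11 * g2 - h12 * g1) / det)
        cur, t = loglik(*r, *counts), 1.0
        while t > 1e-12:  # step halving: stay in (0, 0.5), never lower log L
            new = (r[0] + t * step[0], r[1] + t * step[1])
            if (0 < new[0] < 0.5 and 0 < new[1] < 0.5
                    and loglik(*new, *counts) >= cur - 1e-12):
                break
            t /= 2
        else:
            break  # no acceptable step; keep the current estimate
        done = max(abs(new[0] - r[0]), abs(new[1] - r[1])) < tol
        r = new
        if done:
            break
    _, (h11, h12, h22) = score_and_hessian(*r, *counts)
    det = h11 * h22 - h12 * h12
    # Asymptotic covariance = inverse of the observed information matrix -H.
    cov = ((-h22 / det, h12 / det), (h12 / det, -h11 / det))
    return r, cov


counts = (120, 14, 20, 2, 30, 14)  # hypothetical n00, n10, n01, n11, m0, m1
(r1, r2), cov = newton_raphson(counts)
print(r1, r2, math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
```

Two design choices in this sketch need explaining. Starting from the two-point estimates and adding step halving is an illustrative safeguard, because a raw Newton step can leave (0, 0.5) or overshoot. The square roots of the diagonal of `cov` are the asymptotic standard errors. When `m0 = m1 = 0`, the off-diagonal term vanishes and each estimate reduces to the two-point value r = k/n with variance r(1 - r)/n. This matches the abstract's remark about fully informative markers.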

Parallel Abstract


The purpose of this study is to construct linkage maps with dominant and missing markers. A correct and accurate gene linkage map is vital for mapping and analysing quantitative trait loci (QTL). If the linkage map of a sequence of markers in the genome is unknown, the markers must first be divided into linkage groups. The most likely marker order and the distances between neighboring markers within each linkage group are then determined by the maximum likelihood (ML) method. When markers within a linkage group are fully observed, estimating the recombination frequencies of all markers simultaneously with the multilocus likelihood function is equivalent to estimating the recombination frequency of each pair of markers independently by two-point analysis. However, when some markers are partially observed or missing, the recombination frequencies can only be estimated simultaneously from the multilocus likelihood function, using the information of all markers within the linkage group. This likelihood is usually too complicated to have a closed-form solution. Numerical methods such as the Newton-Raphson method or the EM algorithm must therefore be used to obtain an approximate solution by iteration. Because the EM algorithm does not use the second-order derivatives of the likelihood function, it converges more slowly and does not directly yield the asymptotic covariance matrix of the ML estimates.

This study simulated backcross and F2 intercross data. The Newton-Raphson method was used to calculate the ML estimates of the recombination frequencies of all markers within a linkage group simultaneously. It yields not only the ML estimates but also their asymptotic covariance matrix, which allows us to evaluate the reliability of statistical inferences based on the ML estimates. Haldane's mapping function was then applied to transform the estimated recombination frequencies into genetic distances. The resulting distances were close to those originally assigned, and the asymptotic covariance matrix showed that the standard errors were very small. In addition, the ML estimates obtained by the Newton-Raphson method were virtually identical to those of the EM algorithm.
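The abstract compares Newton-Raphson with the EM algorithm and then converts the estimates to map distances. The EM side can be sketched under the same hypothetical setup: a backcross, ordered codominant markers A-B-C with B missing in some individuals, and no interference.

1. The E-step splits each B-missing individual into its possible interval-recombination classes in proportion to their current probabilities.
2. The M-step is a closed-form ratio of expected recombinants to all individuals.
3. `haldane_cm` applies Haldane's mapping function, d = -50 ln(1 - 2r) in centimorgans.

The function names and counts are illustrative assumptions, not the thesis's code.

```python
import math


def loglik(r1, r2, n00, n10, n01, n11, m0, m1):
    """Backcross log-likelihood for ordered markers A-B-C (no interference)."""
    p0 = (1 - r1) * (1 - r2) + r1 * r2  # P(A-C parental) when B is missing
    return ((n00 + n01) * math.log(1 - r1) + (n10 + n11) * math.log(r1)
            + (n00 + n10) * math.log(1 - r2) + (n01 + n11) * math.log(r2)
            + m0 * math.log(p0) + m1 * math.log(1 - p0))


def em(counts, tol=1e-10, max_iter=100_000):
    """EM estimates of (r1, r2); also returns the log-likelihood trace."""
    n00, n10, n01, n11, m0, m1 = counts
    n = sum(counts)
    r1 = r2 = 0.25  # arbitrary start inside (0, 0.5)
    trace = [loglik(r1, r2, *counts)]
    for _ in range(max_iter):
        # E-step: expected latent classes of the individuals with B missing.
        p00, p11 = (1 - r1) * (1 - r2), r1 * r2
        p10, p01 = r1 * (1 - r2), (1 - r1) * r2
        w11 = m0 * p11 / (p00 + p11)  # double recombinants hidden in A-C parental
        w10 = m1 * p10 / (p10 + p01)  # A-B recombinants hidden in A-C recombinant
        w01 = m1 - w10                # B-C recombinants hidden in A-C recombinant
        # M-step: expected recombinants in each interval / all individuals.
        new1 = (n10 + n11 + w10 + w11) / n
        new2 = (n01 + n11 + w01 + w11) / n
        done = max(abs(new1 - r1), abs(new2 - r2)) < tol
        r1, r2 = new1, new2
        trace.append(loglik(r1, r2, *counts))
        if done:
            break
    return (r1, r2), trace


def haldane_cm(r):
    """Haldane's mapping function: recombination fraction -> distance in cM."""
    return -50.0 * math.log(1.0 - 2.0 * r)


(r1, r2), trace = em((120, 14, 20, 2, 30, 14))  # hypothetical counts
print(len(trace) - 1, "EM iterations")
print(haldane_cm(r1), haldane_cm(r2))
```

No second derivatives appear in these updates. As a result, the log-likelihood trace rises monotonically but converges only linearly, and the iteration produces no covariance matrix as a by-product. This is the contrast with Newton-Raphson that the abstract draws.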


延伸閱讀