比較淨最小平方法中選取最佳因子數準則的研究-以近紅外線光譜資料為例

Comparisons of Criteria for Choosing Optimal Number of Factors on Partial Least Squares-Using Example of near Infrared Spectroscopy Data

Abstract


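Partial least squares (PLS) regression can be used to address collinearity in data. Among the new factors obtained by PLS, those carrying only weak information are deleted, and the useful factors are retained and used as regressors for y to obtain the best model. How many of the leading factors to retain is the question this study investigates. The five criteria considered here for selecting the optimal number of factors are cross-validation, estimated mean squared error, external validation, calibration residual sum of squares, and variable transformation. The five criteria are compared in a simulation study based on a rice-quality study in which the amylose content of rice was measured by near-infrared reflectance spectroscopy. The simulated data were generated according to the characteristics of the original data, and the simulation considered different numbers of explanatory variables, sample sizes, and variance magnitudes. Among the criteria, cross-validation estimated the optimal number of factors in the model effectively, and under all conditions considered it gave the most accurate estimate of the optimal number of factors.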

Parallel Abstract


Partial least squares (PLS) regression can be used to address multicollinearity in data. Among the factors extracted by PLS, those carrying little information are deleted and the useful factors are retained; y is then regressed on the retained factors to find the best model. The purpose of this research is to identify a suitable criterion for choosing the number of factors to fit. The five criteria considered in this study are MSECV, MSEP, MSEE, MSERSS, and HERROR. Their performance was compared in a simulation study based on near-infrared spectroscopy data used to analyze the apparent amylose content of rice. The simulated data were generated according to the characteristics of the original data, with different numbers of explanatory variables, sample sizes, and variances. The MSECV criterion proved useful for estimating the number of factors to fit in the model. In addition, under all conditions considered, MSECV estimated the number of useful factors accurately.
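
To make the cross-validation criterion (MSECV) concrete, here is a minimal Python sketch of choosing the number of PLS factors by k-fold cross-validation. It is an illustration under stated assumptions, not the authors' implementation: the function names `pls1_fit` and `msecv`, the five-fold split, and the simulated data (30 collinear predictors driven by 3 latent factors) are all hypothetical choices, and the abstract does not give the study's actual simulation settings.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit single-response PLS by successive deflation.

    Returns the column means of X, the mean of y, and the regression
    coefficients for centered X. (Hypothetical helper for illustration.)
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xr.T @ yr                      # weight: covariance of X with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                   # nothing left to extract
            break
        w = w / norm
        t = Xr @ w                         # factor scores
        tt = t @ t
        p = Xr.T @ t / tt                  # X loadings
        c = yr @ t / tt                    # y loading
        Xr = Xr - np.outer(t, p)           # deflate X
        yr = yr - c * t                    # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta

def msecv(X, y, max_factors, n_folds=5, seed=0):
    """Cross-validated mean squared error for 1..max_factors factors."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    sse = np.zeros(max_factors)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        for a in range(1, max_factors + 1):
            x_mean, y_mean, beta = pls1_fit(X[train_idx], y[train_idx], a)
            pred = y_mean + (X[test_idx] - x_mean) @ beta
            sse[a - 1] += np.sum((y[test_idx] - pred) ** 2)
    return sse / len(y)

# Assumed toy data: highly collinear predictors, loosely mimicking spectra.
rng = np.random.default_rng(42)
n, p, true_factors = 60, 30, 3
T = rng.normal(size=(n, true_factors))
X = T @ rng.normal(size=(true_factors, p)) + 0.05 * rng.normal(size=(n, p))
y = T @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=n)

scores = msecv(X, y, max_factors=8)
best = int(np.argmin(scores)) + 1          # factor count with smallest MSECV
print("MSECV by number of factors:", np.round(scores, 4))
print("selected number of factors:", best)
```

The factor count with the smallest cross-validated error is taken as the estimate. A simulation study of the kind the abstract describes would repeat this over many generated data sets while varying the number of predictors, the sample size, and the noise variance.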
