A Simulation Study of the Predictive Ability of Different Statistical Methods for Collinear Data

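When the relationships between several explanatory variables and a response variable must be modeled simultaneously, the simplest approach is multiple regression. However, when the number of explanatory variables far exceeds the number of samples, or when the variables are highly correlated with one another, the estimated regression coefficients become unstable or cannot be obtained at all. Such data are called collinear data. With collinear data, one can reduce the dimensionality of the data set through variable selection, or mitigate the effect of collinearity with methods such as ridge regression, principal components regression, and partial least squares regression. In this study, data sets with different degrees of collinearity were generated by simulation, and the predictive ability of multiple regression, forward selection, ridge regression, principal components regression, and partial least squares regression was compared. The predictive ability of every method declined on highly collinear data sets, and at a given degree of collinearity, principal components regression and partial least squares regression performed better than the other methods. In addition, principal components regression and partial least squares regression showed the same predictive ability in this study, possibly because the data sets were generated by simulation and therefore contained only simple sources of interference.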
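The claim that highly correlated predictors, or more predictors than samples, make multiple-regression coefficients unstable or unobtainable can be illustrated with a small NumPy sketch. It assumes an equicorrelated design in which every pair of columns has correlation `rho`; the paper's actual simulation scheme is not given in this abstract, so `simulate_collinear` and `ridge` are hypothetical helpers for illustration only. The condition number of X'X grows sharply as `rho` approaches 1, and when p > n the centred X'X is rank-deficient, whereas adding the ridge penalty lam*I keeps the system solvable.

```python
import numpy as np

def simulate_collinear(n, p, rho, rng):
    """Hypothetical generator: n x p design, unit-variance columns, pairwise correlation rho."""
    common = rng.standard_normal((n, 1))   # latent factor shared by every column
    noise = rng.standard_normal((n, p))    # independent column-specific noise
    return np.sqrt(rho) * common + np.sqrt(1.0 - rho) * noise

def ridge(X, y, lam):
    """Closed-form ridge estimate (Xc'Xc + lam*I)^{-1} Xc'yc on centred data.

    Returns (slopes, intercept); lam = 0 reduces to ordinary least squares.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return beta, y_mean - x_mean @ beta

rng = np.random.default_rng(0)

# Ill-conditioning of X'X grows as the columns become more correlated.
for rho in (0.1, 0.9, 0.99):
    X = simulate_collinear(100, 5, rho, rng)
    Xc = X - X.mean(axis=0)
    print(f"rho={rho}: cond(X'X) = {np.linalg.cond(Xc.T @ Xc):.1f}")

# More variables than samples: Xc'Xc is rank-deficient, but Xc'Xc + lam*I is not.
X = simulate_collinear(10, 20, 0.5, rng)
y = X[:, 0] + rng.standard_normal(10)
Xc = X - X.mean(axis=0)
print("rank of Xc'Xc:", np.linalg.matrix_rank(Xc.T @ Xc), "of", X.shape[1])
beta, b0 = ridge(X, y, lam=1.0)
print("ridge coefficients finite:", bool(np.all(np.isfinite(beta))))
```

With `n=10` and `p=20`, solving the unpenalized normal equations would either fail or return meaningless values, while `ridge(X, y, lam=1.0)` still returns a finite coefficient vector; the price is a deliberate bias toward zero controlled by `lam`.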

Keywords

Partial least squares; collinearity; predictive ability

Parallel Abstract

Multiple linear regression (MLR) is the simplest way to model the relationship between several explanatory variables and a response variable. However, MLR often yields unstable or even unobtainable estimates of the regression coefficients when the number of explanatory variables exceeds the number of samples, or when the explanatory variables are highly intercorrelated; such data are called multi-collinear data. Many methods have therefore been developed to analyze multi-collinear data, such as variable selection methods, ridge regression (RR), principal components regression (PCR), and partial least squares regression (PLSR). This study generated data sets with different degrees of collinearity to compare the prediction ability of MLR, forward variable selection (FVS), RR, PCR, and PLSR. The prediction ability of all methods decreased as the degree of collinearity increased, and at a fixed degree of collinearity PCR and PLSR performed best. In addition, PCR and PLSR showed the same prediction ability, possibly because the interference in the simulated data sets was simple.
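A comparison of this kind can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' procedure: PCR is implemented as least squares on the first k principal-component scores of the centred predictors, PLSR as single-response PLS (PLS1) fitted with the NIPALS algorithm, and the helper names (`pcr_fit`, `pls1_fit`, `simulate`), the equicorrelated design, the sample sizes, the true coefficients, and the choice of k = 2 components are all hypothetical. With all p components retained, both PCR and PLS1 reduce to ordinary least squares, which serves as a sanity check.

```python
import numpy as np

def center(X, y):
    x_mean, y_mean = X.mean(axis=0), y.mean()
    return X - x_mean, y - y_mean, x_mean, y_mean

def pcr_fit(X, y, k):
    """PCR: least squares on the first k principal-component scores of centred X."""
    Xc, yc, x_mean, y_mean = center(X, y)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                   # p x k loading vectors
    gamma = np.linalg.lstsq(Xc @ V, yc, rcond=None)[0]
    beta = V @ gamma                               # map back to the original predictors
    return beta, y_mean - x_mean @ beta

def pls1_fit(X, y, k):
    """PLS1 (single response) via NIPALS with k latent components."""
    Xc, yc, x_mean, y_mean = center(X, y)
    E, f = Xc.copy(), yc.copy()
    p = X.shape[1]
    W, P, q = np.zeros((p, k)), np.zeros((p, k)), np.zeros(k)
    for a in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)                     # weight vector
        t = E @ w                                  # score vector
        tt = t @ t
        P[:, a] = E.T @ t / tt                     # X loadings
        q[a] = f @ t / tt                          # y loading
        W[:, a] = w
        E = E - np.outer(t, P[:, a])               # deflate X
        f = f - q[a] * t                           # deflate y
    beta = W @ np.linalg.solve(P.T @ W, q)         # B = W (P'W)^{-1} q
    return beta, y_mean - x_mean @ beta

def rmse(X, y, fit):
    beta, b0 = fit
    return np.sqrt(np.mean((y - (X @ beta + b0)) ** 2))

def simulate(n, p, rho, beta_true, rng):
    """Hypothetical equicorrelated design plus unit-variance Gaussian noise."""
    X = np.sqrt(rho) * rng.standard_normal((n, 1)) + np.sqrt(1 - rho) * rng.standard_normal((n, p))
    return X, X @ beta_true + rng.standard_normal(n)

rng = np.random.default_rng(42)
p = 10
beta_true = np.linspace(1.0, 2.0, p)
for rho in (0.5, 0.9, 0.99):
    X_tr, y_tr = simulate(30, p, rho, beta_true, rng)
    X_te, y_te = simulate(200, p, rho, beta_true, rng)
    # PCR with all p components is ordinary least squares, used here as the MLR baseline.
    print(f"rho={rho}: MLR {rmse(X_te, y_te, pcr_fit(X_tr, y_tr, p)):.3f}  "
          f"PCR(2) {rmse(X_te, y_te, pcr_fit(X_tr, y_tr, 2)):.3f}  "
          f"PLS1(2) {rmse(X_te, y_te, pls1_fit(X_tr, y_tr, 2)):.3f}")
```

The design difference is in how components are chosen: PCR picks directions of maximal variance in X without looking at y, whereas PLS1 picks directions of maximal covariance between X and y. On a simple equicorrelated design like this one, both tend to find similar leading directions.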

Parallel Keywords

Partial least squares; multi-collinear; prediction ability
