
A Study on Applications of the Generalized Component Approach to the High-dimensional Equivalence Test of Means

Advisor: Jen-pei Liu (劉仁沛)

Abstract

Traditionally, Hotelling's T² test is used to detect a difference in mean vectors between two populations. For high-dimensional data, however, the number of variables (p) can far exceed the sample size (n). The sample covariance matrix is then singular, so its inverse does not exist and the Hotelling T² statistic cannot be calculated. Various methods have been proposed to resolve this issue, but all of them test for a difference in mean vectors between two populations, not for equivalence.

The literature on average equivalence for high-dimensional data is scarce. Chiu (2016) first applied the supremum-based method of Cai et al. (2014) to the high-dimensional average equivalence problem in the form of a maximum Z² test. However, Chiu's method depends only on the variable with the largest difference and ignores the information provided by the remaining variables. In this thesis, we propose two equivalence tests for high-dimensional data, both of which account for the difference in every variable. The first, the Generalized Component Equivalence Test (GCET), extends the procedure of Gregory et al. (2015) to the high-dimensional average equivalence problem. Because the GCET is based on the average of the squared t-statistics of the p variables, it considers only the magnitude of each mean difference and ignores its direction. To overcome this shortcoming, we further propose the Compound Covariate Equivalence Test (CCET).

Extensive simulation studies were conducted under various conditions and combinations of settings to investigate the size and power of the two proposed methods. The simulation results reveal that the CCET not only controls the size at the nominal level but also provides relatively high power. A real-data example illustrates the application of the proposed methods.
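The following is a minimal NumPy sketch of two points made in the abstract, using simulated data. First, when p exceeds n1 + n2 - 2, the pooled sample covariance matrix is rank-deficient, so the inverse needed for Hotelling's T² does not exist. Second, an average of squared component-wise t-statistics is unchanged when the direction of some mean differences is reversed. The simulated data, the Welch-type form of the component t-statistic, and the helper names (`pooled_covariance`, `componentwise_t`, `mean_squared_t`) are assumptions made for illustration. The sketch computes only the raw average of squared t-statistics. It omits whatever centering, scaling, and equivalence margin the thesis's GCET uses, and it does not implement the CCET.

```python
import numpy as np


def pooled_covariance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Pooled sample covariance; rows are subjects, columns are the p variables."""
    n1, n2 = x.shape[0], y.shape[0]
    s1 = np.cov(x, rowvar=False)  # p x p, divisor n1 - 1
    s2 = np.cov(y, rowvar=False)  # p x p, divisor n2 - 1
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def componentwise_t(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Welch-type two-sample t-statistic computed separately for each variable (assumed form)."""
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    se = np.sqrt(x.var(axis=0, ddof=1) / n1 + y.var(axis=0, ddof=1) / n2)
    return diff / se


def mean_squared_t(t: np.ndarray) -> float:
    """Average of the p squared t-statistics (raw quantity only, no centering or scaling)."""
    return float(np.mean(t ** 2))


rng = np.random.default_rng(2016)
n1, n2, p = 10, 12, 50  # p exceeds n1 + n2 - 2 = 20: the high-dimensional case
x = rng.normal(size=(n1, p))
y = rng.normal(loc=0.3, size=(n2, p))

# 1) The pooled covariance has rank at most n1 + n2 - 2 < p, so it has no inverse
#    and Hotelling's T^2 = (n1*n2/(n1+n2)) * d' S^{-1} d cannot be formed.
rank = np.linalg.matrix_rank(pooled_covariance(x, y))
print(f"rank(S_pooled) = {rank}, p = {p}")

# 2) Reverse the direction of the mean difference for the first half of the
#    variables by negating those columns in both samples. Each t_j changes sign,
#    but the average of squared t-statistics is unchanged.
flip = np.ones(p)
flip[: p // 2] = -1.0
t = componentwise_t(x, y)
t_flipped = componentwise_t(x * flip, y * flip)
print(mean_squared_t(t), mean_squared_t(t_flipped))
```

Negating a column in both samples changes the sign of that variable's mean difference while leaving its variances untouched. Any statistic built only from squared t-statistics therefore cannot distinguish the two configurations. This is the limitation that motivates a direction-aware test such as the CCET.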
