
The Application of Probabilistic Principal Component Analysis to Interval-valued Data

Advisor: Han-Ming Wu (吳漢銘)

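Abstract


Principal component analysis (PCA) is a commonly used dimension reduction method and a popular research topic in symbolic data analysis (SDA). In this study, we apply another dimension reduction method, probabilistic PCA (PPCA), to interval-valued data. The aim is to reduce the dimensionality of high-dimensional interval data so that its structure and characteristics can be observed in a low-dimensional space. The interval data are first converted into conventional single-valued data by the vertices method or the centers method. PPCA is then used to reduce the dimension, and the reduced interval data are projected onto a two-dimensional space to reveal their structure. In the simulation study, under four different distributions and various proportions of missing data, we use PCA and PPCA to estimate the dimension reduction directions for both conventional data and interval data. Finally, we compare the performance of PCA and PPCA on two real data sets, finance data and face data. The results show no clear difference between the two methods in the simulations without missing values or in the analysis of the complete real data. In the simulations, however, for all four distributions, PPCA estimates the dimension reduction directions more accurately than PCA as the proportion of missing data increases.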
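
The conversion step described in the abstract can be illustrated with a small sketch. It assumes each observation is stored as a pair of lower-bound and upper-bound vectors; the function names `centers_matrix` and `vertices_matrix` and the toy numbers are illustrative assumptions, not taken from the thesis.

```python
import itertools
import numpy as np

def centers_matrix(lower, upper):
    """Centers method: replace each interval [a, b] by its midpoint (a + b) / 2."""
    return (np.asarray(lower, dtype=float) + np.asarray(upper, dtype=float)) / 2.0

def vertices_matrix(lower, upper):
    """Vertices method: replace each p-dimensional hyperrectangle by its 2**p corners."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n, p = lower.shape
    # Each row of `choices` picks the lower (0) or upper (1) bound of every variable.
    choices = np.array(list(itertools.product([0, 1], repeat=p)))  # shape (2**p, p)
    # Broadcast to (n, 2**p, p), then stack all corners into one classical data table.
    corners = np.where(choices[None, :, :] == 1, upper[:, None, :], lower[:, None, :])
    return corners.reshape(n * 2 ** p, p)

# Two hypothetical observations described by two interval-valued variables.
lower = np.array([[1.0, 10.0], [2.0, 20.0]])
upper = np.array([[3.0, 14.0], [6.0, 30.0]])

C = centers_matrix(lower, upper)   # one row per observation
V = vertices_matrix(lower, upper)  # 2**p rows per observation
```

The centers table keeps one row per observation, while the vertices table grows as 2^p rows per observation, so the vertices method quickly becomes expensive as the number of variables increases.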

Parallel Abstract


Principal component analysis (PCA) is a widely used dimension reduction method and one of the popular research topics in symbolic data analysis (SDA). In this study, we apply probabilistic PCA (PPCA), an alternative dimension reduction method, to interval-valued data. We aim to reduce the dimensionality of high-dimensional interval-valued data so that its structures and characteristics can be investigated in a lower-dimensional space. First, the interval-valued data are converted into a traditional data table using the vertices or centers method, so that classical PCA and PPCA can be applied directly. In this way, we can explore the structure of the projected intervals in two-dimensional space. In the simulation studies, we generate data from four different distributions with various proportions of missing observations, and we evaluate the performance of PCA and PPCA in estimating the true dimension reduction directions for both the simulated traditional data and the simulated interval-valued data. The results show no significant difference between PCA and PPCA for complete data sets. However, PPCA performs better than PCA when the data contain a higher proportion of missing observations. Finally, we apply PCA and PPCA to two real interval-valued data sets, the finance data and the face data.
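
As a rough illustration of the pipeline for complete data, the sketch below fits PPCA to interval centers with the standard closed-form maximum-likelihood solution and maps each interval to a rectangle in two dimensions. Everything here is an assumption made for illustration: the simulated data, the names `ppca_fit` and `ppca_transform`, and the choice of the closed-form fit. Handling missing observations would typically need an EM-type algorithm, which is not shown, and the thesis may have used a different estimation procedure.

```python
import numpy as np

def ppca_fit(X, q=2):
    """Closed-form ML estimate of PPCA for complete data (requires p > q)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)      # ML sample covariance
    eigval, eigvec = np.linalg.eigh(S)               # ascending eigenvalues
    order = np.argsort(eigval)[::-1]                 # reorder to descending
    eigval, eigvec = eigval[order], eigvec[:, order]
    sigma2 = eigval[q:].mean()                       # noise variance: mean discarded eigenvalue
    W = eigvec[:, :q] * np.sqrt(np.maximum(eigval[:q] - sigma2, 0.0))
    return mu, W, sigma2

def ppca_transform(X, mu, W, sigma2):
    """Posterior mean of the latent variables: M^{-1} W^T (x - mu)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (np.asarray(X, dtype=float) - mu).T).T

# Hypothetical interval data: 30 observations, 4 variables, given as center and half-width.
rng = np.random.default_rng(0)
centers = rng.normal(size=(30, 4)) * np.array([3.0, 2.0, 0.5, 0.5])
half = rng.uniform(0.1, 0.5, size=(30, 4))

mu, W, sigma2 = ppca_fit(centers, q=2)
Z = ppca_transform(centers, mu, W, sigma2)           # projected centers, shape (30, 2)

# The map x -> A (x - mu) is linear, so the range of each projected coordinate over
# a hyperrectangle is the projected center plus or minus sum_j |A_kj| * half_j.
A = np.linalg.solve(W.T @ W + sigma2 * np.eye(2), W.T)   # shape (2, 4)
spread = half @ np.abs(A).T
lo, hi = Z - spread, Z + spread                      # projected rectangles in 2D
```

For complete data, the columns of `W` span the same subspace as the leading PCA eigenvectors, which is consistent with the abstract's finding that the two methods do not differ noticeably when nothing is missing.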

