Principal Component Analysis ; MATLAB



People generate large amounts of data in daily life. By collecting and analyzing these data, many researchers hope to improve current conditions or to replace human labor, for example by predicting economic trends or identifying diseases. As computing speed and storage capacity have increased substantially, performing data analysis well has become an important issue. Such data are often complex and high-dimensional, which makes them harder to analyze.

As a data analysis technique, principal component analysis (PCA) can reduce the dimension of data effectively while retaining most of the information in it. In this thesis, we use principal components to represent four data sets: 8OX data, colon cancer data, breast cancer data, and wine data. To present the results visually, we use MATLAB to plot the experimental results in two and three dimensions. Finally, we discuss a caveat on the use of PCA and a possible solution for obtaining accurate results.
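
As an illustration of the dimension reduction summarized above, the following is a minimal MATLAB sketch, not the code used in the thesis. It assumes synthetic data (none of the four data sets named above) and a hypothetical choice of two retained components, and it uses only base MATLAB functions (`cov`, `eig`, `sort`, `scatter`) so that no toolbox is required.

```matlab
% Hypothetical toy data: 100 samples with 5 features (NOT a thesis data set).
rng(0);                              % fix the seed so the run is repeatable
X = randn(100, 5) * randn(5, 5);     % correlated synthetic features

% Center each feature, then eigendecompose the sample covariance matrix.
Xc = X - mean(X, 1);                 % implicit expansion needs R2016b or later
C  = cov(Xc);                        % 5-by-5 covariance matrix
[V, D] = eig(C);                     % columns of V are eigenvectors of C
[lambda, order] = sort(diag(D), 'descend');
V = V(:, order);                     % principal directions, largest variance first

% Project onto the first k principal components (k = 2 is an assumption).
k = 2;
scores = Xc * V(:, 1:k);             % 100-by-2 reduced representation
explained = sum(lambda(1:k)) / sum(lambda);   % fraction of variance retained

% Two-dimensional view of the reduced data.
scatter(scores(:, 1), scores(:, 2), 'filled');
xlabel('PC 1'); ylabel('PC 2');
title(sprintf('First %d components retain %.1f%% of the variance', k, 100 * explained));
```

Setting `k = 3` and replacing `scatter` with `scatter3` on the first three columns of `Xc * V(:, 1:3)` would give the corresponding three-dimensional view.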

