Title

主成份分析及其應用

Translated Title

Principal Component Analysis and Its Applications

Authors

徐婉綾

Key Words

Principal Component Analysis ; MATLAB

Publication Name

Degree Theses, Institute of Information Systems and Applications, National Tsing Hua University

Volume or Term/Year and Month of Publication

2014

Academic Degree Category

Master's

Advisor

陳朝欽

Content Language

English

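Chinese Abstract (English translation)

People generate large amounts of data in their daily lives. Many researchers hope to analyze the collected data to improve current conditions or to replace human labor, for example, to predict economic trends or identify diseases. As computing speed increases rapidly and storage capacity grows substantially, performing data analysis well has become an important issue. Such data are often complex and high-dimensional, which makes them harder to analyze.

As a data analysis technique, principal component analysis can effectively reduce the dimensionality of data while retaining as much of the data's characteristic information as possible. In this thesis, we use principal components to re-represent four data sets: the 8OX data, colon cancer gene data, breast cancer gene data, and wine recognition data. For visualization, we use MATLAB to present the experimental results in two and three dimensions. Finally, we discuss a caveat in the use of principal component analysis and a feasible solution to it.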

English Abstract

People produce a huge amount of data in daily life. By collecting and analyzing these data, many researchers hope to improve human life or to replace human labor, for example, to predict economic conditions and to identify diseases. With rapidly increasing computing speed and substantially larger storage capacity, performing data analysis well has become an important issue. Such data are often complex and high-dimensional, which makes them harder to analyze.

As a data analysis technique, principal component analysis (PCA) can effectively reduce the dimensionality of data while retaining most of its information. In this thesis, we use principal components to represent four data sets: the 8OX data, colon cancer data, breast cancer data, and wine data. For visualization, the experimental results are shown in two- and three-dimensional plots produced with MATLAB. Finally, we discuss a caveat in the use of PCA and a possible solution for obtaining accurate results.
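The abstract's central claim, that PCA reduces dimensionality while keeping most of the information, can be illustrated with a short sketch. The thesis itself used MATLAB; the NumPy code below is not the author's implementation but a standard covariance-eigendecomposition version of PCA. The function name `pca` and the synthetic example data are hypothetical. Keeping k = 2 or k = 3 components gives coordinates suitable for two- or three-dimensional scatter plots like those described in the abstract.

```python
import numpy as np


def pca(X, k):
    """Project the rows of X (n samples x d features) onto the first k principal components.

    Hypothetical helper for illustration. Returns (scores, components, explained_ratio):
      scores           n x k coordinates of the samples in the principal-component space
      components       d x k matrix whose columns are the principal directions
      explained_ratio  fraction of the total variance captured by each retained component
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each feature at zero
    cov = np.cov(Xc, rowvar=False)           # d x d sample covariance (ddof=1)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :k]              # keep the k leading eigenvectors
    scores = Xc @ components                 # project the centered data
    explained_ratio = eigvals[:k] / eigvals.sum()
    return scores, components, explained_ratio


# Usage on synthetic 3-D data that lies close to a single line (assumed example data).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))
scores, comps, ratio = pca(X, 2)
print(scores.shape)  # (200, 2)
print(ratio)         # the first component should carry nearly all of the variance
```

Each eigenvalue equals the variance of the data along its principal direction, so `explained_ratio` measures how much of the total variance the retained components keep. Eigenvector signs are arbitrary, so plots produced by different implementations (including MATLAB) may appear mirrored.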

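Topic Category

Basic and Applied Sciences > Information Science
College of Electrical Engineering and Computer Science > Institute of Information Systems and Applications

References
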
  1. [Adle2001] N. Adler and B. Golany, “Evaluation of Deregulated Airline Networks Using Data Envelopment Analysis Combined with Principal Component Analysis with an Application to Western Europe,” European Journal of Operational Research, vol. 132, no. 2, 260-273, 2001.
  2. [Aebe1994] S. Aeberhard, D. Coomans, and O. de Vel, “Comparative Analysis of Statistical Pattern Recognition Methods in High Dimensional Settings,” Pattern Recognition, vol. 27, no. 8, 1065-1077, 1994.
  3. [Alon1999] U. Alon, N. Barkai, D.A. Notterman, K. Gish, S. Ybarra, D. Mack, and A.J. Levine, “Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays,” Proceedings of the National Academy of Sciences, vol. 96, no. 12, 6745-6750, 1999.
  4. [Bald1989] P. Baldi and K. Hornik, “Neural Networks and Principal Component Analysis: Learning from Examples without Local Minima,” Neural Networks, vol. 2, no. 1, 53-58, 1989.
  5. [Bitt2009] H.R. Bittencourt, B.P.O. Pasini, D.A. de O. Moraes, B.D. dos Santos, and V. Haertel, “Comparative Analysis of Two Classes Implementing Nominal Logistic Regression,” Revista Brasileira de Biometria, vol. 27, no. 1, 115-124, 2009.
  6. [Haes1990] J.C. de Haes, F.C. van Knippenberg, and J.P. Neijt, “Measuring Psychological and Physical Distress in Cancer Patients: Structure and Application of the Rotterdam Symptom Checklist,” British Journal of Cancer, vol. 62, no. 6, 1034-1038, 1990.
  7. [Hote1933] H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology, vol. 24, no. 6, 417-441, 1933.
  8. [Joll1986] I.T. Jolliffe, “Principal Component Analysis,” Springer, 1st edition, 1986.
  9. [Joll2002] I.T. Jolliffe, “Principal Component Analysis,” Springer, 2nd edition, 2002.
  10. [Kram1991] M.A. Kramer, “Nonlinear Principal Component Analysis Using Autoassociative Neural Networks,” AIChE Journal, vol. 37, no. 2, 233-243, 1991.
  11. [Krey2006] E. Kreyszig, “Advanced Engineering Mathematics,” John Wiley & Sons, 9th edition, 2006.
  12. [Pear1901] K. Pearson, “On Lines and Planes of Closest Fit to Systems of Points in Space,” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, vol. 2, no. 11, 559-572, 1901.
  13. [Rask1988] R. Raskin and H. Terry, “A Principal-Components Analysis of the Narcissistic Personality Inventory and Further Evidence of Its Construct Validity,” Journal of Personality and Social Psychology, vol. 54, no. 5, 890-902, 1988.
  14. [DLBA2013] J.D. de la Bastida Castillo, “Software for Gene Expression Data Analysis,” MS Thesis, Institute of ISA, National Tsing Hua University, Hsinchu, Taiwan, May 2013.
  15. [Jain1988] A.K. Jain and R.C. Dubes, “Algorithms for Clustering Data,” Englewood Cliffs, NJ: Prentice-Hall, 1988.
  16. [Nove2008] J. Novembre, T. Johnson, K. Bryc, Z. Kutalik, A.R. Boyko, A. Auton, A. Indap, K.S. King, S. Bergmann, M. R. Nelson, M. Stephens, and C.D. Bustamante, “Genes Mirror Geography within Europe,” Nature, vol. 456, no. 7218, 98-101, 2008.
  17. [Shyu2003] M. Shyu, S. Chen, K. Sarinnapakorn, and L. Chang, “A Novel Anomaly Detection Scheme Based on Principal Component Classifier,” Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining, 172-179, 2003.
  18. [Veer2002] L.J. van't Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.M. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, and S.H. Friend, “Gene Expression Profiling Predicts Clinical Outcome of Breast Cancer,” Nature, vol. 415, no. 6871, 530-536, 2002.
  19. [Web01] http://levis.tongji.edu.cn/gzli/data/mirror-kentridge.html, last accessed on June 23, 2014.
  20. [Web02] http://archive.ics.uci.edu/ml/datasets/Wine, UCI Machine Learning Repository, last accessed on June 23, 2014.
  21. [Zuur2007] A.F. Zuur, E.N. Ieno, and G.M. Smith, “Analyzing Ecological Data,” Springer, 2007.