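Author (Chinese): 牧摩度
Author (English): Momodou Lamim Sanyang
Thesis title (English): Data Visualization by Self-Organizing Map
Thesis title (Chinese): 以SOM做資料視覺化
Advisor (Chinese): 陳朝欽
Advisor (English): Chen, Chaur-Chin
Degree: Master's
University: National Tsing Hua University
Department: Institute of Information Systems and Applications
Student ID: 9765681
Year of publication (ROC calendar): 99 (2010)
Academic year of graduation: 98 (2009-2010)
Language: English
Number of pages: 38
Chinese keywords: SOM; 資料視覺化 (data visualization)
English keywords: SOM; Data Visualization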
Data visualization has become essential because we have acquired huge, complex
data sets and keep accumulating more as storage devices become cheaper. Most of
these data are high-dimensional and therefore hard for humans to visualize. Among
the efforts made by researchers to alleviate this high-dimensional visualization
problem, the Self-Organizing Map (SOM) was developed.
The Self-Organizing Map is an unsupervised neural network algorithm that
projects high-dimensional data onto a two-dimensional map that humans can
easily visualize. The projection preserves the topology of the data, so similar data
items are mapped to nearby locations on the map. It is a powerful method for data
mining and cluster extraction, and it is very useful for processing data of high
dimensionality and complexity.
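The projection described above can be illustrated with a minimal sketch of online
SOM training. This is not the thesis's implementation: the function names
(best_matching_unit, train_som), the linearly decaying learning rate, the Gaussian
neighbourhood, and all parameter defaults are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM training; returns a rows x cols grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Assumed initialisation: each unit starts as a copy of a random sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        # Pull the unit toward x, weighted by grid proximity to the BMU.
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training, calling best_matching_unit(weights, x) for each pattern gives its
position on the two-dimensional map. Because units close to the winner on the grid
are updated together, similar patterns tend to land on neighbouring units.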
Several visualization methods present different aspects of the information
learned by the SOM, to gain insight and to guide segmentation of the data. In
this thesis, common visualization methods (dendrogram, 2d-dendrogram,
principal component projection, map labeling, and U-matrix) and some recently
introduced methods (P-matrix and U*-matrix plots) are used to visualize the results on
four data sets: IRIS, with 150 patterns in 3 classes of 50 patterns each, each pattern
having 4 features; 8OX, with 45 patterns in 3 classes of 15 patterns each, each
pattern having 8 features; the microarray data set ALL-AML Leukemia, with
38 patients in 2 classes (27 ALL, 11 AML), each patient having 7129 genes; and Colon
Tumor, with 62 samples in 2 classes (22 normal, 40 tumor) and a total of 2000 genes.
The visualization results for each of these data sets are reported using the
aforementioned methods. The 2d-dendrogram seems to be the better tool for
visualizing the microarray data, and all the methods perform well on IRIS and 8OX.
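As an illustration of one of the methods named above, the following sketch computes
a simplified U-matrix from a trained SOM codebook. It assumes a rectangular grid with
4-connected neighbours and uses the mean distance to those neighbours as the height
of each unit; the function name u_matrix and the toy codebook are hypothetical, and
the thesis may use a different grid topology or normalisation.

```python
import math


def u_matrix(weights):
    """Mean Euclidean distance from each unit's weight vector to its grid neighbours.

    weights: rows x cols nested list of weight vectors (a trained SOM codebook).
    """
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u


# Toy codebook: the left half of the map holds one prototype, the right half another.
toy = [[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0]] for _ in range(2)]
heights = u_matrix(toy)
```

In the toy codebook, units inside each half have height 0, while the units on either
side of the boundary have positive height. Such ridges of large values are what a
U-matrix plot displays as borders between clusters.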
Chapter 1 Introduction
Chapter 2 Review of Clustering Algorithms
2.1 Distance Measures
2.1.1 The Minkowski distance
2.1.2 The Vector angle measurement
2.1.3 The Correlation measurement
2.2 Hierarchical Clustering
2.2.1 Single-Linkage versus Complete-Linkage
2.2.2 Strength and Limitations
2.3 Partitioning Clustering
2.3.1 K-Means Clustering Algorithm
2.4 Topology Preserving Mapping
2.4.1 Self-Organizing Map (SOM)
Chapter 3 Self-Organizing Map
3.1 Algorithm for Kohonen’s Self-Organizing Map
3.2 Batch Training Algorithm for SOM
3.3 Efficient initialization schemes for SOM
Chapter 4 The Data Sets
4.1 Description of IRIS Data
4.2 Description of 8OX Data
4.3 Description of ALL-AML_Leukemia [Golu1999]
4.4 Description of Colon Tumor [Alon1999]
Chapter 5 Experimental Results
5.1 U-Matrix
5.2 P-Matrix
5.3 U*-Matrix
5.4 The Visualization Results
Chapter 6 Conclusion and Future Work
References
[Alon1999] U. Alon et al., “Broad Patterns of Gene Expression Revealed by Clustering
Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays”,
Proceedings of the National Academy of Sciences of the United States of America, vol.
96, 6745-6750, 1999.
[Arab1994] P. Arabie and L. Hubert, “Cluster Analysis in Marketing Research”, in
Advanced Methods in Marketing Research, Oxford: Blackwell, 1994.
[Bald2002] P. Baldi and G. Hatfield, “DNA microarrays and gene expression”, Cambridge
University Press, 2002.
[Bere2005] M. Berens, H. Liu, L. Parsons, L. Yu, and Z. Zhao, “Fostering biological
relevance in feature selection for microarray data”, IEEE Intelligent
Systems, vol. 20, no. 6, 29-32, 2005.
[Chen2005] C.C. Chen and H.T. Chu, “Similarity Measurement between Images”, IEEE
Conference on Computer Software and Algorithm (Compsac 2005),
41-42, Edinburgh, UK, 2005.
[Gant2008] G. Eskelsen and F. John, “The Diverse and Exploding Digital Universe: An
Updated Forecast of Worldwide Information Growth Through 2011”, March
2008.
[Golu1999] T.R. Golub et al., “Molecular Classification of Cancer: Class Discovery and Class
Prediction by Gene Expression Monitoring”, Science, vol. 286,
531-537, 1999.
[Goog2009] Google Scholar. http://scholar.google.com, February, 2009.
[Hart1979] J.A. Hartigan and M.A. Wong, “A K-means clustering algorithm”, Journal of
the Royal Statistical Society, Series C, 1979.
[Jain1988] A.K. Jain and R.C. Dubes, “Algorithms for Clustering Data”, Prentice Hall, New
Jersey, 1988.
[John1967] S.C. Johnson, “Hierarchical Clustering Schemes”, Psychometrika, vol. 32,
241-254, 1967.
[Juha1999] J. Vesanto, J. Himberg, E. Alhoniemi, and J. Parhankangas, “Self-organizing map in
Matlab: the SOM Toolbox”, Laboratory of Computer and
Information Science, Helsinki University of Technology, Finland, 1999.
[Koho1990] T. Kohonen, “The Self-Organizing Map”, Proceedings of the IEEE, vol. 78, no.
9, 1464-1480, 1990.
[Meye2000] R.D. Meyer and D. Cook, “Visualization of data”, Mathematical and Statistical
Sciences, Pfizer Central Research, Groton, Connecticut, USA, 2000.
[Mika1999] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.R. Müller, “Fisher Discriminant
Analysis with Kernels”, IEEE International Workshop on Neural Networks for
Signal Processing, vol. 9, 41-48, 1999.
[Theo2009] S. Theodoridis and K. Koutroumbas, “Pattern Recognition”, Academic Press, 4th
edition, 2009.
[Tuke1977] J. Tukey and J. Wilder, “Exploratory data analysis”, Addison-Wesley, 1977.
[Ults2003a] A. Ultsch, “U*-Matrix: a Tool to visualize Clusters in high dimensional
Data”, Data Bionics Research Lab, Department of Computer Science,
University of Marburg, Germany, 2003.
[Ults2003b] A. Ultsch, “Pareto Density Estimation: Density Estimation for Knowledge
Discovery”, Data Bionics Research Lab, Department of Computer Science,
University of Marburg, Germany, 2003.
[Ults2005] A. Ultsch, “Clustering with SOM: U*C”, Data Bionics Research Group,
Department of Computer Science, University of Marburg, Germany, 2005.
[Ults2007] A. Ultsch, “Emergence in Self Organizing Feature Maps”, Data Bionics
Research Group, Department of Computer Science, University of Marburg,
Germany, 2007.
[Wang2007] T.Y. Wang, “A Study on Analyzing Microarray Data Using SVM and SOM”,
Master Thesis, National Tsing Hua University, Taiwan, March, 2007.
[Web01] http://www.cs.nthu.edu.tw/~cchen/, last access on May 31, 2010.
[Web02] http://www.ics.uci.edu/mlearn/MLRepository.html, last access on May 31, 2010.
[Web03] http://www-genome.wi.mit.edu/cgi-bin/cancer/, last access on May 31, 2010.
[Web04] http://microarray.princeton.edu/oncology/affydata/index.html, last access on May 31,
2010.
 
 
 
 