Data Visualization by Self-Organizing Map

Data visualization has become increasingly important because cheap storage has made it possible to collect and keep large, complex data sets, and the volume continues to grow. Most of these data are high-dimensional and therefore hard for humans to visualize. The Self-Organizing Map (SOM) was developed to address this problem. The SOM is an unsupervised neural network algorithm that projects high-dimensional data onto a two-dimensional map that humans can easily inspect. The projection preserves the topology of the data, so similar data items are mapped to nearby locations on the map. It is a powerful method for data mining and cluster extraction, and it is well suited to data of high dimensionality and complexity. Several visualization methods present different aspects of the information learned by the SOM, giving insight into the data and guiding its segmentation. In this thesis, common visualization methods (dendrogram, 2d-dendrogram, principal component projection, map labels, and U-matrix) and the more recently introduced P-matrix and U*-matrix plots are used to visualize the results on four data sets: IRIS, with 150 patterns in 3 classes of 50 patterns each and four features per pattern; 8OX, with 45 patterns in 3 classes of 15 patterns each and 8 features per pattern; the ALL-AML Leukemia microarray data set, with 38 patients in 2 classes (27 ALL, 11 AML) and 7129 genes per patient; and Colon Tumor, with 62 samples in 2 classes (22 normal, 40 tumor) and 2000 genes in total. The visualization results for each data set are reported for all of these methods. The 2d-dendrogram appears to be the better tool for visualizing the microarray data, and all of the methods perform well on IRIS and 8OX.
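To make the projection and the U-matrix idea concrete, the following is a minimal sketch in Python with NumPy. It is not the implementation used in the thesis. The function names (`train_som`, `best_matching_unit`, `u_matrix`), the rectangular grid, the linear decay schedules, the Gaussian neighborhood, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose prototype is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(data, rows=6, cols=6, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a rectangular SOM online; returns prototypes of shape (rows, cols, dim).

    The schedules below (linear decay of learning rate and neighborhood
    radius) are illustrative assumptions, not taken from the thesis.
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Initialize each prototype with a randomly chosen data point.
    idx = rng.integers(0, len(data), rows * cols)
    weights = data[idx].reshape(rows, cols, dim).astype(float)
    # grid[r, c] holds the map coordinates (r, c) of each unit.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)
        sigma = max(sigma0 * (1.0 - frac), 0.5)
        x = data[rng.integers(len(data))]
        bmu = best_matching_unit(weights, x)
        # Gaussian neighborhood measured on the 2-D map, not in data space.
        grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull the winner and its map neighbors toward the sample.
        weights += lr * h[..., None] * (x - weights)
    return weights


def u_matrix(weights):
    """Mean data-space distance from each unit to its 4-connected map neighbors."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = [
                np.linalg.norm(weights[r, c] - weights[r + dr, c + dc])
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            ]
            u[r, c] = np.mean(dists)
    return u


if __name__ == "__main__":
    # Synthetic stand-in data: two well-separated clusters in 4 dimensions.
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(0.0, 0.1, (30, 4)), rng.normal(5.0, 0.1, (30, 4))])
    som = train_som(data)
    print(np.round(u_matrix(som), 2))
```

In a sketch like this, high U-matrix values mark map regions where neighboring prototypes are far apart in data space, which is what lets the plot suggest cluster boundaries.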

Keywords

SOM; data visualization


