This chapter reports on a study that used self-organizing maps to detect clusters in a high-dimensional data set and to identify the variables that dominate each cluster. The approach is suited to preliminary data mining, when hypotheses are still being formulated and there is no prior knowledge about the data.
Analyzing high-dimensional data sets is a nontrivial task, and a wide variety of methods have been suggested for it, for example glyphs, scatter plots, parallel coordinate plots, and VERI [1]. This chapter uses a graphical map display in which proximity on the map represents similarity between data points; the map is produced with the self-organizing map (SOM) algorithm. The display is a grid of map units. Each unit has an associated model vector with the same dimensionality as the data set, and each data point is associated with the unit whose model vector is closest to it. The model vectors are computed iteratively, using either a batch or a stochastic algorithm.
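To make the two training modes concrete, here is a minimal pure-Python sketch of SOM training on a rectangular grid. It is not the chapter's implementation: the Gaussian neighborhood, the linearly shrinking radius and learning rate, initialization from randomly chosen data points, and all function names (train_stochastic, train_batch, best_matching_unit, and so on) are illustrative assumptions. The stochastic variant nudges every model vector toward each data point in turn; the batch variant replaces each model vector with a neighborhood-weighted mean of the data.

```python
import math
import random

# Hypothetical sketch: neighborhood shape, schedules and initialization are assumptions.


def grid_positions(rows, cols):
    """(row, col) coordinates of every unit on a rectangular map grid."""
    return [(r, c) for r in range(rows) for c in range(cols)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(models, x):
    """Index of the unit whose model vector is closest to data point x."""
    return min(range(len(models)), key=lambda i: sq_dist(models[i], x))


def neighborhood(pos, i, j, sigma):
    """Gaussian neighborhood weight between units i and j on the grid."""
    return math.exp(-sq_dist(pos[i], pos[j]) / (2 * sigma ** 2))


def radius(rows, cols, frac):
    """Neighborhood radius that shrinks linearly, never below 0.5."""
    return max(0.5, max(rows, cols) / 2 * (1 - frac))


def init_models(data, n_units, rng):
    """Start from distinct data points (assumes n_units <= len(data))."""
    return [list(x) for x in rng.sample(data, n_units)]


def train_stochastic(data, rows, cols, epochs=20, seed=0):
    """Online SOM: update all model vectors after each data point."""
    rng = random.Random(seed)
    pos = grid_positions(rows, cols)
    models = init_models(data, len(pos), rng)
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            alpha = 0.5 * (1 - frac)  # learning rate decays linearly
            sigma = radius(rows, cols, frac)
            b = best_matching_unit(models, x)
            for i, m in enumerate(models):
                h = alpha * neighborhood(pos, b, i, sigma)
                for k in range(len(m)):
                    m[k] += h * (x[k] - m[k])  # move model toward x
            t += 1
    return models


def train_batch(data, rows, cols, epochs=20, seed=0):
    """Batch SOM: each model becomes a neighborhood-weighted data mean."""
    rng = random.Random(seed)
    pos = grid_positions(rows, cols)
    models = init_models(data, len(pos), rng)
    dim = len(data[0])
    for e in range(epochs):
        sigma = radius(rows, cols, e / epochs)
        bmus = [best_matching_unit(models, x) for x in data]
        new_models = []
        for i, old in enumerate(models):
            w = [neighborhood(pos, b, i, sigma) for b in bmus]
            total = sum(w)
            if total == 0:  # no data nearby on the grid: keep the old model
                new_models.append(old)
                continue
            new_models.append(
                [sum(wj * x[k] for wj, x in zip(w, data)) / total for k in range(dim)]
            )
        models = new_models
    return models
```

For example, `models = train_batch(points, 4, 4)` trains a 4 × 4 map, and `best_matching_unit(models, p)` then gives the unit that a point `p` is associated with. Once the initialization is fixed, the batch variant is deterministic, whereas the stochastic variant also depends on the order in which the points are presented.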
Kaski, Nikkilä, and Kohonen developed a SOM system for textual databases, called WEBSOM, which was able to detect a new category in patent data. The chapter suggests a new method for detecting and visualizing gaps between clusters. A contribution profile is used to evaluate how much each variable contributes to a cluster, and areas in which all the variables contribute similarly can be grouped together when characterizing the cluster.
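The review does not say how the chapter defines its contribution profile. Purely as a hypothetical illustration, and not the chapter's formula, one simple per-variable score compares a cluster's mean model vector with the map-wide mean, scaled by each variable's spread over the map and normalized so that the scores sum to 1. The function name `contribution_profile` and the scoring rule are assumptions.

```python
import statistics


def contribution_profile(models, cluster_units):
    """Hypothetical contribution profile (an assumption, not the chapter's method).

    For each variable k, score how far the cluster's mean model value lies
    from the mean over all map units, in units of that variable's standard
    deviation across the map. Scores are normalized to sum to 1.
    """
    dim = len(models[0])
    scores = []
    for k in range(dim):
        column = [m[k] for m in models]
        spread = statistics.pstdev(column) or 1.0  # constant variable: avoid dividing by zero
        cluster_mean = statistics.fmean(models[i][k] for i in cluster_units)
        scores.append(abs(cluster_mean - statistics.fmean(column)) / spread)
    total = sum(scores)
    return [s / total if total else 0.0 for s in scores]
```

With model vectors `[[0, 5], [0, 5], [10, 5], [10, 5]]` and a cluster formed by units 2 and 3, the profile is `[1.0, 0.0]`: only the first variable separates the cluster from the rest of the map.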
The chapter is well written, with appropriate examples and figures. The only drawback is that the introduction does not mention what the later sections cover.