Given a set, X = {x1, x2, ... , xN} of N objects called patterns, an integer M, a distance function d on patterns, a learning rate &ggr;, and a terminating condition T, a self-organizing map Y = som(X, M, d, &ggr;, T) is a set {y1, y2, ... , yM} of M prototype patterns (or neurons or cluster centers) that are close under d to the elements of X [1]. A nondeterministic algorithm, that I will call calc-som, initializes Y pseudorandomly, then, for pseudorandomly-chosen elements x ∈ X, computes the element y* ∈ Y that is closest under d to x; and recomputes all elements of Y in some neighborhood N(y*) of y*, so as to move them closer to x according to a rule dependent on &ggr;, until T holds. Thus, calc-som hopes to compute a set Y whose elements cluster around those of X, and which therefore, in some sense, represents X.
Usually, the patterns in X have been implemented as vectors whose elements are attributes of various features, so that distance d is defined in terms of some multi-dimensional feature space. The main contribution of this paper is to extend calc-som so as to execute on patterns that are graphs. In this context, d is interpreted as a form of graph edit distance [2], and the recomputation of the graph elements of Y is implemented accordingly. Other computational strategies are employed based on convenience or efficiency considerations: approach to initialization of Y, elimination of unusable elements of Y, selection and iterative update of &ggr;, selection of T, restriction of N(y*) to y*, and so on.
Using graphs derived from letters of the alphabet, experiments are described to show that calc-som can indeed compute a som Y whose elements are plausible clusters for X. This is perhaps a rather surprising result, in view of the many arbitrary ingredients in the algorithm.
This paper is generally well written, and the results do appear to be of interest. As a non-expert in the field, I would have been grateful for a somewhat more careful mathematical development of the underlying theory.