Guru, Kiranagi, and Nagabhushan introduce a nonsymmetric similarity measure for objects (patterns) described by interval-type features. In the computations, each feature is treated as a separate similarity dimension. The overall similarity between two objects, which the paper calls the “mutual similarity value,” is obtained by treating the similarity of each object to the other as a vector (one component per feature) and adding the two vectors with the triangle law of vector addition. The resulting object-by-object similarity matrix drives a single-link agglomerative clustering algorithm.
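To make the pipeline concrete, here is a minimal sketch in Python under stated assumptions; it is not the authors' formulation. The per-feature directional similarity used here (the fraction of one interval covered by the other) is a hypothetical stand-in, since the paper's exact formula is not reproduced in this review. The target cluster count `k` is also an assumption, added only so the sketch stops at a readable partition; the authors state that their method needs no user-provided parameter. The names `directional_similarity`, `mutual_similarity`, and `single_link` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def directional_similarity(a: Interval, b: Interval) -> float:
    """HYPOTHETICAL per-feature similarity of interval a to interval b:
    the fraction of b that a covers. Not symmetric in general."""
    overlap = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    width_b = b[1] - b[0]
    if width_b == 0:
        # Degenerate interval (a single point): 1 if a contains it, else 0.
        return 1.0 if a[0] <= b[0] <= a[1] else 0.0
    return overlap / width_b


def mutual_similarity(x: Sequence[Interval], y: Sequence[Interval]) -> float:
    """One vector component per feature for x->y and for y->x; the two
    vectors are added (triangle law) and the length of the resultant,
    sqrt(|u|^2 + |v|^2 + 2|u||v|cos(theta)), is returned."""
    s_xy = [directional_similarity(a, b) for a, b in zip(x, y)]
    s_yx = [directional_similarity(b, a) for a, b in zip(x, y)]
    return math.sqrt(sum((u + v) ** 2 for u, v in zip(s_xy, s_yx)))


def single_link(objects: Sequence[Sequence[Interval]], k: int) -> List[List[int]]:
    """Agglomerative single-link clustering on the mutual similarity matrix,
    stopping at k clusters (k is an illustrative assumption)."""
    n = len(objects)
    sim = [[mutual_similarity(objects[i], objects[j]) for j in range(n)]
           for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Single link: clusters are as similar as their most similar pair.
                s = max(sim[i][j] for i in clusters[p] for j in clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        _, p, q = best
        clusters[p].extend(clusters[q])
        del clusters[q]  # q > p, so index p is unaffected
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Four objects, two interval features each (made-up toy data).
    objs = [[(0, 10), (0, 5)], [(1, 9), (0, 4)],
            [(50, 60), (20, 30)], [(52, 58), (21, 29)]]
    print(single_link(objs, 2))  # prints [[0, 1], [2, 3]]
```

Because each direction is normalized by a different interval width, the two per-feature vectors generally differ; summing them yields a single symmetric value that the clustering step can use directly.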
The authors assess their approach on three different data sets and demonstrate its effectiveness. They also compare their results with those of five other methods that use various similarity measures. Based on these experimental observations, the authors claim that their approach is superior, citing its computational efficiency and the absence of any user-provided input parameter. Note, however, that with small data sets (for example, the “fats and oils data” used in the experiments has only eight objects and five data features) and with the computational power of today’s computers, efficiency is hardly a concern. Whether the method remains effective on large data sets needs further investigation.
Another nonsymmetric similarity measure, based on the cover coefficient concept [1], has been introduced for clustering text databases. It uses the term frequencies of documents as data features. Although the feature types differ, the two measures are comparable, though not identical: with some imagination, the computations of one can be mapped conceptually onto the other.
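For comparison, the cover coefficient between documents i and j is usually written as c_ij = α_i Σ_k d_ik β_k d_jk, where d is the document-by-term frequency matrix, α_i is the reciprocal of the sum of row i, and β_k is the reciprocal of the sum of column k. The sketch below computes this matrix in plain Python; the function names are hypothetical, and it assumes every document contains at least one term. It shows where the asymmetry comes from: c_ij and c_ji are normalized by different row sums.

```python
from typing import List, Sequence


def cover_coefficients(d: Sequence[Sequence[float]]) -> List[List[float]]:
    """Cover coefficient matrix C for a document-by-term frequency matrix d.
    Assumes every row (document) has a nonzero sum."""
    m, n = len(d), len(d[0])
    alpha = [1.0 / sum(row) for row in d]  # reciprocal of each row sum
    col_sums = [sum(d[i][k] for i in range(m)) for k in range(n)]
    # An unused term (zero column) never contributes, since its d values are 0.
    beta = [1.0 / s if s else 0.0 for s in col_sums]
    return [[alpha[i] * sum(d[i][k] * beta[k] * d[j][k] for k in range(n))
             for j in range(m)]
            for i in range(m)]


def estimated_cluster_count(c: Sequence[Sequence[float]]) -> float:
    """Sum of the diagonal (decoupling) coefficients."""
    return sum(c[i][i] for i in range(len(c)))


if __name__ == "__main__":
    c = cover_coefficients([[2, 1, 0], [0, 1, 1]])
    print([[round(x, 3) for x in row] for row in c])
    # prints [[0.833, 0.167], [0.25, 0.75]]
```

Each row of C sums to 1. In the clustering methodology built on this concept, the sum of the diagonal entries serves as an estimate of the number of clusters, so that method likewise avoids a user-supplied cluster count.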