Computing Reviews

A viewable indexing structure for the interactive exploration of dynamic and large image collections
Rayar F., Barrat S., Bouali F., Venturini G. ACM Transactions on Knowledge Discovery from Data 12(1): 1-26, 2018. Type: Article
Date Reviewed: 05/31/18

The authors set out to develop a method for building and indexing a large collection of images and for providing a means of studying those images. The collection is arranged according to mutual similarity: images that are similar to each other are placed together, while images that are dissimilar are placed farther apart.

The paper uses the balanced iterative reducing and clustering using hierarchies (BIRCH) data partitioning algorithm to build a hierarchical structure over an image collection. A BIRCH implementation is available in scikit-learn, which is packaged as python2-scikit-learn (for Red Hat, CentOS, and Fedora distributions) and python-scikit-learn (for Debian-based distributions including Ubuntu and Raspbian).
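As a rough illustration of how BIRCH summarizes data, the sketch below shows the standard clustering feature (CF) triple of count, linear sum, and squared sum, together with a simplified insertion into a single flat leaf. It is not the authors' modified algorithm: the names `ClusteringFeature` and `insert_point` and the `threshold` parameter are hypothetical, and the node splitting and tree balancing of a real CF-tree are omitted.

```python
from dataclasses import dataclass
from math import dist, sqrt
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Standard BIRCH summary of a subcluster (hypothetical class name)."""
    n: int                    # number of points absorbed
    linear_sum: List[float]   # per-dimension sum of the points
    squared_sum: float        # sum of squared norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        return cls(1, list(point), sum(x * x for x in point))

    def merged(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CFs are additive, so merging two subclusters is a component-wise sum.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.squared_sum + other.squared_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square distance of the members to the centroid.
        centroid_sq = sum(c * c for c in self.centroid())
        return sqrt(max(self.squared_sum / self.n - centroid_sq, 0.0))


def insert_point(leaf: List[ClusteringFeature],
                 point: Sequence[float],
                 threshold: float) -> None:
    """Absorb `point` into the closest entry if the merged radius stays
    within `threshold`; otherwise start a new entry (simplified: no splits)."""
    new_cf = ClusteringFeature.from_point(point)
    if leaf:
        closest = min(range(len(leaf)),
                      key=lambda i: dist(leaf[i].centroid(), point))
        candidate = leaf[closest].merged(new_cf)
        if candidate.radius() <= threshold:
            leaf[closest] = candidate
            return
    leaf.append(new_cf)
```

Because each entry stores only the three summary values, a point can be absorbed without revisiting the points already in that entry, which is what makes this kind of structure suitable for incremental insertion.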

The authors present three algorithms: cluster feature (CF)-tree insertion, CF entry representative update, and classical relative neighborhood graph (RNG) construction. These algorithms are used to modify the BIRCH algorithm.
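For readers unfamiliar with the RNG, the following minimal sketch applies its classical definition: two points are connected unless some third point is strictly closer to both of them than they are to each other. It is a brute-force version written for illustration only, the function name `relative_neighborhood_graph` is hypothetical, and it makes no attempt to reproduce how the paper builds or updates its graphs.

```python
from itertools import combinations
from math import dist


def relative_neighborhood_graph(points):
    """Return the RNG edges as a set of index pairs (i, j) with i < j.

    Brute-force O(n^3) construction from the classical definition.
    """
    n = len(points)
    # Precompute all pairwise Euclidean distances.
    d = [[dist(p, q) for q in points] for p in points]
    edges = set()
    for i, j in combinations(range(n), 2):
        # Drop the pair if any third point k is strictly closer to both
        # i and j than i and j are to each other.
        blocked = any(
            max(d[i][k], d[j][k]) < d[i][j]
            for k in range(n)
            if k != i and k != j
        )
        if not blocked:
            edges.add((i, j))
    return edges
```

For three collinear points such as (0, 0), (1, 0), and (2, 0), the two outer points are not connected, because the middle point is closer to each of them than they are to one another.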

Because the structure is built with the BIRCH algorithm, it yields an underlying clustering of the image collection in which the leaves of the tree can be considered clusters. This is demonstrated in figure 9 (p. 2:18): the first cluster contains 113 images, including all the images from the dinosaur class of a sample dataset. With the authors’ method, users can quickly identify images that were wrongly included in a cluster.

The authors’ web platform was implemented to allow nonexperts to navigate an image collection without extensive learning. This approach can be used to navigate an image collection without any specific objective, in order to gain insights into the overall topology of the collection. The procedure developed by the authors can process collections that contain millions of images. The authors estimate that they can process an average of 1.5 million images per day, which works out to roughly 17 images per second.

Overall, this is an interesting paper. The authors implemented the BIRCH algorithm in C, with the three modifications presented. The audience for this paper is anyone who is faced with curating an image collection, whether it comes from computed tomography (CT) and magnetic resonance imaging (MRI) scans in healthcare, from a social media environment, or from any other source of photographs.

Reviewer:  W. E. Mihalo Review #: CR146056 (1808-0442)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™