Comparing approaches for the automatic annotation and semantic retrieval of large image collections is both of great interest and a significant challenge. Hare and Lewis make an important contribution to the field by presenting an effective framework that scales to large datasets, along with their experimental results. Existing image databases, and the research projects that use them, have shortcomings. For example, the Corel Stock Photography collection has been criticized as “too easy” and too small for rigorous retrieval evaluation. Another issue is that existing automatic annotation systems are highly sensitive to the image features selected, partly because they lack a systematic way of selecting features for large image databases.
The authors propose a solid, repeatable methodology and protocol for evaluating automatic annotation, using the MIR Flickr dataset. They also present the software they built following this methodology and the results of their experiments. The system uses several common image features, such as difference of Gaussians, maximally stable extremal regions (MSER), the scale-invariant feature transform (SIFT), and color SIFT. For semantic retrieval, Hare and Lewis use a technique called a linear-algebraic semantic space: a high-dimensional vector space in which an image is placed closest to the words that describe it. In their experiments, the complete system shows very positive results.
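To make the linear-algebraic semantic space idea concrete, the following is a minimal sketch of the general approach, not the authors' actual implementation: a toy term-by-image matrix is factored with a truncated singular value decomposition (SVD), placing terms and images in a shared latent space where images can be ranked by their similarity to a query word. The terms, counts, and dimensionality here are all invented for illustration.

```python
import numpy as np

# Hypothetical toy data: rows are annotation terms, columns are images.
# Entry (t, i) counts how strongly term t applies to image i.
terms = ["beach", "sunset", "dog", "grass"]
occurrences = np.array([
    [2.0, 0.0, 1.0, 0.0],  # beach
    [1.0, 2.0, 0.0, 0.0],  # sunset
    [0.0, 0.0, 2.0, 1.0],  # dog
    [0.0, 0.0, 1.0, 2.0],  # grass
])

# A truncated SVD projects terms and images into a shared latent space.
U, s, Vt = np.linalg.svd(occurrences, full_matrices=False)
k = 2  # number of latent dimensions kept (chosen arbitrarily here)
term_vecs = U[:, :k] * s[:k]       # term positions in the semantic space
image_vecs = Vt[:k, :].T * s[:k]   # image positions in the semantic space

def rank_images(term):
    """Rank image indices by cosine similarity to a query term."""
    q = term_vecs[terms.index(term)]
    sims = image_vecs @ q / (
        np.linalg.norm(image_vecs, axis=1) * np.linalg.norm(q) + 1e-12
    )
    return np.argsort(-sims)

print(rank_images("dog"))  # images most associated with "dog" come first
```

In a real system, the occurrence matrix would be built from image features and training annotations rather than hand-entered counts; the key property illustrated is that retrieval reduces to nearest-neighbor search between word and image vectors in one space.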
The system described in the paper provides the research community with a very useful tool for image retrieval and performance evaluation.