Computing Reviews

Text retrieval from early printed books
Marinai S. International Journal on Document Analysis and Recognition 14(2): 117-129, 2011. Type: Article
Date Reviewed: 09/29/11

Text retrieval techniques for printed or handwritten documents fall into two categories: approaches that search the transcribed text of indexed words, and approaches that work directly on text images. Given the lack of a suitable text recognition engine and the ubiquity of ligatures and abbreviations in early printed books, the author adopts the recognition-free approach.

This paper makes two main contributions. First, the author applies a self-organizing map (SOM) to perform unsupervised clustering of character objects. Second, to partially solve the problem caused by ligatures, the author incorporates word width into the computation of dynamic time warping (DTW). Combining these two techniques, the author shows steady performance gains over baseline systems in which one or both of them are turned off.
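For readers unfamiliar with DTW, a minimal Python sketch may help. It is a generic illustration, not the paper's formulation: this review does not state how word width enters the DTW computation, so the additive width penalty, the function names dtw_distance and width_aware_distance, and the alpha weight are all assumptions made here for illustration. Words are represented as plain sequences of numbers standing in for per-character features.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned again with the previous b element
                cost[i][j - 1],      # b[j-1] aligned again with the previous a element
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def width_aware_distance(query, candidate, query_width, candidate_width, alpha=1.0):
    """Hypothetical combination: DTW distance plus a relative word-width penalty.

    The additive form and the alpha weight are assumptions for illustration,
    not the method described in the reviewed paper.
    """
    width_gap = abs(query_width - candidate_width) / max(query_width, candidate_width)
    return dtw_distance(query, candidate) + alpha * width_gap
```

In this sketch, two sequences that differ only by a repeated element (as might happen when a ligature is segmented differently) have a DTW distance of zero, and the width term is what separates candidates whose printed widths differ markedly.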

However, the performance achieved so far is still not satisfactory for real-world applications. This is a difficult problem that certainly calls for further investigation. It would be interesting to see some discussion of the failed or confusing cases, as this would help readers better understand the challenges involved.

Reviewer:  Jin Chen Review #: CR139475 (1203-0318)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™