Computing Reviews
Text retrieval from early printed books
Marinai S. International Journal on Document Analysis and Recognition 14(2): 117-129, 2011. Type: Article
Date Reviewed: Sep 29 2011

Text retrieval techniques for printed or handwritten documents fall into two categories: approaches that search a text transcription of the indexed words, and approaches that work directly on the text images. Because no suitable text recognition engine is available and ligatures and abbreviations are ubiquitous in early printed books, the author adopts the recognition-free approach.

The paper makes two main contributions. First, the author applies a self-organizing map (SOM) to cluster character objects without supervision. Second, to partially address the problems caused by ligatures, the author incorporates word width into the dynamic time warping (DTW) computation. Combining the two techniques, the author shows consistent performance gains over baseline systems in which one or both of them are turned off.
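The review does not give the paper's exact formulation, so the following is only a minimal sketch of how the two ideas can fit together. It assumes that character objects have already been segmented and described by fixed-length feature vectors, and it assumes that word width enters as an additive, relative-difference penalty on the DTW cost. The names `train_som`, `best_matching_unit`, `word_distance`, and the `width_weight` parameter are hypothetical and do not come from the paper. A small one-dimensional SOM (pure Python, no NumPy) clusters the character objects; words are then compared by running DTW over their sequences of SOM prototypes.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(nodes, v):
    """Index of the SOM node whose prototype is closest to v."""
    return min(range(len(nodes)), key=lambda i: squared_distance(nodes[i], v))


def train_som(vectors, grid_size=4, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D SOM (a line of grid_size nodes) on character feature vectors.

    Unsupervised: no character labels are used, only the feature vectors.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Initialise each prototype with a copy of a randomly chosen sample.
    nodes = [list(rng.choice(vectors)) for _ in range(grid_size)]
    if radius0 is None:
        radius0 = grid_size / 2
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for v in order:
            frac = step / total
            lr = lr0 * (1 - frac)                      # linearly decaying learning rate
            radius = max(radius0 * (1 - frac), 0.5)    # shrinking neighbourhood
            bmu = best_matching_unit(nodes, v)
            for i in range(grid_size):
                # Gaussian neighbourhood measured along the 1-D grid.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    nodes[i][d] += lr * h * (v[d] - nodes[i][d])
            step += 1
    return nodes


def dtw(seq_a, seq_b, cost):
    """Classic DTW distance with steps (i-1, j), (i, j-1), (i-1, j-1)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def word_distance(query, candidate, nodes, width_weight=1.0):
    """Compare two words, each given as (character feature vectors, word width).

    ASSUMPTION (hypothetical): width enters as an additive relative-difference
    penalty; the paper's actual way of combining width with DTW may differ.
    """
    q_chars, q_width = query
    c_chars, c_width = candidate
    # Replace every character object by the prototype of its SOM cluster.
    q_seq = [nodes[best_matching_unit(nodes, v)] for v in q_chars]
    c_seq = [nodes[best_matching_unit(nodes, v)] for v in c_chars]
    shape = dtw(q_seq, c_seq, lambda a, b: math.sqrt(squared_distance(a, b)))
    width_term = abs(q_width - c_width) / max(q_width, c_width)
    return shape + width_weight * width_term
```

A query word image would then be compared against every indexed word, with candidates ranked by increasing `word_distance`. Because a ligature can merge several letters into a single character object, two images of the same word may yield sequences of different lengths, while the overall word width is less affected; this is presumably why the paper brings width into the DTW computation.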

However, the performance achieved so far is not yet satisfactory for real-world applications. Then again, this is a difficult problem that certainly calls for further investigation. It would have been interesting to see some discussion of failure or ambiguous cases, as this would help readers better understand the challenges involved.

Reviewer: Jin Chen. Review #: CR139475 (1203-0318)

Document And Text Processing (I.7)
General (H.3.0)
General (I.2.0)

Other reviews under "Document And Text Processing":
Document clustering method using dimension reduction and support vector clustering to overcome sparseness
Jun S., Park S., Jang D. Expert Systems with Applications: An International Journal 41(7): 3204-3212, 2014. Type: Article
Sep 19 2014
Handbook of document image processing and recognition
Doermann D., Tombre K. Springer Publishing Company, Incorporated, New York, NY, 2014. 1055 pp. Type: Book (978-0-857298-58-4)
Oct 15 2014
Path-based methods on categorical structures for conceptual representation of Wikipedia articles
Kucharczyk Ł., Szymański J. Journal of Intelligent Information Systems 48(2): 309-327, 2017. Type: Article
Nov 3 2017

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®