Computing Reviews
Digital weight watching: reconstruction of scanned documents
Marx M., Gielissen T. International Journal on Document Analysis and Recognition 14(2): 229-239, 2011. Type: Article
Date Reviewed: Jul 25 2011

In this paper, Marx and Gielissen construct a search system for parliamentary proceedings that have been scanned and processed via optical character recognition (OCR). Such documents are large (in the targeted dataset, the average size was 16.5 MB per document), so users spend a lot of time downloading and reading through documents that search engines rank highly before they can determine which ones are actually relevant. The authors address this problem by generating summaries (snippets) from reconstructed PDF documents (transformed from the OCRed data); the reconstruction significantly reduces the file size (to 1.5 percent of the original), which in turn reduces download time.
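
To put the reported figures in perspective, the arithmetic can be sketched as follows. This is only an illustration: it assumes the 1.5 percent ratio applies to the 16.5 MB average, and the connection speed (`ASSUMED_MBIT_PER_S`) and both function names are hypothetical values chosen for the example, not taken from the paper.

```python
# Illustrative arithmetic only. The two sizes come from the review;
# the bandwidth is an assumption made for this example.
AVG_ORIGINAL_MB = 16.5     # average scanned document size in the dataset
REDUCED_FRACTION = 0.015   # reconstructed size: 1.5 percent of the original
ASSUMED_MBIT_PER_S = 8.0   # hypothetical connection speed (not from the paper)


def reduced_size_mb(original_mb, fraction=REDUCED_FRACTION):
    """Size of the reconstructed document in MB."""
    return original_mb * fraction


def download_seconds(size_mb, mbit_per_s=ASSUMED_MBIT_PER_S):
    """Idealized transfer time: megabytes -> megabits, divided by link speed."""
    return size_mb * 8 / mbit_per_s


reduced = reduced_size_mb(AVG_ORIGINAL_MB)
print(f"reconstructed size: {reduced:.4f} MB")
print(f"download, original: {download_seconds(AVG_ORIGINAL_MB):.2f} s")
print(f"download, reduced:  {download_seconds(reduced):.2f} s")
```

Under these assumptions an average document shrinks to roughly a quarter of a megabyte, and the idealized download time falls by the same factor as the file size.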

The authors focus on a rule-based architecture for extracting information from OCRed data, and evaluate it in terms of physical measures (reduction in file size and processing time) and economic cost. However, they should also have evaluated the proposed approach from the user’s point of view. If the authors were to employ natural language processing or information retrieval techniques to generate more informative snippets, they could improve user satisfaction.

Another notable point is that the paper contains several helpful pointers to information on making “governmental and/or political data easily accessible through the Internet.” Researchers and developers who work in the information systems departments of government offices can use these references to get up-to-date information about this area.

Reviewer:  Kazunari Sugiyama Review #: CR139275 (1201-0100)
XML (I.7.2 ...)
Scanning (I.4.1 ...)
Information Search And Retrieval (H.3.3)
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®