Computing Reviews
Efficient update of indexes for dynamically changing Web documents
Lim L., Wang M., Padmanabhan S., Vitter J., Agarwal R. World Wide Web 10(1): 37-69, 2007. Type: Article
Date Reviewed: Aug 13 2007

The World Wide Web has transformed the way we do information retrieval. From an initial focus on relatively stable and well-structured textual collections, the discipline has grown to encompass much broader issues and to deal with far more diverse and dynamic data repositories. Besides broadening the field's perspective, the Web has forced a reexamination of some fundamental issues. One such issue is the increased demand on index maintenance: search engines must keep track of a collection of Web documents that is in constant flux. This paper tackles the issue comprehensively, from the perspective of incremental updates to inverted indices.

The paper presents an experimental analysis of the nature of changes in Web documents, proposes a novel index update method, and demonstrates the advantages of the proposed method through both analytical and empirical evaluation. The method is simple: it interposes a layer of indexed partitions (landmarks) between the documents and the inverted index, and performs localized updates based on the edit transcripts (diffs) between the old and new versions of modified documents. This landmark-diff approach is motivated by the findings of the document update analysis, which show that most indexed documents do not change between updates, and that the changes that do occur tend to be small and localized (that is, clustered around specific areas of the documents). The evaluation shows that the landmark-diff method yields significant performance improvements over both a complete index rebuild and a forward index update.
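
To make the idea concrete, the following is a minimal Python sketch of a landmark-style positional index for a single document. It is an illustration written for this review, not the authors' implementation: the names `LandmarkIndex`, `BLOCK`, and `posting_writes`, the fixed-size block policy, and the use of Python's `difflib.SequenceMatcher` as the diff are all assumptions. Postings store a (landmark, offset) pair instead of an absolute position, so an edit only rewrites postings in the blocks it touches, while untouched blocks are merely shifted in the landmark directory.

```python
from collections import defaultdict
from difflib import SequenceMatcher

BLOCK = 4  # hypothetical landmark spacing, in tokens


class LandmarkIndex:
    """Toy single-document positional index with a landmark layer."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.landmarks = {}               # landmark id -> absolute start position
        self.lengths = {}                 # landmark id -> number of tokens in block
        self.postings = defaultdict(set)  # term -> {(landmark id, offset)}
        self.next_id = 0
        self.posting_writes = 0           # counts posting insertions and deletions
        self._add_blocks(0, self.tokens)

    def _add_blocks(self, start, tokens):
        # Cut the token run into blocks and index each under a fresh landmark.
        for i in range(0, len(tokens), BLOCK):
            chunk = tokens[i:i + BLOCK]
            lid = self.next_id
            self.next_id += 1
            self.landmarks[lid] = start + i
            self.lengths[lid] = len(chunk)
            for off, term in enumerate(chunk):
                self.postings[term].add((lid, off))
                self.posting_writes += 1

    def positions(self, term):
        # Resolve (landmark, offset) pairs to absolute positions at query time.
        return sorted(self.landmarks[lid] + off
                      for lid, off in self.postings.get(term, ()))

    def update(self, new_tokens):
        new_tokens = list(new_tokens)
        matches = SequenceMatcher(a=self.tokens, b=new_tokens,
                                  autojunk=False).get_matching_blocks()
        # A block survives if it lies entirely inside one unchanged region.
        kept = {}
        for lid, start in self.landmarks.items():
            end = start + self.lengths[lid]
            for a, b, size in matches:
                if a <= start and end <= a + size:
                    kept[lid] = start - a + b  # only the landmark moves
                    break
        # Delete postings of blocks touched by the edit.
        for lid in set(self.landmarks) - set(kept):
            start = self.landmarks[lid]
            for off in range(self.lengths[lid]):
                self.postings[self.tokens[start + off]].discard((lid, off))
                self.posting_writes += 1
            del self.landmarks[lid], self.lengths[lid]
        self.landmarks.update(kept)
        # Index the stretches of the new version not covered by kept blocks.
        covered = sorted((s, self.lengths[lid])
                         for lid, s in self.landmarks.items())
        pos = 0
        for s, n in covered + [(len(new_tokens), 0)]:
            if pos < s:
                self._add_blocks(pos, new_tokens[pos:s])
            pos = s + n
        self.tokens = new_tokens


old = "the quick brown fox jumps over the lazy dog near the old river bank".split()
new = old[:7] + ["very", "sleepy"] + old[8:]

idx = LandmarkIndex(old)
idx.posting_writes = 0      # measure only the cost of the update
idx.update(new)
print(idx.positions("the"), idx.posting_writes, "posting writes for", len(new), "tokens")
```

In this toy run a small edit in the middle of the document rewrites the postings of a single block; the blocks after the edit keep their postings and only have their landmark positions adjusted, which is the source of the savings the paper reports at scale.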

Those with an interest in implementation issues in information retrieval and the management of large and dynamic collections of documents will find this paper well worth reading.

Reviewer: Saturnino Luz
Review #: CR134638 (0807-0703)
 
Indexing Methods (H.3.1 ... )
Document Management (I.7.1 ... )
Search Process (H.3.3 ... )
World Wide Web (WWW) (H.3.4 ... )
Document and Text Editing (I.7.1 )
Information Search And Retrieval (H.3.3 )
