Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Efficient online index construction for text databases
Lester N., Moffat A., Zobel J. ACM Transactions on Database Systems33 (3):1-33,2008.Type:Article
Date Reviewed: Feb 2 2009

In many information retrieval applications, the update of inverted index structures needs to be online, since such indexes should always be current and accessible for query processing. This study introduces an online index construction technique for document-sorted inverted indexes.

First, the paper introduces the inverted index concept. This is followed by a detailed explanation of three online index construction techniques: rebuild, in-place update, and remerge. These methods have been explored by Lester, Moffat, and Zobel in their previous work [1]. In the remerge approach, the entire on-disk index is read, extended by using the added documents, and written to a new disk location. This approach requires processing of the entire index when the in-memory and on-disk parts are merged, and its data transfer cost is “proportional to the total number of postings in the resulting index.” All of these index construction techniques use “a single, contiguous, inverted list per term.” The authors describe a new method that uses the remerge technique, but, this time, “break the index into a tightly controlled number of partitions” by using a technique called geometric partitioning. According to the authors, the partitioning idea is inspired by the “too-many-fragments approach of Tomasic et al.” [2]. By controlling the number of partitions, the merge activities are handled in such a way that the processing cost is kept at a reasonable level. The method uses periodic merging to prevent the excessive growth of number of partitions that may increase query-processing time. The geometric partitioning idea is also used for vocabulary updates by storing vocabulary in multiple B+ tree structures.

The authors analyze their methods and validate them by using the 426 GB gov2 Web-based document collection and a sequence of 10,000 queries provided by Microsoft Search. In the query-processing experiments, they use the index at 49 different points during its construction. They show that their approach scales well, and construction costs are significantly reduced, compared to standard contiguous representation of inverted indexes. Furthermore, controlling and limiting the number of index segments reduces query-processing time increases caused by partitioning.

The paper is well written and requires careful reading. The work is valuable for researchers and students in information-retrieval courses.

Reviewer:  F. Can Review #: CR136478 (0910-0962)
1) Lester, N.; Moffat, A.; Zobel, J. Fast Online Index Construction Via Geometric Partitioning. In Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM’05) ACM, 2005, 776–783.
2) Tomasic, A.; García-Molina, H.; Shoens, K. Incremental Updates of Inverted Lists for Text Document Retrieval. In Proceedings of the ACM International Conference on Management of Data (SIGMOD’94) ACM, 1994, 289–300.
Bookmark and Share
 
Indexing Methods (H.3.1 ... )
 
 
File Organization (H.3.2 ... )
 
 
Performance Evaluation (Efficiency And Effectiveness) (H.3.4 ... )
 
 
Information Storage (H.3.2 )
 
 
Systems And Software (H.3.4 )
 
Would you recommend this review?
yes
no
Other reviews under "Indexing Methods": Date
Computation of term/document discrimination values by use of the cover coefficient
Can F. (ed), Ozkarahan E. Journal of the American Society for Information Science 38(3): 171-183, 1987. Type: Article
Mar 1 1988
Automatic indexing of full texts
Jonák Z. Information Processing and Management: an International Journal 20(5-6): 619-627, 1984. Type: Article
Jul 1 1985
Evaluation of access methods to text documents in office systems
Rabitti F., Zizka J.  Research and development in information retrieval (, King’s College, Cambridge,401984. Type: Proceedings
Sep 1 1985
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy