Computing Reviews
A document retrieval model based on term frequency ranks
Aalbersberg I. Research and development in information retrieval (Dublin, Ireland), 172, 1994. Type: Proceedings
Date Reviewed: Oct 1 1995

Aalbersberg introduces a new full-text document retrieval model based on comparing the occurrence-frequency rank numbers of terms in queries and documents. The most widely accepted models are the vector-space model and the probabilistic retrieval model. The author claims that the new model is very similar to the vector-space model in that both are based on term occurrence frequencies or term weights. The theory behind both models is derived from the so-called rank-frequency law of Zipf, which states that, for a given text or set of texts, the product of a term's occurrence frequency and the rank number derived from that frequency is approximately constant.
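To make the rank-frequency law concrete, the following minimal Python sketch counts term occurrences in a text, assigns rank numbers by descending frequency, and reports the rank-frequency product for each term. It is an illustration only, not the model from the paper: the function name `rank_frequency_table`, the whitespace tokenization, and the rule that tied terms receive distinct consecutive ranks (ordered alphabetically) are all assumptions made for this example.

```python
from collections import Counter


def rank_frequency_table(text):
    """Return (term, frequency, rank, rank * frequency) rows, most frequent first.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split.
    """
    counts = Counter(text.lower().split())
    # Sort by descending frequency. Ties are broken alphabetically so that
    # the ranking is deterministic (an assumption, not part of Zipf's law).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    for rank, (term, freq) in enumerate(ordered, start=1):
        rows.append((term, freq, rank, rank * freq))
    return rows


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for term, freq, rank, product in rank_frequency_table(sample):
        print(term, freq, rank, product)
```

On a toy sentence like this one, the products vary noticeably. The law is a statistical regularity that holds only approximately, and it is normally examined on large texts or collections.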

This paper is written for an audience interested in full-text search models and willing to wade through some complicated mathematical derivations based on set theory and rank order numbers. For readers who are not well versed in the text retrieval field, the paper gives a fairly complete explanation of the vector-space retrieval model in section 2 and a discussion of the rank-frequency law of Zipf, as applied to the retrieval model, in section 3. The main part of the paper is section 4, in which the author builds on the first three sections to explain and mathematically derive the theory behind the rank-based retrieval model, starting with the law of Zipf. I had some difficulty following the derivations, but I do not claim to be an expert in this area.

The final test of the new model is to compare its efficiency and effectiveness in retrieving full-text documents with those of the vector-space model. Using test collections from Communications of the ACM, MEDLARS, and other sources, the author demonstrates in section 4.4 that the new model is as effective as the vector-space model but more efficient, owing to its use of term occurrence frequency rank orders. Given the complexity of full-text search, the author concludes by calling for additional research into other full-text search models based on term occurrence frequency rank orders, since they offer a more efficient way of expressing term weights.

The 17 references provide a fairly good cross-section of works in this area. The paper appears to be the right length for presenting research work.

Reviewer:  E. Y. Lee Review #: CR118921 (9510-0808)
 
Retrieval Models (H.3.3 ... )
 
 
Query Formulation (H.3.3 ... )
 
Other reviews under "Retrieval Models": Date
Evaluation of an inference network-based retrieval model
Turtle H., Croft W. (ed) ACM Transactions on Information Systems 9(3): 187-222, 1991. Type: Article
May 1 1993
On a model of distributed information retrieval systems based on thesauri
Mazur Z. Information Processing and Management: an International Journal 20(4): 499-505, 1984. Type: Article
Sep 1 1985
Information processing in linear vector space
Kunz M. Information Processing and Management: an International Journal 20(4): 519-525, 1984. Type: Article
Mar 1 1985

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®