Computing Reviews
Length normalization in XML retrieval
Kamps J., de Rijke M., Sigurbjörnsson B. Research and development in information retrieval (Proceedings of the 27th International Conference on Research and Development in Information Retrieval, Sheffield, United Kingdom, Jul 25-29, 2004) 80-87, 2004. Type: Proceedings
Date Reviewed: Nov 1 2005

The retrievable units in Extensible Markup Language (XML) documents are individual elements, and the length distribution of such elements differs from that of standard documents. Document length normalization in XML retrieval therefore needs a different approach from the one used in standard document retrieval. In this paper, the authors investigate document length normalization in the XML retrieval context by analyzing the length distributions of XML elements and carrying out an experiment on length normalization techniques.

The element length analysis indicates that, although the length distribution of arbitrary elements is skewed toward short elements, the distribution of relevant elements is fairly even, except for the shortest elements. In addition, the prior probability of relevance, viewed as a function of element length, is heavily skewed toward long elements.
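
As a rough illustration of this kind of analysis (not the authors' code), the sketch below bins elements by length on a base-10 logarithmic scale and estimates, for each bin, the fraction of elements judged relevant. The function name `relevance_by_length`, the choice of bins, and the input format are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def relevance_by_length(elements):
    """Estimate the fraction of relevant elements per log10-length bin.

    `elements` is an iterable of (length_in_terms, is_relevant) pairs.
    Returns {bin_index: fraction_relevant}, where bin_index is
    floor(log10(length)), so bin 0 holds lengths 1-9, bin 1 holds 10-99, etc.
    """
    totals = defaultdict(int)
    relevant = defaultdict(int)
    for length, is_relevant in elements:
        if length < 1:
            continue  # empty elements have no defined log-length bin
        bin_index = int(math.floor(math.log10(length)))
        totals[bin_index] += 1
        if is_relevant:
            relevant[bin_index] += 1
    # Fraction of elements in each bin that were judged relevant.
    return {b: relevant[b] / totals[b] for b in totals}
```

A prior that rises with the bin index would correspond to the skew toward long elements described above.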

The experiment evaluates the effects of smoothing, length priors, and index cut-offs on retrieval performance. The results indicate that length priors improve retrieval performance significantly. Removing shorter elements from the index also improves performance, but by far less than the use of length priors does. The results also indicate that the appropriate setting of the smoothing parameter depends on the length prior.
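
To make the three factors concrete, here is a minimal sketch of how they might interact in a ranking function. The review does not give the scoring formula, so this assumes a standard query-likelihood language model with linear-interpolation smoothing and a prior proportional to element length. The names `score_element` and `rank`, the parameter `lam`, and the cut-off `min_len` are hypothetical.

```python
import math


def score_element(query_terms, elem_tf, elem_len, coll_tf, coll_len,
                  lam=0.15, use_length_prior=True):
    """Log query likelihood of one element.

    Assumed smoothing: P(t|e) = lam * tf(t,e)/|e| + (1 - lam) * cf(t)/|C|.
    Assumed length prior: P(e) proportional to |e|, added as log|e|.
    """
    score = 0.0
    for term in query_terms:
        p_elem = elem_tf.get(term, 0) / elem_len
        p_coll = coll_tf.get(term, 0) / coll_len
        p = lam * p_elem + (1 - lam) * p_coll
        if p <= 0:
            return float("-inf")  # term unseen in element and collection
        score += math.log(p)
    if use_length_prior:
        score += math.log(elem_len)
    return score


def rank(query_terms, index, coll_tf, coll_len, min_len=1, **kwargs):
    """Rank element ids by score, skipping elements shorter than min_len.

    `index` maps element id -> {term: frequency}. The min_len filter
    stands in for an index cut-off on short elements.
    """
    scored = []
    for elem_id, tf in index.items():
        elem_len = sum(tf.values())
        if elem_len < min_len:
            continue
        s = score_element(query_terms, tf, elem_len, coll_tf, coll_len, **kwargs)
        scored.append((s, elem_id))
    return [elem_id for _, elem_id in sorted(scored, reverse=True)]
```

In this sketch, a short title-like element that matches every query term tends to win without the prior, while the log-length term can push a longer section above it; the balance between the two shifts with `lam`, which is one way to read the reported dependence between smoothing and the length prior.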

The primary contribution of this paper is the reconsideration of the concept of document length normalization in a new context, that of XML retrieval. This paper also provides possible techniques that could be used for XML element length normalization.

Reviewer:  Xiaoya Tang Review #: CR131969 (0609-0958)
 
Information Storage And Retrieval (H.3)
XML (I.7.2 ...)
Content Analysis And Indexing (H.3.1)
Digital Libraries (H.3.7)
Information Search And Retrieval (H.3.3)
Systems And Software (H.3.4)
 
Other reviews under "Information Storage And Retrieval": Date
Building an example application with the unstructured information management architecture
Ferrucci D., Lally A. IBM Systems Journal 43(3): 455-475, 2004. Type: Article
Feb 2 2005
Rich results from poor resources: NTCIR-4 monolingual and cross-lingual retrieval of Korean texts using Chinese and English
Kwok K., Choi S., Dinstl N. ACM Transactions on Asian Language Information Processing 4(2): 136-162, 2005. Type: Article
Mar 2 2006
Mining search engine query logs for query recommendation
Zhang Z., Nasraoui O.  World Wide Web (Proceedings of the 15th International Conference on the World Wide Web, Edinburgh, Scotland, May 23-26, 2006)1039-1040, 2006. Type: Proceedings
Jul 25 2006