Computing Reviews, the leading online review service for computing literature.

Length normalization in XML retrieval
Kamps J., de Rijke M., Sigurbjörnsson B. Research and development in information retrieval (Proceedings of the 27th International Conference on Research and Development in Information Retrieval, Sheffield, United Kingdom, Jul 25-29, 2004) 80-87, 2004. Type: Proceedings

Date Reviewed: Nov 1 2005

The retrievable units in Extensible Markup Language (XML) documents are individual elements, and the length distribution of these elements differs from that of standard documents. Document length normalization in XML retrieval therefore needs a different approach from the one used in standard document retrieval. In this paper, the authors investigate length normalization in the XML retrieval context by analyzing the length distributions of XML elements and by running an experiment that compares length normalization techniques. The element length analysis indicates that, although the distribution of arbitrary elements is skewed toward short elements, the distribution of relevant elements is fairly even, except for the shortest elements. In addition, the prior probability of relevance, viewed as a function of element length, is heavily skewed toward long elements. The experiment evaluates the effects of smoothing, length priors, and index cut-offs on retrieval performance. The results indicate that length priors improve retrieval performance significantly. Removing shorter elements from the index also improves performance, but by far less than the use of length priors does. The results also indicate that the appropriate value of the smoothing parameter depends on the length prior. The primary contribution of this paper is its reconsideration of document length normalization in a new context, that of XML retrieval. The paper also offers techniques that could be used for XML element length normalization.
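To make the three factors named in the review concrete (smoothing, length priors, and index cut-offs), here is a minimal Python sketch. It is illustrative only: the review does not give the paper's scoring formula, so the sketch assumes a query-likelihood language model with linear-interpolation smoothing and a prior proportional to element length. The function names (`collection_stats`, `apply_cutoff`, `score`) and the parameters `lam` and `prior_weight` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def collection_stats(elements):
    """Return (term counts, total token count) over all indexed elements."""
    counts = Counter()
    for tokens in elements:
        counts.update(tokens)
    return counts, sum(counts.values())


def apply_cutoff(elements, min_length):
    """Index cut-off: keep only elements with at least min_length tokens."""
    return [tokens for tokens in elements if len(tokens) >= min_length]


def score(query, tokens, coll_counts, coll_total, lam=0.5, prior_weight=1.0):
    """Log query likelihood with interpolation smoothing plus a length prior.

    Assumed form (an illustration, not the paper's stated model):
        prior_weight * log(len(e)) + sum_t log(lam * P(t|e) + (1 - lam) * P(t|C))
    """
    length = len(tokens)
    if length == 0:
        return float("-inf")
    tf = Counter(tokens)
    total = prior_weight * math.log(length)  # prior_weight=0 disables the prior
    for term in query:
        p_elem = tf[term] / length               # element language model
        p_coll = coll_counts[term] / coll_total  # collection (background) model
        p = lam * p_elem + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        total += math.log(p)
    return total
```

In this sketch, a one-token element that matches the query outranks a longer element when `prior_weight=0`, and the order reverses when `prior_weight=1`. This is the kind of bias toward very short elements that a length prior is meant to counteract. Because the prior and `lam` both affect how much short elements are favored, the two settings interact, which is in line with the review's remark that the smoothing parameter depends on the length prior.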

Reviewer: Xiaoya Tang	Review #: CR131969 (0609-0958)

Information Storage And Retrieval (H.3)

XML (I.7.2 ...)

Content Analysis And Indexing (H.3.1)

Digital Libraries (H.3.7)

Information Search And Retrieval (H.3.3)

Systems And Software (H.3.4)
