Computing Reviews
Effect of relationships between words on Japanese information retrieval
Matsumura A., Takasu A., Adachi J. ACM Transactions on Asian Language Information Processing 5(3): 264-289, 2006. Type: Article
Date Reviewed: May 18 2007

The most common approach to implementing information retrieval (IR) systems is the term frequency-inverse document frequency (tf-idf) model. In this model, a word that occurs frequently in a given document (high term frequency) but in few documents of the collection (low document frequency) receives a relatively high weight in document ranking. The model is typically implemented with an inverted file structure, which provides an efficient retrieval environment, especially when combined with the dynamic pruning techniques introduced for the Web environment [1].
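As a rough illustration of the baseline described above, the following minimal sketch builds an inverted file and ranks documents by summed tf-idf weights. It is not the implementation evaluated in the paper: the function names `build_inverted_index` and `rank`, the whitespace tokenization, the unsmoothed `log(N / df)` weighting, and the toy documents are all assumptions made for illustration, and no pruning is attempted.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a postings dict {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        # Whitespace tokenization is an assumption; Japanese text would
        # need morphological analysis to find word boundaries.
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, num_docs):
    """Score documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        # Fewer documents containing the term means a larger idf.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection.
docs = [
    "inverted files support efficient retrieval",
    "retrieval of japanese text",
    "japanese morphological analysis of text",
]
index = build_inverted_index(docs)
results = rank("inverted retrieval", index, len(docs))
```

Only the postings of the query terms are visited, which is the property that makes the inverted file efficient. In this toy collection, the first document ranks highest because it is the only one containing the rarer term "inverted".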

This work introduces two IR methods, and compares their retrieval performance with that of the tf-idf model. The first method uses dependency relationships between words in a sentence. The second method uses proximity relationships, mainly ordered co-occurrence information of words in a sentence, to approximate the dependency relationships between words. To represent these relationships, a structured index in the form of a binary tree is constructed for each document. Index creation involves morphological, dependency, and compound noun analyses. The same structure is also constructed for the queries, which are expressed as complete sentences. The document and query index structures are then used together in the search process. The authors show that in Japanese IR with full sentence queries, these methods, with properly chosen parameters, are superior to the tf-idf model, and can increase IR effectiveness by up to 22 percent.
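To make the idea of ordered co-occurrence concrete, the sketch below extracts ordered word pairs from a sentence and counts how many of a query's pairs also appear in a candidate sentence. This is only an illustration of the general notion, not the authors' binary-tree index or their scoring function: the names `ordered_pairs` and `pair_overlap`, the `window` parameter, and the pre-tokenized English examples are all assumptions.

```python
def ordered_pairs(tokens, window=2):
    """Return the set of pairs (a, b) where b follows a within `window` tokens."""
    pairs = set()
    for i, left in enumerate(tokens):
        # Only look ahead, so the pair records word order.
        for right in tokens[i + 1 : i + 1 + window]:
            pairs.add((left, right))
    return pairs


def pair_overlap(query_tokens, sentence_tokens, window=2):
    """Count the ordered pairs shared by a query and a sentence."""
    query_pairs = ordered_pairs(query_tokens, window)
    sentence_pairs = ordered_pairs(sentence_tokens, window)
    return len(query_pairs & sentence_pairs)


# Hypothetical, already tokenized inputs; Japanese text would first
# need morphological analysis to obtain the tokens.
query = ["effect", "of", "relationships"]
sentence_a = ["effect", "of", "word", "relationships"]
sentence_b = ["relationships", "of", "effect"]

overlap_a = pair_overlap(query, sentence_a)
overlap_b = pair_overlap(query, sentence_b)
```

Both sentences contain all three query words, so a bag-of-words match cannot separate them. The ordered pairs can: `sentence_a` shares pairs with the query, while `sentence_b`, which reverses the word order, shares none.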

The presentation in the paper flows nicely. However, the authors’ claim regarding the superior performance of the methods “independently of the target collection and search topic set” is too strong, since the experiments are based only on subsets of the National Academic Center for Science Information Systems (NACSIS) Test Collection for IR systems (NTCIR-1). Additionally, there are a number of issues and concerns to be addressed. My major concern is the use of full sentence queries. Most Web queries, for instance, consist of only a few words. Hence, the approach is hard to generalize to most real-life situations. Furthermore, a comparison with IR techniques that use query term distance information in a statistical sense is missing. Search efficiency is an important issue that is not addressed, but it is on the authors’ future research agenda.

The NTCIR-1 test collection used in the experiments contains 330,000 documents and 83 queries (topics). The authors indicate that it is the largest Japanese IR test collection. Its size, when compared with the sizes of some English TREC test collections, is relatively small. This observation on Japanese IR indicates that there is room for more aggressive IR research in (probably most) non-English languages.

Reviewer:  F. Can Review #: CR134297
1) Zobel, J.; Moffat, A. Inverted files for text search engines. ACM Computing Surveys 38, 2 (2006), Article 6.
Indexing Methods (H.3.1 ... )
Linguistic Processing (H.3.1 ... )
Performance Evaluation (Efficiency And Effectiveness) (H.3.4 ... )
Search Process (H.3.3 ... )
Content Analysis And Indexing (H.3.1 )
Information Search And Retrieval (H.3.3 )
