User preference: a measure of query-term quality
Wacholder N., Liu L. Journal of the American Society for Information Science and Technology 57(12): 1566-1580, 2006. Type: Article
Date Reviewed: Nov 29 2007

The value of a textbook index depends on its usefulness to readers. This paper studies textbook index quality from the viewpoint of user acceptance.

The study takes as its test material a 350-page college-level textbook. To provide an experimental context, the authors created a set of 26 questions and posed them to 26 undergraduate and graduate student subjects. The objective was to determine which of three sets of index terms (two automatically generated sets and a modified treatment of the textbook’s own index) the students preferred as they used the textbook to answer the 26 experimental questions.
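
The measurement behind such a study is, at bottom, simple tallying: record which index’s terms each subject chose for each question, then compare the counts across the three sources. The paper’s actual scoring procedure is not detailed here, so the following is only a minimal sketch with hypothetical labels ("book", "HS", "TEC") and made-up sample data:

```python
from collections import Counter

def preference_ratios(choices):
    """Tally which index source each subject-question trial favored
    and return each source's share of all choices.

    `choices` is a list of source labels, one per trial -- the labels
    below are hypothetical, not taken from the paper.
    """
    counts = Counter(choices)
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}

# Hypothetical data: 26 subjects x 26 questions would yield 676 trials;
# this tiny sample just shows the computation.
sample = ["book"] * 8 + ["HS"] * 1 + ["TEC"] * 1
print(preference_ratios(sample))  # {'book': 0.8, 'HS': 0.1, 'TEC': 0.1}
```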

The experimenters “compare[d] user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically.” The two automatic methods produced 7,980 terms (the hierarchical search (HS) method) and 1,788 terms (the TEC method), while the human-created index (presumably so, since the book’s index could have been produced by some combination of machine and human effort) contained 673 terms. Despite a conscious attempt to “de-tune” the book’s index by eliminating bold-faced topical entries, it was preferred over the two automatic methods by an eight-to-one ratio. This is not surprising: the index terms, even if originally generated by mechanical means, were presumably examined and organized for efficacy by the book’s authors and editors, and without such human intervention most machine-generated word lists are less than satisfactory. For example, in a portable document format (PDF) version of this paper, an Adobe Search for “identified automatically” (a term from the abstract) found 18 occurrences, but none led directly to a description of how the automatic identification was done; searching for “automatically identified” found three occurrences, one of which led to the definition. This illustrates that a mechanical list builder is not sensitive to the semantics or organization of a work.
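
To make that point concrete: a purely mechanical term harvester operates on surface strings, not meaning. The sketch below is a generic frequency-based extractor (not the paper’s HS or TEC methods, which this review does not detail); note that it counts “identified automatically” and “automatically identified” as unrelated candidates, exactly the word-order insensitivity the Adobe Search anecdote above exposes:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is",
             "that", "for", "by", "was", "were", "with"}

def extract_candidate_terms(text, max_terms=10):
    """Naively harvest index-term candidates as frequent unigrams and
    bigrams. No semantic or structural analysis is done, so word-order
    variants count as distinct terms, and much of the output is noise
    that a human indexer would discard."""
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = [w for w in words if w not in STOPWORDS and len(w) > 2]
    bigrams = [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])
               if w1 not in STOPWORDS and w2 not in STOPWORDS]
    counts = Counter(unigrams) + Counter(bigrams)
    return [term for term, _ in counts.most_common(max_terms)]

# Hypothetical input: the two phrasings are tallied separately.
sample = ("Terms identified automatically were compared with terms "
          "a human indexer chose; automatically identified terms "
          "lack the organization an indexer provides.")
print(extract_candidate_terms(sample))
```

Counting surface strings this way, with no curation, plausibly helps explain why the automatic lists run to thousands of terms while the edited book index stays under 700.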

This paper brings back many memories. While at the University of Pennsylvania, I attacked a similar problem in my doctoral thesis, employing computing techniques in an attempt to extract semantic relevance among search terms. Perhaps the authors’ next effort will delve into semantics, in an attempt to automatically create and organize an index set that is competitive with a human-generated one.

Reviewer: J. S. Edwards | Review #: CR134986
Query Formulation (H.3.3 ...)
Human Factors (H.1.2 ...)
Interactive Systems (I.5.5 ...)
Performance Evaluation (Efficiency And Effectiveness) (H.3.4 ...)
Implementation (I.5.5)
Systems And Software (H.3.4)
Other reviews under "Query Formulation":
A comparison of two methods for Boolean query relevancy feedback
Salton G., Voorhees E., Fox E. Information Processing and Management: an International Journal 20(5-6): 637-651, 1984. Type: Article
Jul 1 1985
Calibrating databases
Fischhoff B., MacGregor D. Journal of the American Society for Information Science 37(4): 222-233, 1986. Type: Article
Sep 1 1987
Space-time trade-offs for orthogonal range queries
Vaidya P. SIAM Journal on Computing 18(4): 748-758, 1989. Type: Article
Oct 1 1990