The value of a textbook index depends on its usefulness to readers. This paper studies textbook index quality from the viewpoint of user acceptance.
The study takes as its test material a 350-page college-level textbook. To provide an experimental context, the authors created a set of 26 questions to pose to 26 undergraduate and graduate student subjects. The objective was to determine which of three index sets (two automatically generated, and a modified treatment of the textbook's own index) the students preferred when using the textbook to answer the 26 experimental questions.
The experimenters "compare[d] user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically." The two automatic methods produced 7,980 terms (the hierarchical search (HS) method) and 1,788 terms (the TEC method), while the book's own index, presumably human-created (though it could have been generated by a combination of machine and human effort), contained 673 terms. Despite a conscious attempt to "de-tune" the book's index by eliminating bold-faced topical entries, the index was preferred over the two automatic methods by an eight-to-one ratio. This is not surprising: the index terms, even if originally generated by mechanical means, were presumably examined and organized for efficacy by the book's authors and editors. Without human intervention, most machine-generated word lists are less than satisfactory. For example, in a portable document format (PDF) version of this paper, an Adobe search for "identified automatically" (a term from the abstract) found 18 references, but did not directly lead to a reference explaining how this automatic method was done. However, "automatically identified" found three references, one leading to the definition. This illustrates how a machine list creator is not sensitive to the semantics or organization of a work.
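The word-order sensitivity described above can be sketched with a minimal example. A naive exact-substring search, like the PDF search behavior noted here, treats "identified automatically" and "automatically identified" as unrelated strings (the sample text and function below are illustrative assumptions, not taken from the paper):

```python
def naive_search(phrase: str, document: str) -> bool:
    # Exact substring match only: no awareness of word order,
    # inflection, or the semantics of the surrounding text.
    return phrase.lower() in document.lower()

# Hypothetical sentence standing in for the paper's text.
text = "Two of the term sets were automatically identified by the experimenters."

print(naive_search("identified automatically", text))  # False: reordered phrase is missed
print(naive_search("automatically identified", text))  # True: exact order matches
```

A human indexer would recognize both phrasings as the same concept and list them under a single entry; the exact-match search cannot.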
This paper brings back many memories. While at the University of Pennsylvania, I attacked a similar problem in my doctoral thesis, employing computing techniques in an attempt to extract semantic relevance among search terms. Perhaps the authors' next effort will delve into semantics, in an attempt to automatically create and organize an index set competitive with a human-generated one.