The value of a textbook index depends on its usefulness to readers. This paper studies textbook index quality from the viewpoint of user acceptance.
The study takes as its test material a 350-page college-level textbook. To provide an experimental context, the authors created a set of 26 questions to pose to 26 undergraduate and graduate student subjects. The objective was to determine which of three index sets (two automatically generated, and a modified treatment of the textbook's own index) the students preferred when using the textbook to answer the 26 experimental questions.
The experimenters "compare[d] user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically." The two automatic methods produced 7,980 terms (the hierarchical search (HS) method) and 1,788 terms (the TEC method), while the book's own index, presumably human-created (though it could have been generated by a combination of machine and human effort), contained 673 terms. Despite a conscious attempt to "de-tune" the book's index by eliminating bold-faced topical entries, the index was preferred over the two automatic methods by an eight-to-one ratio. This is not surprising: the index terms, even if originally generated by mechanical means, were presumably examined and organized for efficacy by the book's authors and editors. Without human intervention, most machine-generated word lists are less than satisfactory. For example, in a portable document format (PDF) version of this paper, an Adobe search for "identified automatically" (a term from the abstract) found 18 references, but did not directly lead to a reference explaining how this automatic method was done. However, "automatically identified" found three references, one leading to the definition. This illustrates how a machine list creator is not sensitive to the semantics or organization of a work.
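The word-order sensitivity described above can be sketched with a minimal example. A naive exact-substring search, like the PDF search behavior noted here, treats "identified automatically" and "automatically identified" as unrelated strings (the sample text and function below are illustrative assumptions, not taken from the paper):

```python
def naive_search(phrase: str, document: str) -> bool:
    # Exact substring match only: no awareness of word order,
    # inflection, or the semantics of the surrounding text.
    return phrase.lower() in document.lower()

# Hypothetical sentence standing in for the paper's text.
text = "Two of the term sets were automatically identified by the experimenters."

print(naive_search("identified automatically", text))  # False: reordered phrase is missed
print(naive_search("automatically identified", text))  # True: exact order matches
```

A human indexer would recognize both phrasings as the same concept and list them under a single entry; the exact-match search cannot.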
This paper brings back many memories. While at the University of Pennsylvania, I attacked a similar problem in my doctoral thesis, employing computing techniques in an attempt to extract semantic relevance among search terms. Perhaps the authors' next effort will delve into semantics, in an attempt to automatically create and organize an index set competitive with a human-generated one.