Computing Reviews, the leading online review service for computing literature.

On a model of distributed information retrieval systems based on thesauri
Mazur Z. Information Processing and Management: an International Journal 20(4): 499-505, 1984. Type: Article

Date Reviewed: Sep 1 1985

For each database in a document retrieval system, there is usually an associated thesaurus of terms that are used to index and retrieve the documents. The structure of thesauri is generally hierarchical; that is, there exists a “tree” of several levels of increasingly specific terms. An important current problem in the field of information retrieval is the need to fashion retrieval techniques that will be useful in a network of heterogeneous databases where a different thesaurus may be associated with each database. In this situation, any two thesauri will have some, but not all, of the same terms and hierarchical relations. Mazur’s contribution is to develop a mathematical model of this situation with formalized definitions of such entities as an individual (local) retrieval system and its thesaurus, document collection, and query and retrieval sets. Furthermore, the formalization of a distributed system made up of a number of local systems is developed. In some simple situations, certain mathematical properties of the relationship between the local and distributed systems are derived.
The author has made a nice start in formalizing this situation. Unfortunately, there are four major kinds of difficulties that need to be overcome before this kind of work can be truly useful. First, the mathematical descriptions and relationships have to be clearly associated with known and understandable features of retrieval systems. This is a question of good, interpretive exposition, and should be doable. The second difficulty is that modern retrieval systems are quite complicated. Besides indexing and searching by “controlled-vocabulary” terms from a thesaurus, there is free-vocabulary indexing (any word from titles and abstracts) and there is searching by masking, truncation, proximity, field specification, and weighting operations.
The third difficulty is that even though the same terms superficially are used in indexing and searching, they may have different meanings in different contexts. The fourth difficulty is that the utility (and, therefore, cost effectiveness) of any system is bound up in retrieving relevant and useful documents. Both of these parameters, as well as the “meaning” one, are highly subjective and not easily captured with formalized, mathematical constructs. These last three difficulties pose formidable problems for the formalization and modeling of document retrieval systems, in general, and distributed systems, in particular.

Reviewer: R. S. Marcus	Review #: CR108904

Retrieval Models (H.3.3 ... )

Thesauruses (H.3.1 ... )

Other reviews under "Retrieval Models":	Date

Evaluation of an inference network-based retrieval model Turtle H., Croft W. (ed) ACM Transactions on Information Systems 9(3): 187-222, 1991. Type: Article	May 1 1993

Information processing in linear vector space Kunz M. Information Processing and Management: an International Journal 20(4): 519-525, 1984. Type: Article	Mar 1 1985

Users and experts in the document retrieval system model Danilowicz C. International Journal of Man-Machine Studies 21(3): 245-252, 1984. Type: Article	May 1 1985

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®