Computing Reviews
Federated search
Shokouhi M., Si L. Foundations and Trends in Information Retrieval 5(1): 1-102, 2011. Type: Article
Date Reviewed: Jun 13 2014

Federated search is a mode of information retrieval in which the system answers queries over independent document collections without using a centralized index. The process consists of selecting the information sources likely to contain relevant information for a given user query, submitting individual queries to each selected source, and merging the answers obtained from these individual queries. This is a challenging and relevant field of information retrieval, offering a viable, decentralized alternative to a centralized search index. Even in the context of commercial search on the Web, there are limitations to using a centralized index to answer all kinds of queries, particularly over “hidden” web resources, which expose their content only through search interfaces. Distributed information retrieval techniques such as those discussed in this work are more relevant than ever. Moreover, the fundamental techniques apply in other domains, particularly the linked data and semantic web environments.
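As a rough illustration of the three steps just described (source selection, per-source querying, and merging), the following minimal Python sketch may help. It is not taken from the work under review: the `Source` class, the term-overlap scoring, and the function names are all hypothetical, and comparing raw scores across sources is a deliberate simplification.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for brevity)."""
    return text.lower().split()


class Source:
    """A toy independent collection that exposes only a local search."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs
        # Term-frequency summary, as a cooperative source might publish.
        self.summary = Counter(t for d in docs for t in tokenize(d))

    def search(self, query, k=3):
        """Return up to k (score, doc) pairs, scored by query-term overlap."""
        q = set(tokenize(query))
        scored = [(sum(1 for t in tokenize(d) if t in q), d) for d in self.docs]
        scored = [(s, d) for s, d in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:k]


def select_sources(query, sources, n=2):
    """Step 1: keep the n sources whose summaries best match the query."""
    q = tokenize(query)

    def match(source):
        return sum(source.summary[t] for t in q)

    ranked = sorted(sources, key=lambda s: (-match(s), s.name))
    return [s for s in ranked[:n] if match(s) > 0]


def federated_search(query, sources, n=2, k=3):
    """Steps 2 and 3: query each selected source, then merge the answers."""
    results = []
    for source in select_sources(query, sources, n):
        for score, doc in source.search(query, k):
            results.append((score, source.name, doc))
    # Naive merge: raw scores are compared directly across sources.
    results.sort(key=lambda r: (-r[0], r[1], r[2]))
    return results


sources = [
    Source("med", ["heart disease treatment", "flu vaccine trial"]),
    Source("cs", ["distributed search index", "query routing in peer networks"]),
    Source("news", ["local election results"]),
]
hits = federated_search("distributed query search", sources)
```

In this toy run only the `cs` source matches the query, so the other two are never contacted; avoiding unnecessary remote queries is the point of the selection step.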

The work is divided into sections covering each of the main tasks. The section on collection representation discusses both cooperative environments, where the sources provide a summary of their collections, and uncooperative environments, where the federated search solution must learn the descriptions of the sources, usually through probing queries. Different kinds of metadata and how they help in query answering are also discussed in detail, together with the implications of different sampling techniques used for probing. This section also provides two very interesting discussions on how to estimate the true size of a remote collection and the representativeness of an existing sample.
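To give a feel for the size-estimation problem mentioned above, here is a minimal sketch of one classical approach, the capture-recapture (Lincoln-Petersen) estimator. It is offered as an illustration only, not necessarily as the method the authors favor. It assumes two independent, roughly uniform samples of document identifiers drawn from the same remote collection; the function name is hypothetical.

```python
def estimate_collection_size(sample_a, sample_b):
    """Lincoln-Petersen estimate: N is approximately |A| * |B| / |A and B|.

    Assumes both samples are independent and roughly uniform draws of
    document identifiers from the same remote collection.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("samples share no documents; estimate is undefined")
    return len(a) * len(b) / overlap


# Two samples of 100 documents each that share 20 documents.
estimate = estimate_collection_size(range(0, 100), range(80, 180))
```

Samples obtained through probing queries are rarely uniform, so an estimate of this kind should be read with caution.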

The next section deals with the problem of selecting information sources for answering a given user query. Since not all sources are accessible at the same cost, carefully choosing where to send the user query is a very relevant effort. The section starts with the seminal methods extending the traditional bag-of-words model for solving this problem, progressing to more sophisticated models that use richer descriptions and machine learning techniques. Next, the authors address the task of consolidating individual answers from the sources into a single answer to be presented to the user. The discussion covers the merging of ranked results, data fusion (which combines different ranking algorithms), metasearch engines, and duplicate detection (which enhances the quality of the results as perceived by the user).
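As one concrete example of the data fusion idea mentioned above, the sketch below implements CombSUM with min-max score normalization, a well-known fusion rule. It is an illustration under stated assumptions, not a summary of the authors' recommendations: each ranking is assumed to be a dictionary mapping a document identifier to a raw score, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale one ranking's scores to [0, 1] so rankings become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(result_lists):
    """CombSUM: sum each document's normalized scores across all rankings."""
    fused = {}
    for scores in result_lists:
        if not scores:
            continue  # an empty ranking contributes nothing
        for doc, s in min_max_normalize(scores).items():
            fused[doc] = fused.get(doc, 0.0) + s
    # Highest fused score first; ties are broken by document identifier.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


run_1 = {"a": 10, "b": 5, "c": 0}
run_2 = {"b": 9, "c": 1, "d": 5}
fused = comb_sum([run_1, run_2])
```

Because scores are keyed by document identifier, a document returned by several rankings is counted once and rewarded for the agreement, which also removes exact duplicates in this simplified setting.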

The work pays special attention to the evaluation of techniques in the literature. Besides individual discussions at the end of each section, focused on the validation provided by the respective authors, the work also includes one section dedicated to the topic, covering test beds developed for the purpose of benchmarking distributed information retrieval systems.

Researchers and newcomers to the field will find this work an invaluable resource, providing an extensive and organized survey of existing methods discussed in a coherent way with unified terminology. It concludes with a long list of references and discussions of several applications. Graduate students looking for a thesis topic should also note its critical discussion of the state of the art and of the most pressing areas for future work. The work is also accessible to a more general audience, requiring only some knowledge of the most fundamental information retrieval and web search concepts. In summary, the work is a timely, comprehensive, and very well-crafted introduction to the field of distributed information retrieval.

Reviewer:  Denilson Barbosa Review #: CR142395 (1409-0777)
Information Search And Retrieval (H.3.3 )
 