Computing Reviews
Web information retrieval
Ceri S., Bozzon A., Brambilla M., Della Valle E., Fraternali P., Quarteroni S., Springer Publishing Company, Incorporated, 2013. 296 pp. Type: Book (978-3-642393-13-6)
Date Reviewed: Jul 29 2014

As web search systems increasingly become a part of daily life, the need for developing and maintaining these systems is growing rapidly. Educating the next generation of web search systems developers is an important duty of the computing and information systems fields. However, few existing materials present a comprehensive yet concise description of web search topics. The technical, algorithmic, economic, social, and behavioral aspects need to be addressed.

This book addresses these needs by presenting the concepts and techniques of web information retrieval (IR) in a compact and comprehensive manner. An outgrowth of a research project funded by the European Research Council (ERC) between 2008 and 2013, the book introduces IR concepts and technologies in the context of web search, and presents recent advances in web IR. Written for a short course, the book is designed to meet two instructional needs: introducing fundamental IR concepts and examining recent web-specific aspects.

This book has three parts. Part 1, “Principles of Information Retrieval,” introduces basic concepts and techniques in IR. Chapter 1 defines the discipline of IR and introduces a formal IR model and IR systems evaluation methods. Chapter 2 provides a high-level overview of an IR system and explains the document indexing process. Boolean, vector space, and probabilistic IR models are introduced in chapter 3. In chapter 4, the authors present automatic classification and clustering methods and describe some real-world applications. Chapter 5 introduces natural language processing (NLP), describes language modeling using hidden Markov models (HMMs) and conditional random fields (CRFs), and presents question-answering systems as an NLP application.

Part 2, “Information Retrieval for the Web,” addresses the operational aspects and business models of search engines. Chapter 6 provides a brief history of search engines and explains the two major processes in search engine development: crawling and indexing. Link analysis, which search engines use to rank web pages, is explained in chapter 7; the chapter describes the two major link analysis techniques, the PageRank and hyperlink-induced topic search (HITS) algorithms. Chapter 8 introduces recommender systems and two major techniques: content-based and collaborative filtering. The authors describe online advertising strategies and discuss economic models and auction mechanisms in chapter 9.

Part 3, “Advanced Aspects of Web Search,” consists of six chapters. In chapter 10, the authors explain different data formats used for web publishing. Chapter 11 introduces meta-search and multi-domain search, the latter being part of the authors’ research project. Commercial semantic search services and several academic prototypes are presented in chapter 12. Chapter 13 gives an overview of multimedia search and describes the architecture of multimedia IR systems and related research projects. Chapter 14 examines information-seeking paradigms and reviews existing work on user interfaces for search. In chapter 15, the authors introduce the discipline of human computation, review existing research projects, and highlight open issues.

The writing is generally clear, with occasional mistakes. For example, the Teoma search engine is misspelled as Taoma (p. 102). Teoma was actually acquired by Ask Jeeves in 2001 and redirected to Ask.com in 2006. The HITS algorithm is incorrectly described as “hypertext-induced topic search” (p. 101); the “H” actually stands for “hyperlink.” The terms “recommending systems,” “recommendation systems,” and “recommender systems” all appear in the book but refer to the same thing.

To benefit educational users further, the book could include more worked examples within each chapter and more end-of-chapter questions. Currently, each chapter has only a few end-of-chapter exercises; these need to be expanded for students who need more practice. In comparison, Manning et al.’s book [1] includes many more worked examples and exercises.

The organization of the book is fairly consistent: each chapter opens with an abstract, followed by an introduction to the topic and an explanation of key techniques, and closes with exercises to reinforce student learning. The impressive list of 374 references and the supplementary website (http://www.search-computing.org/web-information-retrieval-book) should inform researchers and educators.

The consistency of presentation could be improved. Chapters that present algorithmic details (for example, chapters 3, 4, and 7) could benefit from more discussion of examples and applications, whereas chapters focusing on high-level explanations and literature reviews (for example, chapters 8 through 10) should include more algorithmic details. Chapter summaries and objectives, if added, would enhance the readability and usefulness of the book as an educational resource. Multilingual web IR (the Communications of the ACM (CACM) cover story in May 2008 [2]) and social media analytics (a CACM featured article in June 2014 [3]) should be added as separate chapters in this book. A final conclusion and a list of future directions, if added, could help researchers identify trends.

Overall, this book is a valuable resource for students and instructors in web IR, primarily as a reference to supplement course teaching. Researchers and practitioners should find the book a useful quick reference guide for key concepts, techniques, and recent trends in web IR.

Reviewer: Wingyan Chung
Review #: CR142561 (1410-0829)

1) Manning, C. D.; Raghavan, P.; Schütze, H. Introduction to information retrieval. Cambridge University Press, New York, NY, 2008.
2) Chung, W. Web searching in a multilingual world. Communications of the ACM 51, 5 (2008), 32–40.
3) Fan, W.; Gordon, M. D. The power of social media analytics. Communications of the ACM 57, 6 (2014), 74–81.
Information Search And Retrieval (H.3.3 )
 
 
World Wide Web (WWW) (H.3.4 ... )