Computing Reviews
Principles of data integration
Doan A., Halevy A., Ives Z., Morgan Kaufmann Publishers Inc., Waltham, MA, 2012. 520 pp. Type: Book (978-0-124160-44-6)
Date Reviewed: Oct 4 2013

Integrating data from different sources is a challenging task. Although the subject has been researched since the 1980s, it remains an active research topic, partly due to the ever-increasing availability of heterogeneous data sources and new demands. This interesting book on data integration covers a wide range of topics, including data integration in the enterprise, data integration on the web, and scientific data integration.

The book consists of three parts and 19 chapters. After an overview of data integration in chapter 1, Part 1, “Foundational Data Integration Techniques” (chapters 2 to 10), discusses fundamental issues in data integration. Chapter 2, “Manipulating Query Expressions,” focuses on algorithms to manipulate and reason about query expressions (for example, to detect query containment or to determine whether a view is relevant to a given query). Chapter 3, “Describing Data Sources,” tackles schema mappings, including global-as-view (GAV), local-as-view (LAV), and global-and-local-as-view (GLAV). This chapter also considers techniques to handle the possible incompleteness of data sources and limitations on access to the data. Chapter 4, “String Matching,” tackles the problem of determining when strings refer to the same real-world entity, using similarity measures. Chapter 5, “Schema Matching and Mapping,” presents heuristics and machine learning methods to create semantic mappings (query expressions that relate one schema to another). Chapter 6, “General Schema Manipulation Operators,” presents operators that compare and compose schemas and mappings (for example, to merge two schemas into a single one, or to translate a schema from one data model to another).

Chapter 7, “Data Matching,” discusses techniques to find matches between data instances (structured items that correspond to the same entity), including rule-based approaches, approaches based on supervised learning, clustering approaches, probabilistic approaches, and collective matching (where the correlation between matching decisions is exploited to improve matching accuracy). Chapter 8, “Query Processing,” emphasizes the value of adaptive query processing, in which query plans can be changed during query execution depending on observed events and performance. Chapter 9, “Wrappers,” focuses on the problem of information extraction from the web, covering different groups of solutions for wrapper construction: manual, learning-based, automatic, and interactive. Chapter 10, “Data Warehousing and Caching,” discusses data warehousing (which, as opposed to virtual data integration techniques, is based on materialization), including data exchange techniques (data warehousing with declarative mappings).

Part 2, “Integration with Extended Data Representations” (chapters 11 to 14), considers features beyond the relational data model. Chapter 11, “XML,” focuses on the use of XML as an enabler for data sharing. It covers the basics of the data model, document type definitions (DTDs), XML Schema, query languages (XPath and XQuery), query processing, and the specific complexities that XML introduces for schema mapping. Chapter 12, “Ontologies and Knowledge Representation,” considers the use of knowledge representation in data integration, including an overview of description logics and the semantic web vision (Resource Description Framework (RDF), RDF Schema (RDFS), Web Ontology Language (OWL), and SPARQL). Chapter 13, “Incorporating Uncertainty into Data Integration,” presents the problem of uncertainty management in data integration systems. Chapter 14, “Data Provenance,” is about annotating data with information indicating how the data was generated.

Part 3, “Novel Integration Architectures” (chapters 15 to 19), tackles data integration problems in specific contexts. Chapter 15, “Data Integration on the Web,” focuses on web data, the deep web, topical portals, and mashups. Chapter 16, “Keyword Search: Integration on Demand,” focuses on the problem of keyword-based searching on structured data sources, providing ranked results, and the challenges of keyword search in data integration scenarios. Chapter 17, “Peer-to-Peer Integration,” describes a peer-to-peer (P2P) architecture for data sharing, based on the use of peer mappings (describing semantic relationships between peers). Chapter 18, “Integration in Support of Collaboration,” focuses on collaborative systems where users should be allowed to update and annotate the shared data. Finally, chapter 19, “The Future of Data Integration,” discusses some open problems and directions for future work.

This is a very relevant and comprehensive reference on the topic of data integration, which could be used as a text in a data management course or as a reference for researchers working in the field. The book has a didactic style and includes good examples and summaries that will help the reader quickly get an overall picture of the various topics. In addition, each chapter includes a section with useful bibliographic notes. Data integration is a complex topic, so these references expand the coverage of each chapter and point to detailed explanations of the issues covered. Because the authors are focused on principles, the book does not identify specific existing tools and systems that could be used for data integration, and there are no exercises included. However, a set of exercises, along with some slides, can be accessed as supplementary electronic material on the publisher’s website (http://www.elsevierdirect.com/v2/companion.jsp?ISBN=9780124160446).

Reviewer: Sergio Ilarri
Review #: CR141614 (1312-1072)
Reviewer Selected
Database Applications (H.2.8)
Query Processing (H.2.4 ...)
Heterogeneous Databases (H.2.5)
Information Storage And Retrieval (H.3)