Computing Reviews
Handbook of data quality : research and practice
Sadiq S., Springer Publishing Company, Incorporated, New York, NY, 2013. 450 pp. Type: Book (978-3-642362-56-9)
Date Reviewed: May 19 2014

As the size and scope of databases and data repositories grow, they are becoming increasingly important drivers or justifications for public policy, legislation, and rulemaking. Commercial enterprises depend on their data repositories for business intelligence, customer service, marketing, and general competitive advantage. Decisions with significant consequences are made in diverse domains, such as evidence-based medicine and marine navigation, based on data whose quality is sometimes questionable. Health insurance exchanges and corporate compliance both depend on high-quality data. This collection by experts from academia and industry attempts to provide an integrated framework for data quality for all of these, and more, application domains and uses.

“Data quality” is regarded as resting on a three-part foundation consisting of organizational, architectural, and computational bases. The volume is accordingly divided into four main divisions: one for each basis, and a fourth with case studies.

The first major division deals with the organizational basis, specifically objectives and strategies (roles, processes, policies, and standards) for enterprise data quality management. It begins by reviewing history, practice, successes, failures, and research issues. This is followed by a chapter on guidelines for enterprise projects and programs, discussing management strategy and skills, accompanied by two case summaries. A third chapter has guidelines for cost-benefit analysis of data quality efforts (this should be useful to practitioners charged with developing a value proposition for enterprises). The fourth chapter is a case study on the evolution and maturation of data governance and quality processes in which the authors point out that both external and internal factors must be considered.

The second major division deals with technology and system architectures. Data warehouse problems, factors, and solution approaches are described first, followed by a chapter on semantic web techniques in solution architectures. The main advantages of semantic web technologies for data quality are content integration, shared understanding, and more precise semantics, and the authors believe these techniques are mature enough for industrial projects (my own opinion is that at present they are better suited to exploratory projects in enterprises). Another chapter in this part provides an overview of a comprehensive data quality process and methods for quality assessment, measurement, and data cleaning. Literature references are provided, but the level of exposition here is relatively abstract and readers will need to be familiar with the characteristics and metrics of their own data domains in order to be able to put the ideas to practical use.

The third major division addresses computational foundations, describing techniques for solving common problems. The first chapter surveys methods of checking and enforcing integrity constraints and restoring consistency, with a focus on declarative approaches. The next chapter addresses record linking, defined as identifying records (possibly from heterogeneous schemas) that refer to the same real-world entity. This chapter takes a procedural approach to the problem, describing algorithms for linking. The third chapter deals with a very similar problem, describing heuristics and techniques for entity resolution processes and identity management. The next chapter describes a model and cleaning methods for uncertain data, such as sensor data stores, which are susceptible to noise, measurement errors, limited bandwidth, and estimation errors. The following chapter is concerned with data fusion, in particular the resolution of conflicts between values from different sources.
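As a rough illustration of what record linking involves (this sketch is not taken from the book), the Python fragment below compares records from two differently named schemas and reports pairs whose name fields are similar enough to plausibly refer to the same entity. The field names, the sample records, the use of the standard library's `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold are all assumptions chosen for the example; the algorithms surveyed in the handbook are considerably more sophisticated.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Return a similarity score in [0, 1] for two strings after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, left_key, right_key, threshold=0.85):
    """Naive pairwise linking: compare every left record with every right record.

    The threshold is a hypothetical cutoff; real systems tune it per data domain
    and usually avoid the full cross product by blocking or indexing.
    """
    links = []
    for i, left_rec in enumerate(left):
        for j, right_rec in enumerate(right):
            score = similarity(left_rec[left_key], right_rec[right_key])
            if score >= threshold:
                links.append((i, j, round(score, 2)))
    return links


# Hypothetical sample data with heterogeneous field names ("name" vs. "holder").
customers = [{"name": "Jon  Smith"}, {"name": "Ana Lopez"}]
accounts = [{"holder": "jon smith"}, {"holder": "Mary Jones"}]

links = link_records(customers, accounts, "name", "holder")
print(links)
```

Even this toy version shows why the reviewer's caveat about domain knowledge matters: the normalization rules and the threshold have to be chosen with the characteristics of the actual data in mind.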

The final division contains three case studies intended to provide concrete realizations of the organizational frameworks, architectural considerations, and computational approaches described in this book. The editor’s prologue includes a study of research literature and an analysis of the field. An epilogue by other experts summarizes the evolution of the field, roles, training, and outlook for the data quality profession.

The book is suitable for academics and students in computer science, information systems, and management. Practitioners should find matters related to their practice areas in one division or another. Different sectors of the audience will be more or less interested in different parts, with managers most interested in the organizational aspects, solution and system architects in architectures, and database experts and developers interested mainly in computational techniques. I find myself in agreement with the editor’s statement that the biggest advantage of reading the book is wider exposure to matters that may not be directly relevant to one’s particular research area.

The book is very readable, but, since it is a handbook, readers requiring details will need to consult the source literature. Professionals who have encountered data quality issues in particular domains, and are looking for a conceptual structure for what they have encountered, should find such a conceptual basis here. This book goes a long way toward providing an integrative view of data quality, and is a useful addition to the body of literature in this area.

Reviewer: R. M. Malyankar
Review #: CR142291 (1408-0624)
Content Analysis And Indexing (H.3.1)
World Wide Web (WWW) (H.3.4 ...)
