Computing Reviews
Data and information quality : dimensions, principles and techniques
Batini C., Scannapieco M., Springer International Publishing, New York, NY, 2016. 500 pp. Type: Book (978-3-319241-04-3)
Date Reviewed: Oct 13 2016

Quality of data and information is relevant to both decisional and operational processes. Poor quality can have serious impacts on the efficiency and effectiveness of organizations and enterprises. The inexorable march toward what is being called the digital transformation would seem to magnify the importance of data and information quality. Poor-quality data, and information that is not fit for its intended use, will not be acceptable. This book addresses the dimensions, principles, and techniques needed to ensure that data and information conform to the necessary quality requirements.

The book distinguishes between the terms “data” and “information.” “Data” refers to structured data in a database, or appears as part of a generally accepted compound term, such as data model, linked open data, web data, or big data. “Information” refers to all other information types (except linked open data and big data). Data quality (DQ) therefore refers to the dimensions, principles, and techniques related only to structured data; information quality (IQ) refers to all other information types. Note that quality is about more than accuracy: other significant dimensions include completeness, consistency, and currency.

The book describes three types of information: structured information that uses a tightly coupled schema, such as in a relational database; semistructured “information that is either partially structured or has a descriptive rather than prescriptive schema,” such as an Extensible Markup Language (XML) record; and unstructured information that has “no semantics induced by an explicit schema.”

According to the authors, “The first two chapters are preparatory to the rest of the book.” Chapter 1 provides an introduction to the ideas and concepts surrounding information quality. Chapter 2 examines the issue of data quality dimensions.

Chapter 3 covers information quality dimensions for maps and texts, and chapter 5 discusses the quality of images. In between, chapter 4 examines data quality issues in linked open data. Linked data is a standard way of sharing, exposing, and connecting knowledge, data, and information on the Semantic Web; linked open data is linked data released as open content. The chapter discusses “in detail IQ dimensions and their respective metrics” for three areas: web information systems, the Semantic Web, and relational databases.

Chapter 6 discusses models for information quality, such as an entity-relationship model for structured data and the data and data quality model for semistructured data. Chapter 7 explores information quality activities, where an activity “is any process [performed] directly on information to improve [its] quality,” such as error localization and correction. The authors deem object identification the most important information quality activity and devote two chapters to the subject. Chapter 8 covers a wealth of topics regarding object identification, including distance-based comparison functions, the sorted neighborhood method and its extensions, and knowledge-based techniques. Chapter 9 discusses recent advances in object identification, including learnable, adaptable, and context-based reduction techniques.

Data quality issues in data integration systems are the subject of chapter 10. Data integration presents a unified view of data from heterogeneous and distributed data sources. The two data quality issues discussed with respect to data integration are quality-driven query processing and instance-level conflict resolution.

The final chapters focus on information quality in use. Chapter 11 addresses such topics as the relationship between quality and the utility of information, as well as cost-benefit analysis of information quality improvement initiatives. Chapter 12 reviews methodologies for information quality assessment and improvement. Because of the topic's importance, chapter 13 gives a brief introduction to healthcare information quality. The book concludes with chapter 14, which addresses the open problem of the quality of web data and big data.

The authors define the primary target audience as researchers in the fields of databases and information management, but the audience should extend far beyond that. Information and communication technology (ICT) professionals who touch in any way upon data and information quality (and that includes both operations and systems development personnel) should find this book mandatory reading. In addition, although the book is not intended to be a textbook, its depth and breadth would seem to merit building an advanced course on data and information quality around it, so computer science students would be yet another audience. Any of these audiences should find the book relevant and important to their data and information quality efforts.

Reviewer: David G. Hill
Review #: CR144839 (1701-0021)
Content Analysis And Indexing (H.3.1)
General (H.2.0)
Information Search And Retrieval (H.3.3)
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®