As the world begins to surf down the “trough of disillusionment” with big data [1], on the journey to sorting out what is real and what has value, we rediscover the discipline that data and information scientists have been nurturing for quite some time: assessing the quality of what has been created and displayed. This domain of interest is front and center the minute you successfully integrate a variety of data sources, enriching and producing a novel set of analytics that was never before possible. After stepping back and marveling at this creation, someone rightfully asks, “How do we measure the quality of the information we are working with?” This question is at the heart of the paper.
The challenge with data quality frameworks is that they tend not to be practical. Todoran et al. highlight this, and it would seem that, in practice, many systems aspire to do something simple enough that such frameworks are often unnecessary. However, enter the world of high velocity, high volume, and high variety, and you actually need to understand entropy and the resulting quality; this is where Todoran et al. are brilliant. Even if the specific examples are not relevant to you, they offer a three-step framework that is portable to your own situation. They include tables of quality criteria and their measures for both data and information, all of which provoke critical thinking about how to assess data quality as it flows through a system. One could even argue that their formula, well documented and exemplified, is not required to structure an information quality measurement strategy.
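To make the flavor of such a strategy concrete, here is a minimal sketch, in Python, of measuring quality criteria as data flows through a system. The criteria shown (completeness, timeliness), the weighted aggregation, and every identifier are illustrative assumptions made for this review; they are not the criteria tables or the formula of Todoran et al.

```python
# Hypothetical sketch: per-criterion quality measures rolled up into one score.
# Criteria, weights, and names are illustrative, not the authors' framework.
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class Record:
    source: str
    value: Optional[float]   # None models a missing measurement
    observed_at: datetime

def completeness(records: list) -> float:
    """Fraction of records carrying a non-null value."""
    return sum(r.value is not None for r in records) / len(records)

def timeliness(records: list, max_age: timedelta) -> float:
    """Fraction of records newer than max_age."""
    now = datetime.now()
    return sum(now - r.observed_at <= max_age for r in records) / len(records)

def quality_score(records: list, weights: dict) -> float:
    """Weighted aggregate of per-criterion measures, normalized to [0, 1]."""
    measures = {
        "completeness": completeness(records),
        "timeliness": timeliness(records, max_age=timedelta(hours=24)),
    }
    total = sum(weights.values())
    return sum(weights[c] * measures[c] for c in measures) / total

if __name__ == "__main__":
    records = [
        Record("sensor_a", 0.97, datetime.now()),
        Record("sensor_b", None, datetime.now() - timedelta(days=2)),
    ]
    # Both criteria score 0.5 here, so the aggregate is 0.5.
    print(quality_score(records, {"completeness": 0.6, "timeliness": 0.4}))
```

The point is not these particular criteria but the structure: each criterion gets an explicit, computable measure, and the aggregate makes the quality of the integrated result transparent rather than implicit.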
If you are involved in a new or existing data mashup, where it is not enough to have just any answer, but rather an answer that comes with statistical transparency, the authors’ methodology will prove useful at a variety of levels.