Computing Reviews
Big data integration theory : theory and methods of database mappings, programming languages, and semantics
Majkić Z., Springer Publishing Company, Incorporated, New York, NY, 2014. 473 pp. Type: Book (978-3-319041-55-1)
Date Reviewed: Nov 10 2014

“Big data” is a catchy term for high-volume, high-velocity information assets, structured or unstructured, that appear in a huge variety of forms. The idea behind big data is that large bodies of data can yield results that smaller amounts cannot. Examples are massive datasets from meteorology, astrophysics, Internet search engines, and automatic translation systems. Although the hope that big data can produce knowledge faster, more accurately, or more cheaply than traditional analysis is certainly exaggerated, as discussed below, handling big data mathematically by means of relational databases does represent a real challenge for logic. Well-studied relational models for databases, based on first-order predicate logic, already exist, but this book starts from the view that the huge complexity of big data requires a new, higher-level theoretical and algebraic framework that in a certain sense generalizes the relational model. This framework encompasses a kind of second-order logic based on tuple-generating dependencies (tgds), and intends to provide denotational and operational semantics of data integration by means of category theory, an abstract approach that formalizes mathematical concepts as collections of objects with arrows (also called morphisms) relating them. Category theory can be seen as a common generalization of abstract concepts such as algebraic structures, topological structures, and even set theory.
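
For readers unfamiliar with tgds, the general shape of such a dependency can be sketched as follows. This uses the notation common in the data exchange literature, which is not necessarily the notation of the book under review; the relation names Emp and Mgr in the example are purely hypothetical.

```latex
% First-order tgd: \varphi and \psi are conjunctions of relational atoms.
\forall \bar{x}\, \bigl( \varphi(\bar{x}) \rightarrow \exists \bar{y}\, \psi(\bar{x}, \bar{y}) \bigr)

% Hypothetical instance: every department that has an employee has some manager.
\forall e\, \forall d\, \bigl( \mathit{Emp}(e, d) \rightarrow \exists m\, \mathit{Mgr}(d, m) \bigr)

% Second-order tgd: the existential quantifiers are replaced by
% existentially quantified function symbols \bar{f}, which may occur in each \psi_i.
\exists \bar{f}\, \bigl( \forall \bar{x}_1 (\varphi_1 \rightarrow \psi_1) \wedge \cdots \wedge \forall \bar{x}_n (\varphi_n \rightarrow \psi_n) \bigr)
```

The second-order form is what makes such dependencies closed under composition of schema mappings, which is one reason a framework for composing mappings would move beyond first-order logic.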

Much work in recent decades has been devoted to the categorical analysis of structures related to logics. A comprehensive theory of big data would require a generalization of the first-order relational models, and categorical logic (a branch of category theory closely connected to theoretical computer science) is one such generalization. The categorical approach to big data underpinning this book thus seems naturally justifiable.

Indeed, the categorical approach to big data has some appealing properties, as readers versed in the dialect of category theory will note: it defines a so-called V-category, which is self-dual, complete and co-complete, locally small and locally finitely presentable, and monoidal biclosed. What all this means, however, is not immediately accessible to readers without a deep knowledge of theoretical computer science, category theory, and (second-order) logic. Without such a background, the reader will not find it easy to construct database mappings and queries over databases in a graphical form, as encouraged in the book. A brief enumeration of the chapters gives a hint of what appears in the monograph: “Introduction and Technical Preliminaries”; “Composition of Schema Mappings: Syntax and Semantics”; “Definition of DB Category”; “Functorial Semantics for Database Schema Mappings”; “Extensions of Relational Codd’s Algebra and DB Category”; “Categorial RDB Machines”; “Operational Semantics for Database Mappings”; “The Properties of DB Category”; and “Weak Monoidal DB Topos.” Many specific references are given at the end of each chapter.

The inherent difficulties, however, do not make this book less useful or relevant. Ease of use and understanding aside, a theory for big data requires a good foundation, and this book provides one, albeit not necessarily in the simplest way. But the reader should be aware of the limitations and dangers inherent in the enthusiasm for big data, which basically makes decisions based on statistical correlations, not on theories. A category theory foundation of big data, as provided here, will help to insert big data into a high-level theoretical structure, but not to decide which correlations are relevant or meaningful. A paradigmatic, well-known example of this limitation is Google Flu Trends (GFT), a web service operated by Google that was taken at first to be an exemplary use of big data, but which ended up as a major disappointment.

This book does not address this issue, and perhaps it could be argued that it should not. But there is an aspect that I see as missing in the categorical view of big data. When data coming from multiple different sources are integrated, the resulting databases face the possibility of contradictions and incongruences. This problem is widely recognized in the literature: different logical frameworks for dealing with inconsistency (or contradictions) in database reasoning have been proposed by a large number of authors, basically connected to paraconsistent logics (see the references below, and references therein). This omission does not jeopardize the treatment of big data attempted in this book, but it certainly leaves an opportunity for future research.

To sum up, this is a courageous book, which represents the personal endeavor of its author toward an integrated theory for big data, relating databases with sophisticated methods of logic and computer science. This volume, with nearly 500 pages, will be a respectable addition to the library of any serious professional devoted to computer science.

Reviewer: Walter Carnielli
Review #: CR142919 (1502-0124)
1) de Amo, S.; Sakuray, M. A paraconsistent logic programming approach for querying inconsistent databases. International Journal of Approximate Reasoning 46, 2 (2007), 366–386.
2) Carnielli, W. A.; Marcos, J.; de Amo, S. Formal inconsistency and evolutionary databases. Logic and Logical Philosophy 8 (2000), 115–152.
3) Arieli, O. Paraconsistent declarative semantics for extended logic programs. Annals of Mathematics and Artificial Intelligence 36, 4 (2002), 381–417.
4) Kifer, M.; Lozinskii, E. L. A logic for reasoning with inconsistency. Journal of Automated Reasoning 9, 2 (1992), 179–215.
Logical Design (H.2.1)
 
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®