Computing Reviews

Data management in machine learning systems
Boehm M., Kumar A., Yang J., Morgan & Claypool Publishers, San Rafael, CA, 2019. 174 pp. Type: Book
Date Reviewed: 01/10/20

Supervised machine learning (ML) systems need labeled datasets for training and testing models, and unsupervised ML systems need datasets for identifying hidden patterns. If the datasets for an application can be generated from an existing database, two approaches are possible: a database system can be modified to incorporate a learning environment, or an ML environment can be extended to incorporate a database system. This book offers interesting insights in this direction, drawing on a survey of the major initiatives in this narrow domain.

The book is organized into nine chapters. The first chapter sets out the motivation for the idea and the scope of the discussion. The second and third chapters discuss how ML features can be realized in database systems, including algorithms and learning over joins. Chapters 4 through 7 deal with database-integrated ML systems, covering aspects like logical and physical operator selection, execution strategies for such ML systems, memory management that also includes parallel architecture, and the cloud-based deployment of heterogeneous resources. The penultimate chapter surveys other tasks in the ML life cycle, ranging from data sourcing and data preparation to model selection and model deployment. In the last chapter, the authors conclude the discussion, having analyzed the state of the art in ML-integrated database systems (or vice versa), user-defined functions, the specifications needed for high-level languages, optimization techniques, different ML models, heterogeneous data sources, and the life cycle management needed for such complex integrated systems.

The emergence of the Hadoop Distributed File System (HDFS) as an extension of a distributed database system catering to big-data-driven applications justifies, to a certain extent, the exploration of work integrating ML systems with database applications. The authors, however, do not explore in detail the emergence of highly specialized, sophisticated, and easy-to-use ML and deep learning frameworks such as Keras.

Additionally, the types of datasets used in generic ML or deep learning environments are far more diverse than those that could be used in these database-integrated learning systems. Nevertheless, the extensive surveys of existing integrated database-cum-learning environments, which highlight their salient features as well as their limitations, make this book extremely interesting for scholars working in this field and for product designers attempting to tap into this niche domain.

Reviewer: CK Raju
Review #: CR146835 (2005-0100)
