This book, volume 16 of Springer's "Studies in Big Data" series, is composed of 13 chapters. Most contributions are from Europe, with two from Canada and one from China. There is also one contribution from a company in Palo Alto, CA. Each chapter is self-contained but not logically connected to the others. The first chapter, written by editors Japkowicz and Stefanowski, attempts to bring some coherence to the book's chapters. The editors also contribute the last chapter, in which they continue their attempt to tie the chapters together under a coherent theme. They group the first four chapters under a "problem-centric" category and the remaining seven chapters under a "domain-centric" category. The domain-centric category is further divided into the sectors of business, science and technology, and life sciences.
The two-page preface, written by the editors, is a dedication to Stan Matwin, recipient of the 2010 Distinguished Service Award from the Canadian Artificial Intelligence Society. The introduction (chapter 1) provides a very good summary of the book, with additional insights into many aspects of big data and machine learning. Chapter 2, "An Insight on Big Data Analytics," is an overview of the area, with additional analysis of data mining versus statistical techniques. Chapter 3, "Toward Problem Solving Support Based on Big Data and Domain Knowledge: Interactive Granular Computing and Adaptive Judgment," does justice to its title. It is well written and comprehensive. Chapter 4, "An Overview of Concept Drift Applications," discusses the concept drift problem and presents various approaches to studying it.
The rest of the chapters fall under the "domain-centric" category. Chapter 5, "Analysis of Text-Enriched Heterogeneous Information Networks," presents an in-depth analysis of information networks. Chapter 6, "Implementing Big Data Analytics Projects in Business," addresses known issues in the business domain, including a relatively new concept called the "data lake." Chapter 7, "Data Mining in Finance: Current Advances and Future Challenges," is very informative, covering broad, high-level issues directly related to finance.
Chapter 8, "Industrial-Scale Ad Hoc Risk Analytics Using MapReduce," is practically oriented. It covers the intricacies of MapReduce as applied to risk control. Chapter 9, "Big Data and the Internet of Things," must have been a difficult chapter to write, because the Internet of Things (IoT) is evolving very quickly, as is big data. Despite the many moving parts, the author does a good job of presenting the intersection of these two evolving areas of research and development. I found the list of citation links in the appendix especially useful.
Chapter 10, "Social Network Analysis in Streaming Call Graphs," was difficult for me to follow. The authors state in their summary, "we report the burgeoning body of research in network sampling, visualization of streaming social networks, stream analysis, and the solutions proposed so far." Chapter 11, "Scalable Cloud-Based Data Analysis Software Systems for Big Data from Next Generation Sequencing," offers a concise, focused discussion of a case study involving the SparkSeq system.
Chapter 12, "Discovering Networks of Interdependent Features in High-Dimensional Problems," was another difficult chapter for me, because the authors try to explain their research in relatively little space. The allotted length seems too short to explain the topic to a nonspecialist. Chapter 13, "Final Remarks on Big Data Analysis and Its Impact on Society and Science," written by the editors, consists of lessons-learned sections and a tribute to Stan Matwin, who influenced their research and development.
In conclusion, this book can serve as a reference on big data analysis, with a tilt toward machine learning techniques. Individual chapters could be useful to readers working in the respective areas of research. Big data analytics is a relatively new problem in civilian domains, although it has a longer history in military applications. This edited volume certainly fills a niche in this growing and broad research area, and I believe it may well achieve its stated aim of fostering dialogue among researchers from different fields and perspectives.