In this big data era, new techniques emerge every day, yet it is difficult to find a good book that approaches big data from a database perspective. Even within the database community, a book that covers both big data and databases this comprehensively is hard to come by. Fortunately, this is such a book.
The author divides the book into two parts. The first part covers the fundamentals of big data and databases. Chapter 1 starts with the history of the database revolution. I really enjoyed this chapter because of its many historical details and illustrative figures. Chapter 2 focuses on how Google initiated the big data movement. The author gives a big-picture view of how MapReduce works and details the platforms that support this paradigm. Chapter 3 introduces NoSQL. During the big data era, demanding challenges arose in the database community that led to the birth of NoSQL. From a transaction control perspective, the author introduces the CAP theorem and the solutions built around it, and then presents the NoSQL model comprehensively.
Chapters 4 to 7 focus on different types of NoSQL databases: document databases, graph databases, column-store databases, and in-memory databases. Each category targets a particular kind of data and offers its own functionality. This is not simply a general introduction; many in-depth technical subjects are covered. The author also presents mainstream products in each category with vivid examples, which is one of the reasons I like this book. Readers get a bird’s-eye view of the diverse NoSQL community without hopping across multiple sources.
The second part goes deeper into the internals of modern databases. Two major concerns for databases in distributed environments are data sharding/replication and data consistency, which are covered in chapters 8 and 9. The author vividly describes the sharding and replication implementations in major big data databases, including MongoDB, HBase, and Cassandra. On the consistency side, I found the discussion of consistent hashing particularly inspiring.
Chapters 10 and 11 give readers a big picture of data models and storage, query languages, and application programming interfaces (APIs). Chapter 12 broadened my knowledge with its introduction to future database developments, such as quantum databases. The book ends with an index of comprehensive database surveys, which is very useful for systematic study.
The author has a deep background in modern databases, in both industry and academia. The book draws on many fundamental research papers in related fields. It would be beneficial for readers if a reference list were included in a future edition.
Overall, books this good are rare. It will greatly benefit people in both industry and academia who study next-generation databases in this modern, big data era.