In their Harvard Business Review article at the end of 2012 [1], Davenport and Patil characterize data scientist as the sexiest job of this century; they argue that expertise in computer science and statistics is among the qualities of a data scientist. I would extend their argument and say that knowledge of data mining tasks for big data is ultimately the principal quality of any data scientist.
This book is about educating and training the next generation of data mining practitioners, those who will build new enterprises and move our knowledge one big step ahead.
The book is divided into four parts. The chapters of Part 1 describe the basic notions and background knowledge on which the more advanced material in the subsequent parts is built. In particular, these chapters present the concepts of numerical, categorical, graph, and high-dimensional data, along with useful statistical tools such as kernel methods and dimensionality reduction procedures, for example, singular value decomposition (SVD) and principal component analysis (PCA).
The second part deals with the issue of mining frequent patterns: patterns that emerge in set-based data, in sequence-based (sets with ordering) data, and in graph data.
The third part investigates the topic of clustering, explaining the basic algorithms for representative-based, hierarchical, density-based, spectral, and graph clustering. Finally, the last part of the book describes methods for classification, namely Bayes and decision tree classifiers, support vector machines (SVMs), and linear discriminant analysis.
Although there are several good books on data mining in the literature, this new one is really special. It manages to include all of the latest developments in the data mining area, along with those past ideas that have survived the test of time. The book does not overload the reader with many variants of classic algorithms merely to cover every method in breadth; instead, it presents only those algorithms that have been the roots of large families of algorithmic ideas. In this way, the book offers the reader a deep comprehension of original thinking.
Overall, this is an excellent textbook for both undergraduate and postgraduate students, but it is also appropriate for scientists and engineers looking for solutions to their big data analysis problems. The solved and unsolved exercises in the book are carefully selected to enhance readers' understanding and to challenge them to investigate each exercise's topic further. I expect that the quality of the book will cause demanding data scientists to save a place for it in their hearts.