“Machine learning,” “data analytics,” “pattern recognition,” “data mining,” “knowledge discovery in databases,” and “artificial intelligence” are terms that are often used interchangeably when it comes to finding patterns and regularities in data. Although these technologies are distinct, they share a common requirement for performing their tasks: the availability of data. However, raw data is often unsuitable for direct use and must first be turned into features.
The goal of this book is to present the concepts and technical aspects of feature engineering. This is particularly relevant given the tremendous activity around machine learning (ML) applications, whose performance depends heavily on the quality of the available features.
Beyond a preliminary chapter that introduces basic concepts of feature engineering, the content is grouped into three parts. Part 1, “Feature Engineering for Various Data Types,” covers feature representation techniques for six data types: text, visual, time series, streams, sequential, and networked data. Each data type is explored in a dedicated chapter, and each chapter ranges from classical feature engineering approaches, where features are designed by humans, to more recent advances, where features are learned from data. Notably, the book does a good job of describing each data type with a consistent formalism.
Part 2, “General Feature Engineering Techniques,” deals with feature selection, feature analysis and evaluation, and automatic feature generation. Feature selection is a key task in any ML application and has a long history in academic research; this part of the book reviews recent advances in the field. The next three chapters present techniques for automatic feature generation. Because feature generation usually requires human attention, obtaining reasonable features is often costly, which makes this part especially interesting for anyone involved in data analytics who wishes to work on applied ML.
The last part of the book, “Feature Engineering in Special Applications,” discusses how researchers engineer features to detect social bots and to predict software bugs, and it presents several case studies based on Twitter data. At first glance, this part may seem to be merely an application of known feature engineering techniques, and indeed it is. But readers may be surprised by how hard people have to slog to extract features in unfamiliar territory.
For the most part, the book explains details thoroughly and with great clarity, although some readers may find the level of detail challenging. It is written for data scientists with an intermediate level of experience. ML practitioners who wish to consolidate their knowledge will appreciate it, and researchers will surely recognize its worth.