This edited book on mining massive text data stands out by putting the concept of a “cube” center stage: a multi-dimensional space in which massive text data analysis should take place. As such, it offers discussions at the crossroads of text data mining, natural language processing (NLP), knowledge engineering, and, to some extent, data warehousing, whose concepts are predominantly used for the analysis of structured data.
The reader should bear in mind that the field of text analytics is currently dominated by so-called transformers, for example, BERT and GPT-2. These have largely overcome the limitations of their predecessors, recurrent neural networks (RNNs), with regard to long-term dependencies between terms in sentences for text prediction tasks. This has been accomplished thanks to an attention mechanism, which determines which parts of the text to focus on and weight more heavily.
Within this context, the book offers the view that we may need to look at the problem of mining massive text data through the lens of a multi-dimensional structure, the cube, whose dimensions supply rich contextual information about the text. Questions, however, still remain: How can we extract these context-rich dimensions from text data? How can we exploit such a multi-dimensional space in order to improve natural language understanding? These two questions are reflected in the book’s structure, where the chapters are grouped into two main parts: Part 1 deals with the construction of the cube, and Part 2 deals with the exploitation of the cube in natural language understanding tasks.
Given the sheer importance of the field for businesses and organizations, which face mounting amounts of data on a daily basis in the era of big data, the book definitely offers interesting perspectives. It is mostly suitable for master’s and PhD studies in the field, less so for developers. Most chapters (contributed articles) require some mathematical background knowledge. All of them, however, are built around experiments and comparisons with other techniques. On the one hand, this structure respects the rigor needed to make claims and reach conclusions. On the other hand, it is sometimes difficult to reproduce the experiments and results due to missing links and references.