Artificial intelligence is experiencing a new golden age, driven by powerful machine learning tools that can process massive amounts of data. However, many of these tools produce systems that amount to a jumble of weights or probabilities. Extracting insight from a trained system, and fully understanding how it works, is becoming so challenging that it resembles the quagmire of understanding how the brain works.
It is therefore comforting to see significant efforts to bring some comprehensibility to complex structures and high-dimensional spaces. In this paper, linguistic terms are used to express relative properties extracted from underlying multidimensional spaces, the so-called "conceptual spaces." The semantic relations between terms, such as degrees of betweenness and significant directions, are evaluated and compared with those that humans would extract, using crowdsourcing experiments.
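As a rough illustration of one such relation, a "betweenness" degree for three terms can be computed from their positions in a conceptual space: a term lies between two others to the extent that it sits on the line segment joining them. The following is a minimal sketch, not the paper's actual measure; the term names and coordinates are hypothetical.

```python
import numpy as np

def betweenness(a, b, c):
    """Degree to which point b lies between points a and c in a
    Euclidean conceptual space: 1.0 when b is exactly on segment a-c,
    lower as b strays from it (triangle inequality)."""
    ab = np.linalg.norm(b - a)
    bc = np.linalg.norm(c - b)
    ac = np.linalg.norm(c - a)
    return ac / (ab + bc) if ab + bc > 0 else 1.0

# Hypothetical 2-D coordinates for three terms.
small = np.array([0.0, 0.0])
medium = np.array([1.0, 0.1])
large = np.array([2.0, 0.0])

print(betweenness(small, medium, large))  # close to 1.0: "medium" lies roughly between
```

A crowdsourcing comparison would then check whether such computed degrees agree with human judgments about the same term triples.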
The techniques are illustrated with the classification of documents from text corpora. Somewhat paradoxically, the data are transformed from the original vector space into a similarity matrix (based on angular differences), then into a reduced space where Euclidean distance is used, and then once again into similarities and directions.
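The chain of transformations can be sketched as follows. This is only an illustrative reconstruction under stated assumptions, not the paper's exact pipeline: the document vectors are hypothetical, the angular differences are taken as arccosines of cosine similarities, and the reduction uses classical multidimensional scaling.

```python
import numpy as np

# Hypothetical document vectors (e.g., rows of a term-frequency matrix).
X = np.array([
    [3.0, 0.0, 1.0],
    [2.0, 1.0, 0.0],
    [0.0, 4.0, 2.0],
    [0.0, 3.0, 3.0],
])

# Step 1: angular differences derived from cosine similarities.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)
cos_sim = np.clip(unit @ unit.T, -1.0, 1.0)
ang_dist = np.arccos(cos_sim)

# Step 2: classical multidimensional scaling into a 2-D Euclidean space.
n = ang_dist.shape[0]
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
B = -0.5 * J @ (ang_dist ** 2) @ J           # double-centered Gram matrix
vals, vecs = np.linalg.eigh(B)
top = np.argsort(vals)[::-1][:2]             # two largest eigenvalues
coords = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Step 3: back to similarities and directions in the reduced space,
# e.g., a unit direction from document 0 toward document 2.
direction = coords[2] - coords[0]
direction /= np.linalg.norm(direction)
```

Directions like the one in step 3 are the kind of object that can then be given a linguistic label and tested against human judgments.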
It is revealing not only how many techniques this paper requires, but also how often the simplest or oldest ones (multidimensional scaling, Euclidean spaces, and so on) outperform more sophisticated alternatives. Nevertheless, the integration of so many techniques obscures the lessons learned and hinders reusability, even though some combinations, such as applying FOIL (the first-order inductive learner of the 1990s) over the extracted semantic relations, may have wide applicability and admit theoretical analysis.