The authors present enhancements to spatial keyword queries that use probabilistic topic modeling to incorporate semantic information. The topic model is based on latent Dirichlet allocation (LDA), which statistically analyzes a set of documents to derive how strongly each word is associated with each latent topic. While this approach improves the quality of the search results, it can greatly enlarge the search space in which spatial objects have to be located: combining spatial aspects (the location of relevant objects on a map of the real world) with topical aspects (the similarity of the relevant objects to the search terms) yields a high-dimensional search space, and with it the difficulties often described as the “curse of dimensionality.”
To overcome this problem, the authors developed a specialized indexing structure (LHQ-tree) in combination with efficient search algorithms. The LHQ-tree is a combination of tree structures commonly used for spatial arrangements (quadtree), approximate string matching (MHR-tree), and high-dimensional similarity search (NIQ-tree). Based on the spatial, topical, and textual layers of the tree, the search algorithm identifies a candidate set of objects that are nearby and that match the intended meaning of the search terms.
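To make the idea of ranking by both proximity and meaning concrete, here is a minimal sketch. It is not the authors' LHQ-tree or their search algorithm: it is a brute-force linear scan, and the function names, the weighting parameter `alpha`, the `max_distance` normalization, and the sample data are all assumptions made for illustration. Each object carries a location and a topic distribution of the kind an LDA model might produce.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two topic vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def rank_objects(objects, query_location, query_topics,
                 alpha=0.5, k=2, max_distance=10.0):
    """Return the names of the k best objects by a weighted score.

    Hypothetical scoring (an assumption, not taken from the paper):
    score = alpha * spatial closeness + (1 - alpha) * topical similarity.
    """
    scored = []
    for name, location, topics in objects:
        distance = math.dist(location, query_location)
        # Closeness in [0, 1]; objects beyond max_distance score 0.
        spatial_score = max(0.0, 1.0 - distance / max_distance)
        topical_score = cosine_similarity(topics, query_topics)
        score = alpha * spatial_score + (1 - alpha) * topical_score
        scored.append((score, name))
    scored.sort(reverse=True)  # highest score first
    return [name for _, name in scored[:k]]

# Made-up sample data: (name, (x, y), topic distribution over 3 topics).
objects = [
    ("cafe", (1.0, 1.0), [0.8, 0.1, 0.1]),
    ("museum", (0.5, 0.5), [0.1, 0.8, 0.1]),
    ("bakery", (6.0, 6.0), [0.9, 0.05, 0.05]),
]

# Query at the origin whose meaning falls entirely under topic 0.
result = rank_objects(objects, (0.0, 0.0), [1.0, 0.0, 0.0], k=2)
print(result)  # ['cafe', 'bakery']
```

With these sample values, the distant but topically matching bakery outranks the nearby museum, which is the kind of trade-off a semantic-aware spatial query has to resolve. A scan like this visits every object; avoiding that cost in a high-dimensional space is what an index such as the LHQ-tree is meant to do.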
Although the authors conducted extensive experiments comparing their approach against related methods and discuss the results in detail with respect to several parameters of the index and algorithm, the paper offers no overall assessment of the approach. At a conceptual level, the authors neglect to discuss the integration of statistical techniques with lexical databases like WordNet or ontology-based approaches [1,2,3]. Even if such integration is beyond the scope of their implementation and experiments, these human-constructed semantic models offer another angle on improving spatial search and should be mentioned in the related work section. As a reader, I was impressed by the technical details of the proposed combination of spatial queries with topic models, but not quite convinced that it fully captures the integration of semantic search aspects.