Computing Reviews

Diverse and proportional size-l object summaries using pairwise relevance
Fakas G., Cai Z., Mamoulis N. The VLDB Journal: The International Journal on Very Large Data Bases 25(6): 791-816, 2016. Type: Article
Date Reviewed: 03/21/17

When we search the web, we usually enter one or more keywords; in (almost) no time, we receive back a list of websites that bear some relation to those keywords. Behind the scenes, a search engine builds a set of links to websites arranged in a tree. In the literature, these search results have long been known as data subjects (DSs), and summaries of their properties are known as object summaries (OSs). OSs are trees with the keyword list as the root and each OS as a node, and they must be built, above all, with the relevance of the websites to the initial search parameters in mind. Of course, several techniques already exist that do this. This paper proposes two new such techniques, which result in two new kinds of OSs: DSize-l OSs and PSize-l OSs.

DSize-l OSs address diversity (the content of the sites should be reasonably different), while PSize-l OSs address proportionality (sites with content similar but not identical to the initial parameters should also be present in the results); both are described in mathematical terms and in the form of code snippets. The paper then makes clear that these objects do not constitute a new way to search the web; they are only a new way to evaluate sites in relation to the search parameters. As such, the authors apply them to different kinds of site-searching algorithms: first to a brute-force one, then to a greedy one, and then to preprocessing techniques that can themselves be applied to a greedy algorithm. An experimental evaluation of these techniques follows. The features evaluated are effectiveness, or the ability to help humans find what they are looking for; quality of nodes, or the relevance of nodes to the initial search parameters; and efficiency, or the amount of computational resources these searches use. The evaluations are performed by running the algorithms against Google+ searches and DBLP bibliographic datasets. A final section of the paper points to future work and gives an exhaustive list of references to previously published books and articles. What stands out when reading this paper is its clear nature as a bridge between past and future: it builds on existing algorithms, but applies them with new techniques to enhance their results.
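To make the greedy, diversity-aware selection described above more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the `Node` fields, the Jaccard similarity, the `penalty` weight, and the scoring rule (importance minus a penalty for overlap with nodes already chosen) are all assumptions made for illustration. The sketch only covers the diversity idea; proportionality is not modeled. It grows a connected summary of at most l nodes from the root of a tree, one node at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical node layout, chosen for this illustration only.
    name: str
    parent: Optional[str]   # name of the parent node; None for the root
    importance: float       # assumed per-node relevance score
    terms: frozenset        # assumed content descriptors used for similarity


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_size_l(nodes, l, penalty=1.0):
    """Greedily pick up to l nodes forming a connected subtree at the root.

    At each step, the candidates are the unselected nodes whose parent is
    already selected. The candidate with the highest importance, reduced by
    `penalty` times its total similarity to the selected nodes, is added.
    """
    root = next(n for n in nodes if n.parent is None)
    selected = [root]
    chosen = {root.name}
    while len(selected) < l:
        frontier = [n for n in nodes
                    if n.name not in chosen and n.parent in chosen]
        if not frontier:
            break  # the tree has fewer than l reachable nodes

        def gain(n):
            overlap = sum(jaccard(n.terms, s.terms) for s in selected)
            return n.importance - penalty * overlap

        best = max(frontier, key=gain)
        selected.append(best)
        chosen.add(best.name)
    return [n.name for n in selected]


if __name__ == "__main__":
    example = [
        Node("author", None, 1.0, frozenset({"author"})),
        Node("paper_a", "author", 0.9, frozenset({"db", "keyword", "search"})),
        Node("paper_b", "author", 0.85, frozenset({"db", "keyword", "search"})),
        Node("paper_c", "author", 0.6, frozenset({"graph", "mining"})),
    ]
    print(greedy_size_l(example, 3))               # diversity-aware choice
    print(greedy_size_l(example, 3, penalty=0.0))  # importance only
```

With the penalty switched off, the sketch simply takes the most important reachable nodes, so two near-identical nodes can both be chosen; with the penalty on, the second of them loses out to a less important but different node. A brute-force variant would instead enumerate every connected subtree of size l and keep the best-scoring one, which is exact but far more expensive.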

Reviewer: Andrea Paramithiotti
Review #: CR145132 (1706-0390)
