Computing Reviews
Diverse and proportional size-l object summaries using pairwise relevance
Fakas G., Cai Z., Mamoulis N. The VLDB Journal: The International Journal on Very Large Data Bases 25(6): 791-816, 2016. Type: Article
Date Reviewed: Mar 21 2017

When we search the web, we usually enter one or more keywords; in (almost) no time, we receive a list of websites that bear some relation to those keywords. Behind the scenes, a search engine builds a set of links to websites arranged in a tree. In the literature, these search results have long been known as data subjects (DSs), and summaries of their properties are known as object summaries (OSs). OSs are trees with the keyword list as the root and each OS as a node, and they must be built, above all, with the relevance of the websites to the initial search parameters in mind. Of course, several techniques already exist that do this. This paper proposes two new such techniques, which result in two new kinds of OSs: DSize-l OSs and PSize-l OSs.

DSize-l OSs address diversity (the content of sites should be reasonably different), while PSize-l OSs address proportionality (sites with content similar but not identical to the initial parameters should also be present in the results); both are described in mathematical terms and in the form of code snippets. The paper then makes clear that these objects do not constitute a new way to search the web; they are only a new way to evaluate sites in relation to the search parameters. As such, the authors apply them to different kinds of site-searching algorithms: first to a brute-force one, then to a greedy one, and then to preprocessing techniques that can themselves be applied to a greedy algorithm. An experimental evaluation of these techniques follows. The features evaluated are effectiveness, or the ability to help humans find what they are looking for; quality of nodes, or the relevance of nodes to the initial search parameters; and efficiency, or how many computational resources these searches use. These evaluations are performed by running the algorithms against Google+ searches and DBLP bibliographic datasets. A final section of the paper points to future work and also gives exhaustive references to books and articles already published. What stands out when reading this paper is its clear nature as a bridge between past and future: it builds on existing algorithms, but applies them with new techniques to enhance their results.
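To make the diversity idea concrete, here is a minimal sketch of a greedy selection that picks up to l items while penalizing similarity to items already chosen. It is an illustration only, not the paper's algorithm: the function name `greedy_diverse_select`, the `trade_off` parameter, the scoring formula, and the toy data are all assumptions introduced here, and the sketch ignores any tree-structure constraints that size-l OSs may impose.

```python
from typing import Callable, Dict, List


def greedy_diverse_select(
    relevance: Dict[str, float],
    similarity: Callable[[str, str], float],
    l: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to l items, trading relevance against redundancy.

    Hypothetical scoring (an assumption, not taken from the paper):
    gain = trade_off * relevance - (1 - trade_off) * max similarity
    to any item already selected.
    """
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < l:

        def gain(item: str) -> float:
            # Penalty is the strongest similarity to anything chosen so far.
            penalty = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * penalty

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (invented for illustration): "a" and "b" are near-duplicates.
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}


def sim(x: str, y: str) -> float:
    return 0.95 if {x, y} == {"a", "b"} else 0.1


print(greedy_diverse_select(relevance, sim, 2))                 # ['a', 'c']
print(greedy_diverse_select(relevance, sim, 2, trade_off=1.0))  # ['a', 'b']
```

With the default trade-off, the second pick skips the highly relevant but redundant "b" in favor of "c"; with `trade_off=1.0` the selection falls back to pure relevance ranking. A proportional variant would instead reward keeping similar items in proportion to how often they occur, which this sketch does not attempt.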

Reviewer: Andrea Paramithiotti Review #: CR145132 (1706-0390)
  Featured Reviewer  
 
Information Search And Retrieval (H.3.3)
 
