Computing Reviews
A collaborative approach to build evaluated Web page datasets
Barros R., Rodrigues Nt J., Xexéo G., de Souza J. Future Generation Computer Systems 27(1): 119-126, 2011. Type: Article
Date Reviewed: Oct 20 2011

In their paper, Barros et al. propose a method for collaboratively collecting high-quality Web pages for use in studying information retrieval algorithms that require large datasets.

The traditional way of collecting Web pages, crawling the Web from a set of seed uniform resource locators (URLs), is expensive; it also makes it difficult to preserve the quality of the pages collected. The authors’ proposed method uses a filtering process to weed out low-quality pages in a given context (topic). The filtering process assesses the quality of a Web page along three dimensions: completeness, reputation, and timeliness. Quality is determined by examining metadata, such as dates, the numbers of back-links and forward-links, the page’s “authority and hub scores,” and other factors. The filtering process works as follows. First, Web pages are collected. These pages are then fed through a six-step automatic evaluation process: metadata derivation, fuzzification, definition of single quality evaluation results (SQER), definition of CQD, calculation of composed quality evaluation results (CQER), and defuzzification. The results are then evaluated by human coordinators and evaluators. The coordinator collects the scores from the evaluators and formulates a binary decision about whether a Web page is relevant to the topic by taking the median value of the evaluations.
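The evaluation pipeline described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the membership ramps, the metadata fields (`word_count`, `backlinks`, `age_days`), the thresholds, and the equal dimension weights are all hypothetical.

```python
from statistics import median

def fuzzify(value, low, high):
    # Map a raw metadata value to a [0, 1] membership degree
    # via a hypothetical linear ramp between low and high.
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

def evaluate_page(metadata, weights):
    # Single quality evaluation results (SQER): one fuzzy score
    # per quality dimension, derived from page metadata.
    sqer = {
        "completeness": fuzzify(metadata["word_count"], 100, 2000),
        "reputation":   fuzzify(metadata["backlinks"], 0, 500),
        "timeliness":   1.0 - fuzzify(metadata["age_days"], 30, 1095),
    }
    # Composed quality evaluation result (CQER): weighted combination
    # using the importance the coordinator assigned to each dimension.
    cqer = sum(weights[d] * sqer[d] for d in sqer) / sum(weights.values())
    return cqer  # defuzzified crisp score in [0, 1]

def coordinator_decision(evaluator_scores, threshold=0.5):
    # Binary relevance decision: median of the human evaluators'
    # scores compared against a relevance threshold.
    return median(evaluator_scores) >= threshold
```

For example, `coordinator_decision([0.8, 0.4, 0.7])` returns `True`, because the median score (0.7) exceeds the 0.5 threshold; using the median rather than the mean makes the decision robust to a single outlier evaluator.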

The authors present the results of a small-scale, proof-of-concept study using their approach. The coordinator first chose the context (“economy,” in this case) and assigned different degrees of importance to the three quality dimensions. Seed Web pages were selected, and then a predetermined number of pages (500) were crawled. The pages were then fed through the automatic evaluation process, followed by the manual evaluation. The results show that both recall and precision increased for queries applied to the collected dataset.
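Recall and precision, the metrics reported in the study, are standard information retrieval measures; a brief sketch follows (the page identifiers are invented for illustration).

```python
def precision_recall(retrieved, relevant):
    # Precision: fraction of retrieved pages that are relevant.
    # Recall: fraction of relevant pages that were retrieved.
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, if a query retrieves pages `{a, b, c, d}` and the relevant set is `{a, b, e}`, precision is 2/4 = 0.5 and recall is 2/3.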

Reviewer: Xiannong Meng. Review #: CR139512 (1203-0305)
Information Search And Retrieval (H.3.3)
Collaborative Computing (H.5.3 ...)
Data Mining (H.2.8 ...)
World Wide Web (WWW) (H.3.4 ...)
Other reviews under "Information Search And Retrieval": Date
Nested transactions in a combined IRS-DBMS architecture
Schek H. (ed) Research and development in information retrieval, King’s College, Cambridge, 1984. Type: Proceedings
Nov 1 1985
An integrated fact/document information system for office automation
Ozkarahan E., Can F. (ed) Information Technology Research Development Applications 3(3): 142-156, 1984. Type: Article
Oct 1 1985
Access methods for text
Faloutsos C. ACM Computing Surveys 17(1): 49-74, 1985. Type: Article
Jan 1 1986

Reproduction in whole or in part without permission is prohibited. Copyright 1999-2024 ThinkLoud®