Computing Reviews
Content-based trust and bias classification via biclustering
Siklósi D., Daróczy B., Benczúr A. WebQuality 2012 (Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality, Lyon, France, Apr 16, 2012), 41-47. 2012. Type: Proceedings
Date Reviewed: May 25 2012

This paper aims to improve trust in a large collection of Web data, given a specific domain. Obtaining a large collection of opinions from unbiased and trustworthy sources is a challenge. The authors focus on assessing trust and neutrality by selecting relevant and trustworthy corpora. To evaluate trust, a credibility score is computed across multiple sources of content; this score is later used to compute a utility score. Trust covers a wide variety of aspects of a Web site, from features that might distract a user to the presence of well-known authoritative sources. Neutrality refers to opinions that are based on facts. Bias, on the other hand, rests on attacks and dishonest opinions that do not relate to facts.

The approach is three-fold: compile approximately 1,000 bags of concepts from words via biclustering; use feature selection and weighting in the training set; and use the support vector machine (SVM) classifier to apply late fusion.
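The review does not say how the late fusion step combines classifier outputs. As a minimal sketch of the general idea only, assuming one common variant (averaging min-max normalized scores from separately trained classifiers), it might look like the following. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to 0.5 everywhere."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def late_fusion(score_lists):
    """Average the normalized scores of each host across classifiers.

    score_lists holds one list per classifier, each with one score per host.
    Averaging is an assumption; the paper may combine outputs differently.
    """
    normalized = [min_max_normalize(scores) for scores in score_lists]
    return [mean(host_scores) for host_scores in zip(*normalized)]


# Hypothetical decision scores for four hosts from two classifiers,
# e.g. one trained on raw terms and one on bags of concepts.
text_scores = [-1.2, 0.3, 0.8, -0.5]
concept_scores = [0.1, 0.4, 0.9, 0.2]
fused = late_fusion([text_scores, concept_scores])
```

Normalizing before averaging keeps a classifier with a wider score range from dominating the combined ranking.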

The dataset used in this evaluation was created for the ECML/PKDD Knowledge Discovery Challenge 2010 on Web Quality. It represents a large collection of annotated Web hosts that were labeled by a few institutions, including the Hungarian Academy of Sciences, the Internet Memory Foundation, and L3S Hannover. The dataset contains 23 million pages on 190,000 hosts in the .eu domain. As a first step, the authors’ biclustering method uses a variation of Dhillon’s information-theoretic co-clustering algorithm with Jensen-Shannon divergence. This is a bidirectional algorithm that clusters Web hosts and words at the same time. Standard evaluation metrics, such as the area under the receiver operating characteristic (ROC) curve and the normalized discounted cumulative gain (nDCG), are employed. The results show an improvement of 3 to 10 percent over the best previous nDCG results for neutrality, bias, and trustworthiness.
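For readers unfamiliar with two of the quantities named above, the following is a minimal sketch of their textbook definitions: the Jensen-Shannon divergence between two discrete distributions, and nDCG for a ranked list of graded labels. It assumes base-2 logarithms and a linear gain in the DCG sum. The review does not state which nDCG variant or which divergence parameters the authors used, so this is illustrative only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; skips zero terms of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical word distributions for two hosts over a three-word vocabulary.
host_a = [0.7, 0.2, 0.1]
host_b = [0.1, 0.2, 0.7]
distance = js_divergence(host_a, host_b)

# Hypothetical graded labels, listed in the order a classifier ranked the hosts.
quality = ndcg([2, 3, 0, 1])
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence is symmetric and stays finite when one distribution assigns zero probability to a word. With base-2 logarithms it lies between 0 and 1.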

To conclude, the methods presented in this paper help improve the accuracy of classifying Web hosts for neutrality, bias, and trust. This paper is of interest to experts who analyze Web host credibility and how much users can trust these sources of content.

Reviewer: George Popescu
Review #: CR140197 (1210-1057)
Reviewer-selected categories: General (H.3.0); Document Analysis (I.7.5 ...); General (I.2.0)