Computing Reviews
Is Wikipedia link structure different?
Kamps J., Koolen M. WSDM 2009 (Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, Barcelona, Spain, Feb 9-12, 2009), 232-241, 2009. Type: Proceedings
Date Reviewed: Jul 9 2010

Link structure plays an important role in retrieving relevant information on the Web, and its characteristics are useful in estimating the relevance of a page. Three properties of a page in the collection serve as indicators of its relevance: its indegree (the number of pages that link to it), its outdegree (the number of pages it links to), and its length (the number of characters on the page).
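As a small illustration of the two degree measures defined above, the following Python sketch counts indegree and outdegree from a list of links. The function name `degree_counts` and the toy page names are hypothetical, and counting each source-target pair only once is an assumption made here, not something the paper is stated to do.

```python
from collections import Counter


def degree_counts(links):
    """Return (indegree, outdegree) Counters for (source, target) link pairs.

    Assumption: repeated links between the same two pages count once.
    """
    unique_links = set(links)
    outdegree = Counter(src for src, _ in unique_links)
    indegree = Counter(dst for _, dst in unique_links)
    return indegree, outdegree


# Hypothetical toy graph; the page names are made up for illustration.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
indegree, outdegree = degree_counts(links)
```

In this toy graph, page C is linked from both A and B, so its indegree is 2, while A links to B and C, so its outdegree is 2. A page that never appears as a target simply has indegree 0.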

In this paper, Kamps and Koolen analyze Wikipedia’s link structure to determine whether it differs from that of the general Web. They use the 2006-2007 INitiative for the Evaluation of XML retrieval (INEX) test collection for Wikipedia, and the 2004 Text Retrieval Conference (TREC) .GOV collection for general Web retrieval. Their comparison yields observations that go beyond this initial question and that bear on the effectiveness of retrieving related documents.

The paper consists of eight sections, but this review covers only the two most important ones, Sections 3 and 4. Section 3 analyzes the link structures of Wikipedia and .GOV by comparing the indegree, outdegree, and length distributions of their pages. The study suggests that Wikipedia has a very densely linked structure. A major portion of the Wikipedia collection belongs to strongly connected components (SCC), whereas the .GOV collection has a lower SCC value. Wikipedia is well organized and author guided, and its dynamic nature, further enhanced by peer editing and automatic link detection, gives it a more complete link structure. Kamps and Koolen also show that, in Wikipedia, outdegree and indegree behave similarly as indicators of relevance, but this is not so in the .GOV collection. In Wikipedia, high values of both outdegree and indegree indicate a high probability of relevance, whereas in .GOV only a high indegree does. The length of a document is not related to its probability of relevance in .GOV, but in Wikipedia, the length is directly proportional to the probability of relevance. The authors also study the correlations between the indegree, outdegree, and length attributes, and show that in Wikipedia, outdegree and length are highly correlated.
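To make the SCC comparison concrete, here is a minimal Python sketch that finds strongly connected components with Kosaraju's algorithm and reports the share of pages in the largest one. It is not the authors' procedure; the function names and the toy graph are hypothetical, and using the largest component's share as the "SCC value" is an assumption made for illustration.

```python
from collections import defaultdict


def strongly_connected_components(links):
    """Kosaraju's algorithm (iterative) over (source, target) link pairs."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in links:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in reverse[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def largest_scc_fraction(links):
    """Share of all linked pages that fall in the largest SCC."""
    components = strongly_connected_components(links)
    total = sum(len(c) for c in components)
    return max(len(c) for c in components) / total if total else 0.0


# Hypothetical toy graph: A, B, C form a cycle; D is only linked to.
toy_links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
fraction = largest_scc_fraction(toy_links)
```

In the toy graph, three of the four pages can reach one another, so the largest component covers 0.75 of the pages. A densely interlinked collection would score close to 1 on this measure.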

Section 4 derives a method for incorporating a document’s link evidence: the score of a document d, given a query q, is P(d|q) ∝ P(d) P(q|d), where P(q|d) is the probability of generating q from d, and P(d) is the document prior. The authors use global and local link evidence to define this prior in three ways: a standard degree prior, a log degree prior, and a combination of global and local evidence. The rest of the paper discusses the experimental details and the data collections.
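The scoring rule above, a document prior multiplied by the query likelihood, can be sketched in Python as follows. The exact functional forms of the priors are not given in this review, so `1 + degree` and `1 + log(1 + degree)` are assumptions chosen for illustration, as are the function names, document identifiers, and likelihood values. Scores are left unnormalized because only the ranking matters.

```python
import math


def degree_prior(degree):
    # Assumed form: prior grows linearly with the degree.
    return 1.0 + degree


def log_degree_prior(degree):
    # Assumed form: prior grows logarithmically, damping very high degrees.
    return 1.0 + math.log(1.0 + degree)


def rerank(likelihoods, degrees, prior):
    """Rank documents by prior(degree) * P(q|d), highest score first."""
    scores = {
        doc: prior(degrees.get(doc, 0)) * p_q_given_d
        for doc, p_q_given_d in likelihoods.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical P(q|d) values and indegrees for three documents.
likelihoods = {"d1": 0.020, "d2": 0.015, "d3": 0.005}
degrees = {"d1": 0, "d2": 3, "d3": 50}

by_degree = rerank(likelihoods, degrees, degree_prior)
by_log_degree = rerank(likelihoods, degrees, log_degree_prior)
```

With these made-up numbers, the linear prior lets the heavily linked d3 overtake documents with a much better content match, while the log prior moves it up only one place, behind d2. This shows why the choice of prior matters when link evidence is combined with content evidence.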

In conclusion, the paper presents a deep discussion of Kamps and Koolen’s results. It is a highly relevant paper for readers interested in the assessment methods of link structures for Wikipedia and other Web sources.

Reviewers: Srini Ramaswamy, M. Venkata Swamy
Review #: CR138157 (1106-0651)
Relevance Feedback (H.3.3 ... )
Performance Evaluation (Efficiency And Effectiveness) (H.3.4 ... )
Retrieval Models (H.3.3 ... )
Information Search And Retrieval (H.3.3)
Systems And Software (H.3.4)

Other reviews under "Relevance Feedback": Date
Finding statistics online
Berinstein P., Bjørner S., Information Today, Inc., Medford, NJ, 1998. Type: Book (9780910965255)
Nov 1 1998
The effect of pool depth on system evaluation in TREC
Keenan S., Smeaton A., Keogh G. Journal of the American Society for Information Science 52(7): 570-574, 2001. Type: Article
Sep 1 2001
Regions and levels: measuring and mapping users’ relevance judgments
Spink A., Greisdorf H. Journal of the American Society for Information Science 52(2): 161-173, 2001. Type: Article
Dec 1 2001
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®