Computing Reviews

Is Wikipedia link structure different?
Kamps J., Koolen M. WSDM 2009 (Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, Barcelona, Spain, Feb 9-12, 2009), 232-241, 2009. Type: Proceedings
Date Reviewed: 07/09/10

Link structure plays an important role in retrieving relevant information from the Web, and its characteristics are useful in measuring the relevance of a page. Three attributes of a page in the collection serve as indicators of its relevance: its indegree (the number of pages that link to it), its outdegree (the number of pages it links to), and its length (the number of characters on the page).
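The three page attributes can be made concrete with a small sketch. The toy link list, page names, and page texts below are hypothetical and are not drawn from the paper's collections; the sketch also assumes that repeated links between the same pair of pages are counted once.

```python
from collections import Counter


def degree_counts(links):
    """Return (indegree, outdegree) Counters for (source, target) link pairs."""
    unique_links = set(links)  # assumption: a repeated link counts once
    outdegree = Counter(src for src, _ in unique_links)
    indegree = Counter(dst for _, dst in unique_links)
    return indegree, outdegree


# Hypothetical toy collection; names and texts are illustrative only.
links = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("A", "B")]
page_text = {"A": "short page", "B": "a somewhat longer page", "C": "x"}

indegree, outdegree = degree_counts(links)
# Length is measured in characters, as in the review's definition.
length = {page: len(text) for page, text in page_text.items()}
```

In this toy graph, page C is linked from both A and B, so its indegree is 2, while A links out to B and C, so its outdegree is 2.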

In this paper, Kamps and Koolen analyze Wikipedia’s link structure to determine whether it differs from that of the general Web. They use the 2006-2007 INitiative for the Evaluation of XML-retrieval (INEX) test collection for Wikipedia, and the .GOV collection from the 2004 Text Retrieval Conference (TREC) for general Web retrieval. Their comparison yields some interesting observations that go beyond this initial goal and bear on the effectiveness of retrieving relevant documents.

The paper consists of eight sections, but this review covers only the two most important, Sections 3 and 4. Section 3 analyzes the link structures of Wikipedia and .GOV by comparing the indegree, outdegree, and length distributions of their pages. The study suggests that Wikipedia has a very densely linked structure. A major portion of the Wikipedia collection belongs to strongly connected components (SCC), whereas a smaller portion of the .GOV collection does. Wikipedia is well organized, author guided, and dynamic, which makes its link structure comparatively complete; peer editing and automatic link detection enhance it further. Kamps and Koolen also show that, in Wikipedia, outdegree and indegree behave similarly, but this is not so in the .GOV collection. In Wikipedia, high values of both outdegree and indegree indicate a high probability of relevance, whereas in .GOV only a high indegree does. The length of a document is not related to its probability of relevance in .GOV, but in Wikipedia, the probability of relevance increases with length. The authors also study the correlations between the indegree, outdegree, and length attributes, and show that in Wikipedia, outdegree and length are highly correlated.

Section 4 derives a method to incorporate the link evidence of a document: the score of a document d, given a query q, is represented as P(d|q) ∝ P(d) P(q|d), where P(q|d) is the probability of generating q from d, and P(d) is the document prior. The authors apply global and local link evidence in three forms: a standard degree prior, a log degree prior, and a prior that combines global and local evidence. The rest of the paper discusses the experimental details regarding the data collection.
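As a rough illustration of how a document prior enters the score, the sketch below multiplies a query likelihood P(q|d) by a degree-based prior P(d). The functional forms 1 + indegree and 1 + log(1 + indegree), the function names, and all numbers are assumptions made for this example; the review does not state the exact definitions used in the paper, and the combined global and local prior is omitted.

```python
import math


def degree_prior(indegree):
    # Assumed form: the prior grows linearly with indegree;
    # the +1 keeps pages without incoming links at a non-zero prior.
    return 1.0 + indegree


def log_degree_prior(indegree):
    # Assumed form: the logarithm damps the effect of very high indegrees.
    return 1.0 + math.log(1.0 + indegree)


def rerank(query_likelihood, indegree, prior):
    """Rank documents by prior(d) * P(q|d), i.e. P(d|q) up to a constant."""
    scores = {
        doc: prior(indegree.get(doc, 0)) * p_q_given_d
        for doc, p_q_given_d in query_likelihood.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical query likelihoods and indegrees, for illustration only.
query_likelihood = {"d1": 0.020, "d2": 0.015, "d3": 0.005}
indegree = {"d1": 0, "d2": 3, "d3": 50}

by_degree = rerank(query_likelihood, indegree, degree_prior)
by_log_degree = rerank(query_likelihood, indegree, log_degree_prior)
```

With these made-up numbers, the linear prior lets the heavily linked d3 overtake the better-matching d2, while the log prior keeps d2 on top. This is one plausible reason to consider a damped prior, not a result reported in the paper.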

In conclusion, the paper presents a deep discussion of Kamps and Koolen’s results. It is a highly relevant paper for readers interested in the assessment methods of link structures for Wikipedia and other Web sources.

Reviewers: Srini Ramaswamy, M. Venkata Swamy
Review #: CR138157 (1106-0651)
