Computing Reviews
Artificial Intelligence (v. 194): Special Issue on Artificial Intelligence, Wikipedia and Semi-Structured Resources. Artificial Intelligence, 2013. Type: Journal
Date Reviewed: Feb 8 2013

Wikipedia is one of the noblest manifestations of the spirit of the Information Age. It is also one of the most astonishing. Who could have predicted in 1985 that in 20 years a significant fraction of the world’s population would have at their fingertips a free, high-quality, multilingual encyclopedia, of a size dwarfing the Britannica? That this encyclopedia was built largely by volunteer labor, with no institutional support and no advertisements, makes this notion more far-fetched than the existence of self-driving cars or voice-activated personal assistants.

Wikipedia has also become a major resource for artificial intelligence (AI) research of all kinds, because it combines a number of very useful features:

  • It is immense and of high quality, and covers multiple topics.
  • Its articles contain explicit presentations of basic information (in contrast to, for example, a corpus of news articles, which almost always assumes this information).
  • It is semi-structured. First, the corpus is divided into articles, each of which deals with one specific concept. Second, a number of different features, such as hyperlinks, infoboxes, and category pages, present information in a form that is more nearly standardized than free text, and therefore much easier for automated systems to interpret. This feature is the focus of the papers that are the subject of this review.
  • It is multilingual. Versions of Wikipedia exist in almost 300 different languages. In fact, Wikipedia often represents one of the largest high-quality online corpora for many languages that are otherwise underrepresented on the web.
  • It was created by a vast number of users with minimal knowledge of computer technology, in contrast to handcrafted ontologies and knowledge bases (such as the CYC project [1]), which are crafted slowly by expensive experts.

Taking advantage of these features, AI researchers have used Wikipedia as a data resource for a wide range of applications, including semantic relatedness, disambiguation, co-reference resolution, metonymy resolution, query expansion, multilingual retrieval, question answering, entity ranking, text categorization, and ontology and knowledge-base construction. There are now standard AI tasks that are defined in terms of Wikipedia, such as wikification (associating terms in a text with the corresponding Wikipedia article) and the automated construction of infoboxes. A number of knowledge bases that were automatically built from Wikipedia, particularly YAGO [2] and DBpedia [3], are now themselves widely used tools.
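To make the wikification task concrete, here is a minimal sketch in Python. It is an illustration only and does not reproduce the method of any paper in the volume. The title index, the `wikify` function, and the phrase-length limit are all hypothetical; the sketch simply matches the longest known phrase at each position and ignores the disambiguation step that real systems need when one phrase could refer to several articles.

```python
import re

# Hypothetical toy index: lowercase surface form -> Wikipedia article title.
# A real system would derive this from article titles, redirects, and link
# anchor text, and would have to disambiguate among competing articles.
TITLE_INDEX = {
    "artificial intelligence": "Artificial intelligence",
    "wikipedia": "Wikipedia",
    "knowledge base": "Knowledge base",
    "ontology": "Ontology (information science)",
}

MAX_PHRASE_LEN = 3  # longest phrase, in tokens, that we try to match


def wikify(text, index=TITLE_INDEX, max_len=MAX_PHRASE_LEN):
    """Return (phrase, article title) pairs using greedy longest match."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    links = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            title = index.get(phrase.lower())
            if title is not None:
                links.append((phrase, title))
                i += n  # skip past the matched phrase
                matched = True
                break
        if not matched:
            i += 1  # no article for any phrase starting here
    return links


if __name__ == "__main__":
    sample = ("Wikipedia is a resource for artificial intelligence "
              "and knowledge base construction.")
    for phrase, title in wikify(sample):
        print(f"{phrase} -> {title}")
```

Greedy longest match is chosen here only because it is short to state; it is the disambiguation step omitted above that makes wikification a research problem rather than a lookup.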

Medelyan et al.’s paper [4] is an exceptionally comprehensive and well-written survey of work in this area up to 2008, with a bibliography of approximately 150 research papers. Readers looking for an introduction to the subject should certainly start there.

The papers reviewed here constitute an entire volume of Artificial Intelligence devoted to a variety of more recent projects on this subject. As in most collections like this, the papers are uneven in quality and readability. This can make it difficult for a nonspecialist such as myself to extract the big picture of what has been accomplished and what the overarching issues are. Regrettably, some of the major research groups in this area are not represented, including Etzioni’s group at the University of Washington.

Three papers in the collection seemed to me particularly fine. The overview, “Collaboratively built semi-structured content and artificial intelligence: the story so far,” written by the volume’s editors, is an excellent complement to, and update of, Medelyan et al.’s paper [4], including both summaries of the papers in this collection and a general survey of work in the area. This paper also presents a series of “take-home messages,” high-level conclusions that provide guidance for future research. “YAGO2: a spatially and temporally enhanced knowledge base from Wikipedia,” by Hoffart et al., shows how the knowledge base YAGO has been extended with spatial and temporal information derived from Wikipedia’s infoboxes. Particularly noteworthy is the paper’s appendix, which contains a collection of questions posed in English, the translation of each question into the query language of YAGO, and the quality of the result obtained. Finally, “An open-source toolkit for mining Wikipedia,” by Milne and Witten, describes the Wikipedia Miner toolkit, which appears to be a very valuable resource.

Overall, the collection is of very great value to both researchers in the area and readers with a general interest in AI. The editors and authors are to be congratulated on an important contribution to the literature on this promising, highly active area of research.

Reviewer:  Ernest Davis Review #: CR140920 (1305-0413)
1) Lenat, D.; Prakash, M.; Shepherd, M. CYC: using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Magazine 6, 4(1986), 65–85.
2) Suchanek, F. M.; Kasneci, G.; Weikum, G. YAGO: a core of semantic knowledge. In Proc. of the 16th International World Wide Web Conference, ACM, 2007, 697–706.
3) Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: a nucleus for a web of open data. In Proc. of the 6th International Semantic Web Conference, Springer, 2007, 722–735.
4) Medelyan, O.; Milne, D.; Legg, C.; Witten, I. H. Mining meaning from Wikipedia. International Journal of Human-Computer Studies 67, 9(2009), 716–754.
Artificial Intelligence (I.2 )
 
Reproduction in whole or in part without permission is prohibited.   Copyright © 2000-2021 ThinkLoud, Inc.