Computing Reviews
Improving the extraction of bilingual terminology from Wikipedia
Erdmann M., Nakayama K., Hara T., Nishio S. ACM Transactions on Multimedia Computing, Communications, and Applications 5(4): 1-17, 2009. Type: Article
Date Reviewed: Mar 12 2010

Wikipedia is arguably the most popular encyclopedia on the Web. Because it is maintained by users who are not necessarily specialists in the respective fields, the quality of the knowledge presented is not always reliable. On the other hand, since natural languages change over time, Wikipedia articles tend to include expressions that are still current. From this perspective, its multilingualism is of a special kind: each article is linked to a set of articles in other languages that are not necessarily its translations. The advantage is that a user who speaks foreign languages can read the same article in several languages and learn new details of the subject from each language version. This motivates the interest in exploiting Wikipedia for multilingual purposes.

This paper’s stem-based approach relies on the assumption that the titles of two Wikipedia articles connected by an interlanguage link are translations of each other. Erdmann et al. aim to automatically augment an existing bilingual dictionary rather than create one from scratch. According to the authors, the measured translation accuracy increases when redirect page titles and anchor text information are added, and when the candidates are supplemented by translations extracted from the forward and backward links of an article (a backward link of one article is, after all, a forward link of another). A support vector machine (SVM) classifier then filters the resulting set of translation candidates.
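The pipeline the authors describe can be pictured with a toy sketch. All data, names, and scores below are hypothetical: candidates for an English title are gathered from the interlanguage link, the linked article's redirect titles, and its anchor texts, and a simple score threshold stands in for the paper's SVM classifier.

```python
# Toy Wikipedia-like data (hypothetical): interlanguage links, redirects,
# and anchor texts pointing at the German article.
INTERLANG = {"Dictionary": "Wörterbuch"}               # en title -> de title
REDIRECTS = {"Wörterbuch": ["Lexikon"]}                # de title -> redirect titles
ANCHORS   = {"Wörterbuch": ["Wörterbuch", "Wörterbücher", "Buch"]}

def extract_candidates(en_title):
    """Collect German translation candidates for an English title from the
    interlanguage link, redirect pages, and anchor texts."""
    de_title = INTERLANG.get(en_title)
    if de_title is None:
        return []
    candidates = {de_title}
    candidates.update(REDIRECTS.get(de_title, []))
    candidates.update(ANCHORS.get(de_title, []))
    return sorted(candidates)

def filter_candidates(candidates, scores, threshold=0.5):
    """Keep candidates whose classifier score passes the cut; a plain
    threshold stands in here for the paper's SVM filtering step."""
    return [c for c in candidates if scores.get(c, 0.0) >= threshold]

cands = extract_candidates("Dictionary")
scores = {"Wörterbuch": 0.9, "Lexikon": 0.7, "Wörterbücher": 0.6, "Buch": 0.1}
print(filter_candidates(cands, scores))
# -> ['Lexikon', 'Wörterbuch', 'Wörterbücher']
```

The point of the sketch is only the shape of the method: candidate generation is deliberately greedy (redirects and anchor texts add recall), and the classifier afterward restores precision by discarding noisy anchors such as "Buch."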

Although the paper is very interesting, it has some minor flaws. For instance, in Section 2.2, “Automatic Dictionary Construction,” “Wikipedia” appears as a subsection title alongside “Parallel Corpora” and “Comparable Corpora,” even though Wikipedia is not a dictionary but an encyclopedia of articles whose structure, organization, and content depend on the author. Also, the language pair used in this particular experiment, English and German, is only mentioned on page 10, as if the choice of language pair were of no importance to the process of creating an automatic dictionary. Some of the results, such as the comparison of an SVM classifier based on two features with one based on 13 features, or the two languages’ common derivations, are too obvious to include. Nonetheless, the paper is truly intriguing, and its results can be used by other researchers and ported to other language pairs.

Reviewer: Jolanta Mizera-Pietraszko. Review #: CR137798 (1007-0724)
Categories:
Linguistic Processing (H.3.1 ...)
Dictionaries (H.3.1 ...)
Statistical Databases (H.2.8 ...)
Content Analysis And Indexing (H.3.1)
Database Applications (H.2.8)
Document And Text Processing (I.7)