Computing Reviews

A language modeling approach for extracting translation knowledge from comparable corpora
Rahimi R., Shakery A. ECIR 2013 (Proceedings of the 35th European Conference on Advances in Information Retrieval, Moscow, Russia, Mar 24-27, 2013), 606-617, 2013. Type: Proceedings
Date Reviewed: 11/11/13

In machine translation, system efficiency refers to translation quality, which largely depends on factors such as source-to-target language similarity, the available language resources, and the translation model, that is, the algorithm that translates source-language text into the target language. Many machine translation systems employ language models, which use a probability distribution to assign a probability to a sequence of words of a given length. In this paper, the authors propose an approach that uses comparable corpora to extract translation knowledge for translating from English into Persian. To compare the corpora in the two languages, the documents are first aligned, and similarity scores are then computed. The authors follow the Kullback-Leibler divergence model; in ambiguous cases, they apply a naïve Bayes rule. The probability estimates are smoothed with the Jelinek-Mercer method and with Dirichlet prior smoothing. Finally, the alignment similarity is normalized.
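The scoring steps summarized above (unigram language models, Jelinek-Mercer and Dirichlet smoothing, Kullback-Leibler divergence, and normalization) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the source document has already been mapped into target-language terms (for instance, through a bilingual dictionary), scores each candidate target document by negative KL divergence against its smoothed unigram model, and, as a hypothetical choice, normalizes the scores with a softmax. The function names, the parameter values `lam` and `mu`, and the toy documents are illustrative only; the naïve Bayes step for ambiguous cases is not modeled.

```python
import math
from collections import Counter


def ml_model(tokens):
    """Maximum-likelihood unigram model: P(w|D) = c(w, D) / |D|."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def jelinek_mercer(tokens, collection, lam=0.5):
    """Jelinek-Mercer smoothing: P(w|D) = (1 - lam) * P_ml(w|D) + lam * P(w|C)."""
    doc = ml_model(tokens)
    return lambda w: (1 - lam) * doc.get(w, 0.0) + lam * collection[w]


def dirichlet(tokens, collection, mu=2000.0):
    """Dirichlet prior smoothing: P(w|D) = (c(w, D) + mu * P(w|C)) / (|D| + mu)."""
    counts = Counter(tokens)
    length = len(tokens)
    return lambda w: (counts[w] + mu * collection[w]) / (length + mu)


def neg_kl_score(source_tokens, target_model):
    """Rank-equivalent negative KL divergence: sum_w P(w|S) * log P(w|T).

    The entropy of the source model is omitted because it is the same
    for every candidate target document, so rankings are unchanged.
    """
    source = ml_model(source_tokens)
    return sum(p * math.log(target_model(w)) for w, p in source.items())


def normalize(scores):
    """Softmax over raw scores so the weights sum to 1 (an assumed scheme)."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


# Toy example: the source tokens are ASSUMED to be already mapped into
# the target language's vocabulary.
source = "translation model corpus".split()
targets = {
    "t1": "translation model corpus alignment".split(),
    "t2": "weather report rain".split(),
}
# The collection model covers every token, so no probability is zero.
all_tokens = source + [w for doc in targets.values() for w in doc]
collection = ml_model(all_tokens)

scores = {k: neg_kl_score(source, dirichlet(doc, collection, mu=10))
          for k, doc in targets.items()}
jm_scores = {k: neg_kl_score(source, jelinek_mercer(doc, collection, lam=0.5))
             for k, doc in targets.items()}
weights = normalize(scores)
print(weights)
```

Smoothing against the collection model is what keeps `math.log` defined when a source word is absent from a target document; with either smoothing method, the topically related document `t1` receives the larger weight.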

As a result, this work suggests that even a relatively poor resource, such as comparable corpora, can be exploited efficiently for knowledge extraction. Language knowledge of this kind is clearly a valuable foundation for translation.

This topic is interesting, but the presentation is somewhat vague: the paper does not state precisely how translation quality is measured; the weighting in the models is intuitive rather than formal; and word translations are treated as statistically independent, which limits vocabulary coverage, especially for technical terms. Nevertheless, my overall opinion is positive, and I recommend the paper to students working in the field, as well as to those who plan to do so.

Reviewer: Jolanta Mizera-Pietraszko | Review #: CR141715 (1401-0100)

Reproduction in whole or in part without permission is prohibited. Copyright 2024 ComputingReviews.com™