Computing Reviews

Exploiting discourse information to identify paraphrases
Bach N., Minh N., Shimazu A. Expert Systems with Applications: An International Journal 41(6): 2832-2841, 2014. Type: Article
Date Reviewed: 03/25/15

An important task in many natural language applications is to determine whether two sentences have approximately the same meaning, even if they look syntactically different and share only a few words. The problem is usually approached with machine learning, that is, by training a statistical classifier that judges the similarity of two sentences from features such as their words, parse trees, or other elements.

The authors propose comparing sentences on the basis of elementary discourse units (EDUs), the basic building blocks that carry a sentence's meaning. First, each sentence is segmented into EDUs by a discourse segmenter: a base segmenter generates a set of candidate segmentations, and the segmenter selects the candidate that ranks highest under a score function learned by a separate classification algorithm. The two sentences are then compared by finding, for each EDU in one sentence, the most similar EDU in the other sentence, and averaging these similarities weighted by the length of each EDU.
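The two steps just described (picking the top-ranked candidate segmentation, then a length-weighted average of best-match EDU similarities) can be illustrated with a small sketch. It is not the authors' implementation: an EDU is assumed to be a list of tokens, its length is assumed to be its token count, the learned scoring function is replaced by an arbitrary `score` callable, the EDU-to-EDU similarity is a plain word-overlap (Jaccard) stand-in, and the symmetric `sentence_similarity` that averages both directions is a hypothetical addition.

```python
from typing import Callable, List, Sequence

# Assumption: an EDU is represented as a list of tokens.
EDU = List[str]


def jaccard(a: EDU, b: EDU) -> float:
    """Stand-in EDU similarity: word overlap (Jaccard) on lowercased tokens.
    The paper's actual EDU similarity measure is not reproduced here."""
    sa, sb = {t.lower() for t in a}, {t.lower() for t in b}
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def pick_segmentation(candidates: Sequence[List[EDU]],
                      score: Callable[[List[EDU]], float]) -> List[EDU]:
    """Reranking step: return the candidate segmentation with the highest score.
    `score` is a hypothetical placeholder for the learned score function."""
    return max(candidates, key=score)


def directional_similarity(src: List[EDU], tgt: List[EDU],
                           sim: Callable[[EDU, EDU], float] = jaccard) -> float:
    """For each EDU in `src`, take its best match in `tgt`; average these
    best-match scores, weighting each by the token count of the `src` EDU."""
    total_len = sum(len(e) for e in src)
    if total_len == 0 or not tgt:
        return 0.0
    weighted = sum(len(e) * max(sim(e, t) for t in tgt) for e in src)
    return weighted / total_len


def sentence_similarity(s1: List[EDU], s2: List[EDU]) -> float:
    """Hypothetical symmetric variant: mean of the two directional scores."""
    return 0.5 * (directional_similarity(s1, s2) + directional_similarity(s2, s1))


if __name__ == "__main__":
    s1 = [["the", "cat", "sat"], ["on", "the", "mat"]]
    s2 = [["a", "cat", "sat"], ["on", "a", "mat"]]
    print(sentence_similarity(s1, s2))  # each EDU's best match scores 0.5 -> 0.5
```

Weighting by length means a long EDU with no good counterpart lowers the score more than a short one. For example, `directional_similarity([["x", "y", "z"], ["w"]], [["x", "y", "z"]])` gives (3·1 + 1·0)/4 = 0.75.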

A good deal of the paper is devoted to experimentally evaluating the new method against previously reported methods on a large text corpus created for plagiarism detection; the experiments show that the new method raises the accuracy of paraphrase identification from 92.2 percent to 93.4 percent. Notably, the technique is very general and can be applied to any kind of text. Future work will aim to improve it by considering not only the length of EDUs but also the semantic roles they play in the text.

Reviewer: Wolfgang Schreiner
Review #: CR143276 (1506-0521)

Reproduction in whole or in part without permission is prohibited.   Copyright 2024 ComputingReviews.com™