Computing Reviews
Exploiting discourse information to identify paraphrases
Bach N., Minh N., Shimazu A. Expert Systems with Applications: An International Journal 41(6): 2832-2841, 2014. Type: Article
Date Reviewed: Mar 25 2015

An important task in many natural language applications is to determine whether two sentences have approximately the same meaning, even if they look syntactically different and share only a few words. The problem is usually attacked by machine learning techniques, that is, by training a statistical classifier that computes the similarity of two sentences based on words, parse trees, or other elements.

The authors propose comparing sentences at the level of elementary discourse units (EDUs), the basic building blocks of a sentence that carry its meaning. Each sentence is first segmented into EDUs by a discourse segmenter: a base segmenter generates a set of candidate segmentations, and the candidate that ranks highest under a score function is selected; the score function is itself determined by another classification algorithm. The two sentences are then compared by finding, for each EDU in one sentence, the most similar EDU in the other sentence and averaging these similarities, weighted by the length of the EDU.
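
The comparison step can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the sentences are already segmented into EDUs (each a list of word tokens), uses Jaccard word overlap as a stand-in for the EDU similarity measure, which the review does not specify, and averages the two directions, which is also an assumption. All function names are hypothetical.

```python
from typing import Callable, List

EDU = List[str]  # one elementary discourse unit, as a list of word tokens


def jaccard(a: EDU, b: EDU) -> float:
    """Word-overlap similarity; an assumed stand-in for the paper's EDU measure."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def directed_similarity(src: List[EDU], tgt: List[EDU],
                        sim: Callable[[EDU, EDU], float] = jaccard) -> float:
    """For each EDU in src, take its best match in tgt; average weighted by EDU length."""
    total_len = sum(len(e) for e in src)
    if total_len == 0 or not tgt:
        return 0.0
    weighted = sum(len(e) * max(sim(e, t) for t in tgt) for e in src)
    return weighted / total_len


def sentence_similarity(s1: List[EDU], s2: List[EDU],
                        sim: Callable[[EDU, EDU], float] = jaccard) -> float:
    """Symmetric score: mean of both directions (an assumption, not from the review)."""
    return 0.5 * (directed_similarity(s1, s2, sim) + directed_similarity(s2, s1, sim))


if __name__ == "__main__":
    s1 = [["the", "cat", "sat"], ["on", "the", "mat"]]
    s2 = [["the", "cat", "sat"]]
    print(sentence_similarity(s1, s2))
```

In practice, a score of this kind would be one feature passed to a trained classifier rather than a decision on its own.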

A good deal of the paper is dedicated to the experimental evaluation of the new method against previously reported methods on a large text corpus created for plagiarism detection; the results show that the accuracy of paraphrase identification can be raised from 92.2 percent to 93.4 percent. The technique is very general and can be applied to any kind of text. Further work will aim to improve it by considering not only the length of EDUs, but also the semantic roles they play in the text.

Reviewer:  Wolfgang Schreiner Review #: CR143276 (1506-0521)
Document Analysis (I.7.5 ... )
 
 
Feature Evaluation And Selection (I.5.2 ... )
 
 
Content Analysis And Indexing (H.3.1 )
 
Other reviews under "Document Analysis": Date
Generating indicative-informative summaries with sumUM: a 3D dynamic virtual shop
Saggion H., Lapalme G. Computational Linguistics 28(4): 497-526, 2002. Type: Article
Jun 20 2003
Parameter-Free Geometric Document Layout Analysis
Lee S., Ryu D. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(11): 1240-1256, 2001. Type: Article
Jul 26 2002
A hierarchical neural network document classifier with linguistic feature selection
Chen C., Lee H., Hwang C. Applied Intelligence 23(3): 277-294, 2005. Type: Article
Aug 2 2006

Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®