Computing Reviews
Learning to rank answers to non-factoid questions from Web collections
Surdeanu M., Ciaramita M., Zaragoza H. Computational Linguistics 37(2): 351-383, 2011. Type: Article
Date Reviewed: Jan 19 2012

Answers produced by typical question-answering (QA) systems for factoid questions are usually no longer than two words. However, to be matched correctly against relevant documents on the Web, such answers must be very precise--they must contain the same query words as the question, or at least their closest synonyms, and follow the syntactic structure of the question. Factoid questions usually ask for a date, person, place, organization, number, or some other named entity. Non-factoid questions, on the other hand, tend to be more complicated, and the relationship between the users' query words and the text of the documents tends to be more complex, since several types of mismatch must be tolerated. Answering non-factoid questions is therefore harder than answering factoid ones.

The authors set out to show that combining translation models learned from a social QA site with linguistic features derived from syntactic parsing, named entity recognition, and semantic role labeling produces an improvement in non-factoid QA of 14 to 21 percent in both mean reciprocal rank and Precision@1, depending on the system configuration. The research framework includes linguistic similarity features measured with the Okapi BM25 ranking function over a bag-of-words representation, which scores terms by their occurrence in the text regardless of word order or alignment. Statistical translation models, such as IBM Model 1 (the first and simplest of the five IBM word alignment models), which estimates the probability that a target-language string is a translation of a source-language string, are also explored. For answer ranking, word frequency and density features are used as well.
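The two similarity signals mentioned above can be illustrated with a short sketch. The Python snippet below is not the authors' code: the toy corpus, the translation table t_prob, and all probability values are hypothetical, and the IBM Model 1 score is a simplified approximation that backs off to a small constant for unseen word pairs.

# Minimal sketch of the two ranking signals discussed in the review:
# an Okapi BM25 bag-of-words score and an IBM Model 1-style translation
# probability P(question | answer). All data below is illustrative only.
import math
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "how do plants convert sunlight into energy".split(),
    "photosynthesis converts sunlight water and carbon dioxide into glucose".split(),
]
avgdl = sum(len(d) for d in corpus) / len(corpus)
N = len(corpus)
df = Counter(w for d in corpus for w in set(d))  # document frequencies

def bm25(question, answer, k1=1.2, b=0.75):
    """Okapi BM25 score of an answer document against the question terms."""
    tf = Counter(answer)
    score = 0.0
    for q in question:
        if q not in tf:
            continue
        idf = math.log(1 + (N - df[q] + 0.5) / (df[q] + 0.5))
        norm = tf[q] + k1 * (1 - b + b * len(answer) / avgdl)
        score += idf * tf[q] * (k1 + 1) / norm
    return score

# Hypothetical word-translation probabilities t(q | a), as might be learned
# from question-answer pairs collected from a community QA site.
t_prob = {
    ("convert", "converts"): 0.6,
    ("energy", "glucose"): 0.2,
    ("sunlight", "sunlight"): 0.9,
}

def ibm_model1_logprob(question, answer, unseen=1e-4):
    """log P(question | answer): each question word is generated by some
    answer word (or a NULL position), mixed uniformly over |answer| + 1."""
    logp = 0.0
    for q in question:
        p = sum(t_prob.get((q, a), unseen) for a in answer) / (len(answer) + 1)
        logp += math.log(p + 1e-12)
    return logp

question = "how do plants convert sunlight into energy".split()
answer = corpus[2]
print("BM25 score:", round(bm25(question, answer), 3))
print("IBM-1 log P(Q|A):", round(ibm_model1_logprob(question, answer), 3))

In such a sketch, BM25 rewards only exact lexical overlap ("sunlight", "into"), while the translation model can also reward lexical variation ("convert" versus "converts", "energy" versus "glucose"), which is one intuition behind combining the two signals for non-factoid answer ranking.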

The results show that aggregation techniques exploiting the selected linguistic features produce the best answers to non-factoid questions. Although the experiment was carried out in the context of the Text Retrieval Conference (TREC) and the Cross-Language Evaluation Forum (CLEF) workshop, and the tools used are rather conventional, readers, especially those working in computational linguistics, may find the paper appealing simply because it is very well written. The framework is presented concisely, the highlights are given at the very beginning, the technical terminology is defined, and every stage is supported by examples, formulas, and diagrams, making the experiment clear to both students and researchers, who can use it as a basis for their own work. Taking this into consideration, I would recommend the paper to those in academia.

Reviewer: Jolanta Mizera-Pietraszko | Review #: CR139787 (1206-0612)
Linguistic Processing (H.3.1)
Linguistics (J.5)
Content Analysis And Indexing (H.3.1)
General (H.3.0)
Natural Language Processing (I.2.7)
Other reviews under "Linguistic Processing": Date
Anatomy of a text analysis package
Reed A. Information Systems 9(2): 89-96, 1984. Type: Article
Jun 1 1985
Dependency parsing for information retrieval
Metzler D., Noreault T., Richey L., Heidorn B. Research and development in information retrieval, King's College, Cambridge, 1984. Type: Proceedings
Oct 1 1985
Automated medical office records
Gabrieli E. Journal of Medical Systems 11(1): 59-68, 1987. Type: Article
Nov 1 1988
