Nie et al. present an algorithm for ranking candidate answers, drawn from all available answers, for a new question in a community question-answering (QA) system. One simple approach is to select the most similar historical question and, among its answers, pick the one rated highest by user feedback. Instead, the authors propose an approach with two components: offline learning and online ranking. In the offline stage, a model is learned from features of training data consisting of historical questions and answers; notably, the training signal comes both from the relative rankings of answers to the same question and from answers to different questions. In the online stage, the top-k questions most similar to the given question are retrieved, and all of their answers are ranked using the learned model.
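The online stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `similarity` and `score_answer` functions stand in for the paper's question-similarity measure and learned ranking model, and the toy stand-ins used in the demo (word overlap, answer length) are purely hypothetical.

```python
def rank_answers(new_question, history, similarity, score_answer, k=3):
    """Online stage sketch: pool the answers of the k historical questions
    most similar to new_question, then rank the pool with a scoring model."""
    # history: list of (question, answers) pairs
    top_k = sorted(history, key=lambda qa: similarity(new_question, qa[0]),
                   reverse=True)[:k]
    pool = [a for _, answers in top_k for a in answers]
    return sorted(pool, key=lambda a: score_answer(new_question, a),
                  reverse=True)

# Toy stand-ins (NOT the paper's models): word-overlap similarity,
# and answer length as a placeholder "learned" score.
overlap = lambda q1, q2: len(set(q1.split()) & set(q2.split()))
length_score = lambda q, a: len(a)

history = [("how to sort a list", ["use sorted()", "use list.sort() in place"]),
           ("what is a tuple", ["an immutable sequence"])]
print(rank_answers("how to sort", history, overlap, length_score, k=1))
# → ['use list.sort() in place', 'use sorted()']
```

With k=1, only the answers of the single most similar question enter the pool; the paper's point is that a larger pool, ranked by a learned model, can surface better answers than this nearest-question shortcut.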
The paper is very well written. It covers all aspects of the description: problem statement, offline learning, online algorithm, feature selection, comparative approaches, differences from the comparative approaches, and so on. Specifically, four types of features are used: deep features (for example, Doc2Vec, for obtaining context vectors of questions), topic-level features (for example, latent Dirichlet allocation, LDA), statistical features (numbers of terms, verbs, nouns, and so on), and user-centric features (user profiles).
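As an illustration of the simplest of these groups, the statistical features, a toy extractor might look like the following. The hand-rolled verb and noun lexicons are placeholders of my own; a real system would use a part-of-speech tagger, and the paper's exact feature set may differ.

```python
def statistical_features(answer, verb_lexicon, noun_lexicon):
    """Toy version of the statistical feature group: counts of terms,
    verbs, and nouns in an answer. The lexicons are illustrative
    placeholders for a real POS tagger."""
    tokens = answer.lower().split()
    return {
        "n_terms": len(tokens),
        "n_verbs": sum(t in verb_lexicon for t in tokens),
        "n_nouns": sum(t in noun_lexicon for t in tokens),
    }

# Hypothetical mini-lexicons for the demo only.
verbs = {"use", "sort", "call"}
nouns = {"list", "function", "method"}
print(statistical_features("Use the sort method on the list", verbs, nouns))
# → {'n_terms': 7, 'n_verbs': 2, 'n_nouns': 2}
```

Feature vectors like this would be concatenated with the deep, topic-level, and user-centric features before offline training.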
The performance results are thorough. The authors use P@K, the fraction of correct results among the top-k ranked results, as the performance metric. The authors explain all of the comparative techniques; in fact, that is one of the important contributions of the paper. They describe point-wise techniques (where the importance of each individual answer is used as training data), pair-wise techniques (where the relative rank of a pair of answers is used as training data), and rank-wise techniques (where the whole ranking is used as training data).
The authors are not very convincing when it comes to why the simple approach of selecting top answers from the most similar question will not work; experimental results comparing against that baseline would have convinced the reader. Similarly, why does the proposed algorithm work better on some datasets than on others? This is not well explained.