Nie et al. present an algorithm for ranking candidate answers, drawn from all available answers, for a new question in a community question-answering (QA) system. One simple approach is to select the most similar historical question and, among its answers, pick the one rated highest by user feedback. Instead, the authors propose an approach with two components: offline learning and online ranking. In the offline stage, a model is learned from features of training data consisting of historical questions and answers; notably, the training signal comes both from the relative rankings of answers to the same question and from answers to different questions. In the online stage, the top-k questions most similar to the given question are retrieved, and all of their answers are ranked using the learned model.
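The online stage described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `similarity` and `score_answer` functions stand in for the paper's question-similarity measure and learned ranking model, and the toy stand-ins used in the demo (word overlap, answer length) are purely hypothetical.

```python
def rank_answers(new_question, history, similarity, score_answer, k=3):
    """Online stage sketch: pool the answers of the k historical questions
    most similar to new_question, then rank the pool with a scoring model."""
    # history: list of (question, answers) pairs
    top_k = sorted(history, key=lambda qa: similarity(new_question, qa[0]),
                   reverse=True)[:k]
    pool = [a for _, answers in top_k for a in answers]
    return sorted(pool, key=lambda a: score_answer(new_question, a),
                  reverse=True)

# Toy stand-ins (NOT the paper's models): word-overlap similarity,
# and answer length as a placeholder "learned" score.
overlap = lambda q1, q2: len(set(q1.split()) & set(q2.split()))
length_score = lambda q, a: len(a)

history = [("how to sort a list", ["use sorted()", "use list.sort() in place"]),
           ("what is a tuple", ["an immutable sequence"])]
print(rank_answers("how to sort", history, overlap, length_score, k=1))
# → ['use list.sort() in place', 'use sorted()']
```

With k=1, only the answers of the single most similar question enter the pool; the paper's point is that a larger pool, ranked by a learned model, can surface better answers than this nearest-question shortcut.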
The paper is very well written. It covers all aspects of the description: problem statement, offline learning, online algorithm, feature selection, comparative approaches, differences from the comparative approaches, and so on. Specifically, four types of features are used: deep features (for example, Doc2Vec, for obtaining context vectors of questions), topic-level features (for example, latent Dirichlet allocation, LDA), statistical features (numbers of terms, verbs, nouns, and so on), and user-centric features (user profiles).
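As an illustration of the simplest of these groups, the statistical features, a toy extractor might look like the following. The hand-rolled verb and noun lexicons are placeholders of my own; a real system would use a part-of-speech tagger, and the paper's exact feature set may differ.

```python
def statistical_features(answer, verb_lexicon, noun_lexicon):
    """Toy version of the statistical feature group: counts of terms,
    verbs, and nouns in an answer. The lexicons are illustrative
    placeholders for a real POS tagger."""
    tokens = answer.lower().split()
    return {
        "n_terms": len(tokens),
        "n_verbs": sum(t in verb_lexicon for t in tokens),
        "n_nouns": sum(t in noun_lexicon for t in tokens),
    }

# Hypothetical mini-lexicons for the demo only.
verbs = {"use", "sort", "call"}
nouns = {"list", "function", "method"}
print(statistical_features("Use the sort method on the list", verbs, nouns))
# → {'n_terms': 7, 'n_verbs': 2, 'n_nouns': 2}
```

Feature vectors like this would be concatenated with the deep, topic-level, and user-centric features before offline training.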
The performance results are thorough. The authors use P@K, the fraction of correct results among the top-k ranked results, as the performance metric. The authors explain all of the comparative techniques; in fact, that is one of the important contributions of the paper. They describe point-wise techniques (where the importance of each individual answer is used as training data), pair-wise techniques (where the relative rank of a pair of answers is used as training data), and rank-wise techniques (where the whole ranking is used as training data).
The authors are not very convincing when it comes to why the simple approach of selecting top answers from the most similar question will not work; experimental results comparing against that baseline would have convinced the reader. Similarly, why does the proposed algorithm work better on some datasets than on others? This is not well explained.