Sentiment analysis continues to be applied successfully to consumer reviews. In other areas, the challenges involved can prove insurmountable. This negative results paper reports a failed attempt to apply sentiment analysis in one such area.
The investigators designed a system intended to recommend software libraries based on text extracted from Stack Overflow discussions. They adopted a “state-of-the-art approach based on a recursive neural network,” Stanford CoreNLP, to analyze the extracted text for sentiments. Building a training set of manually labeled sentiments required some 90 hours of work. Despite this best-practice effort, as the final row of table 2 indicates, overall “precision and recall in detecting positive and negative sentiments [was less than] 40 percent.”
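For readers less familiar with the metrics quoted above, the following minimal sketch shows how per-class precision and recall are computed from gold and predicted labels. The function name `per_class_precision_recall` and the ten toy labels are invented for illustration and do not come from the paper; the sketch only shows that a classifier can score well on a majority class while doing poorly on positive and negative sentences.

```python
from collections import Counter

def per_class_precision_recall(gold, predicted,
                               labels=("positive", "negative", "neutral")):
    """Return {label: (precision, recall)} for each sentiment label."""
    true_pos = Counter()    # correct predictions per label
    pred_count = Counter()  # how often each label was predicted
    gold_count = Counter()  # how often each label occurs in the gold data
    for g, p in zip(gold, predicted):
        gold_count[g] += 1
        pred_count[p] += 1
        if g == p:
            true_pos[g] += 1
    scores = {}
    for label in labels:
        # Convention assumed here: a zero denominator yields a score of 0.0.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        scores[label] = (precision, recall)
    return scores

# Hypothetical toy data: mostly neutral sentences, as in technical discussions.
gold = ["neutral"] * 6 + ["positive", "positive", "negative", "negative"]
predicted = ["neutral"] * 5 + ["positive", "positive"] + ["neutral"] * 3

scores = per_class_precision_recall(gold, predicted)
for label, (precision, recall) in scores.items():
    print(f"{label:8s} precision={precision:.2f} recall={recall:.2f}")
```

In this invented example the neutral class is handled reasonably well, while the negative class is never predicted at all, which is why figures reported separately for positive and negative sentiments are more informative than overall accuracy.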
As a follow-up, the investigators evaluated five sentiment analysis tools on three datasets (Stack Overflow discussions, app reviews, and JIRA issues). Results for the Stack Overflow discussions dataset were, as before, not acceptable. For the app reviews dataset, however, results were acceptable. This success was attributed to the fact that the reviews were similar to consumer reviews, in which opinions are clearly expressed. Results for the JIRA issues dataset were also acceptable, but the interpretation is complicated by the fact that there are no neutral sentences in this particular dataset. The investigators conclude that mining opinions from a dataset of developer discussions of technicalities is a very difficult challenge.
This paper provides several useful insights and is strongly recommended to those working on sentiment analysis.