Computing Reviews
Survey on mining subjective data on the web
Tsytsarau M., Palpanas T. Data Mining and Knowledge Discovery 24(3): 478-514, 2012. Type: Article
Date Reviewed: Nov 16 2022

When accessing information on the web, users not only consume content but also comment on, annotate, and rate it, which in turn generates new content. People express themselves on the web through blogs, wikis, forums, and social networks, and give their feedback and opinions on varying topics, including politics, healthcare, products, and travel. Customers seek opinions about products and services from other consumers who have experience with them; useful and influential opinions can be generated by aggregating and accumulating feedback from multiple sources.

This survey paper introduces subjectivity analysis, which includes opinion mining, opinion aggregation, sentiment analysis, and contradiction analysis. Opinion mining uses four approaches: machine learning, dictionary-based, statistical, and semantic. The survey compares works by opinion classification, topic-specific features, sentiment representation (categories or a continuous range), target domains, and scale of the algorithm.

Opinions, also called sentiments or emotions, can present as anger, disgust, fear, joy, sadness, or surprise. Opinion mining has two steps: identifying topics and classifying sentences or documents. Sentiment classification distinguishes between positive, negative, and neutral texts. Sentiment analysis can also be framed as a rating-inference task, in which class labels (often one to five “stars”) represent the polarity of an opinion. Sentiments are aggregated over time and space for a query and can be presented along several dimensions, for example, joy-sadness, acceptance-disgust, and anticipation-surprise.

Machine learning methods and annotated datasets have contributed to advancements in opinion mining. Machine learning methods train a model on corpus data; once trained, the model is used to classify new data. The classifier learns to distinguish among the sentiment labels by analyzing relevant features, which are then used to predict sentiments for new documents.
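The train-then-classify workflow described above can be illustrated with a minimal sketch. It assumes a multinomial naive Bayes model with add-one smoothing and a tiny hand-made corpus; the function names `train_naive_bayes` and `classify`, the corpus, and the labels are all hypothetical and are not taken from the survey.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Count documents per class and words per class (hypothetical helper)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled_docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability under the model."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy annotated corpus (made up for illustration only).
corpus = [
    ("great camera and excellent battery", "positive"),
    ("excellent screen great value", "positive"),
    ("terrible battery and poor screen", "negative"),
    ("poor value terrible support", "negative"),
]
model = train_naive_bayes(corpus)
print(classify(model, "great battery"))
print(classify(model, "terrible screen"))
```

A real system would add proper tokenization, a much larger annotated corpus, and held-out evaluation; the sketch only shows the separation between the training step and the prediction step.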

The machine learning algorithms covered are support vector machines (SVMs), naive Bayes, and maximum entropy; SVMs performed best. Machine learning is highly sensitive to feature selection. Each feature is encoded as present or absent, so a complete document is encoded as a binary vector. The polarity of a sentence or document is determined by averaging the polarity of individual words and sentiments. When there are more than two classes, sentiments are encoded as discrete values; the continuous form uses scalar values. The latter offers better resolution with finer control, but “is not favored by the classification algorithms.”
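As a rough illustration of the two encodings mentioned above, the following sketch builds a binary presence/absence vector for a document and averages word polarities into a continuous score. The vocabulary, the lexicon values, and the function names are assumptions made for this example, not material from the survey.

```python
def binary_vector(text, vocabulary):
    """Encode a document as 1/0 per vocabulary term (present/absent)."""
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocabulary]


def average_polarity(text, lexicon):
    """Average the polarity of the words that appear in the lexicon."""
    scores = [lexicon[t] for t in text.lower().split() if t in lexicon]
    # Documents without any opinion words are treated as neutral (0.0).
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical vocabulary and polarity lexicon.
VOCABULARY = ["good", "bad", "battery", "screen"]
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "poor": -1.0}

print(binary_vector("good battery good", VOCABULARY))
print(average_polarity("good screen bad battery great", LEXICON))
```

Note that the binary vector discards word frequency (the repeated "good" counts once), while the averaged score is a scalar in the range of the lexicon values rather than a class label.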

Review mining is based on opinion aggregation obtained by processing, mining, and reasoning on customer feedback data; sentiment polarities are aggregated over frequent features. The aggregation has weak points, however: it smooths out the variance in opinions, and aggregate values can be manipulated via artificially introduced data, for example, fake reviews. Such manipulation is not yet well studied.
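The two weak points of aggregation can be seen in a small sketch. It assumes opinions arrive as (feature, polarity) pairs with polarities in [-1, 1] and that the aggregate is a plain mean; the data, the function name `aggregate_by_feature`, and the choice of mean and population variance are illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, pvariance


def aggregate_by_feature(opinions):
    """Return {feature: (mean polarity, population variance)}."""
    grouped = defaultdict(list)
    for feature, polarity in opinions:
        grouped[feature].append(polarity)
    return {f: (mean(v), pvariance(v)) for f, v in grouped.items()}


# Made-up opinions: "battery" is polarized, "screen" is uniformly neutral.
opinions = [
    ("battery", 1.0), ("battery", -1.0), ("battery", 1.0), ("battery", -1.0),
    ("screen", 0.0), ("screen", 0.0),
]
summary = aggregate_by_feature(opinions)
print(summary)  # both features share the same mean but differ in variance

# Four artificially introduced positive reviews shift the battery aggregate.
with_fakes = opinions + [("battery", 1.0)] * 4
summary_with_fakes = aggregate_by_feature(with_fakes)
print(summary_with_fakes["battery"])
```

Reporting the variance next to the mean is one simple way to keep the disagreement visible; it does nothing by itself to detect the injected reviews.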

Opinion aggregation may sometimes produce a lossy summarization of the available opinion data when diversity in the data is ignored. This has necessitated contradiction analysis, that is, the analysis of “features that contribute to a contradiction,” for example, antonymy, negation, and numeric mismatches. Contradiction on a topic may also change over time, so time is added as a parameter. Communities in the blogosphere transition between high and low entropy states across time. Entropy also grows when the diversity of opinions grows.
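The link between diversity and entropy can be sketched with Shannon entropy over discrete sentiment labels. This is only one common way to quantify diversity and is an assumption of this example; the measures used in the surveyed works may be defined differently. The function name and the label sets are hypothetical.

```python
import math
from collections import Counter


def opinion_entropy(labels):
    """Shannon entropy (in bits) of the distribution of sentiment labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Made-up snapshots of one community at three points in time.
unanimous = ["positive"] * 8
split = ["positive"] * 4 + ["negative"] * 4
three_way = ["positive", "negative", "neutral"] * 4

for name, snapshot in [("unanimous", unanimous), ("split", split), ("three_way", three_way)]:
    print(name, opinion_entropy(snapshot))
```

Computing the value per time window would give a series that moves between low and high entropy as opinions in the community converge or diverge.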

This survey paper provides a concise but thorough introduction to the progress of research in the field of sentiment analysis and opinion mining while sharply distinguishing individual works. It is easy reading, with only the essential mathematics included, and should therefore interest a large audience. Finally, though the paper is from 2012, the field continues to face many challenges. The application of machine learning to micro-blogging is still new and relevant today.

Reviewer:  K R Chowdhary Review #: CR147514
Data Mining (H.2.8 ... )
Web-Based Interaction (H.5.3 ... )
Web-Based Services (H.3.5 ... )
Information Storage And Retrieval (H.3 )
