A technical account of local importance, this paper presents a sentiment classification model that exploits the rich inflectional structure of the Urdu language. The research aims to use a formal description of Urdu as an optimal basis for running a sentiment analysis algorithm on top of it.
The paper outlines the morphological system of Urdu at length, and argues for the benefits of identifying sentiment phrases, rather than sentiment keywords alone, in the sentiment detection process. A sentiment lexicon is constructed that records the polarity and the intensity of the sentiment of each word. This lexicon is used together with a dependency parser to associate SentiUnits with syntactic chunks and rank them. A classifier then computes the sentiment of the textual unit (a sentence or a sequence of sentences) as the sum of the polarities of all identified SentiUnits.
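The final summation step can be illustrated with a minimal sketch. It is not the paper's implementation: the lexicon entries (English placeholders rather than Urdu words), the intensity scale from 0 to 1, the rule that a chunk's score is the sum of polarity times intensity over its words, and the names `LEXICON`, `score_chunk`, and `classify` are all assumptions made for illustration. The chunks are supplied by hand here, whereas the paper obtains them with a dependency parser.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon: word -> (polarity, intensity).
# Polarity is +1 or -1; the 0..1 intensity scale is an assumption.
LEXICON: Dict[str, Tuple[int, float]] = {
    "excellent": (1, 0.9),
    "good": (1, 0.5),
    "poor": (-1, 0.6),
    "terrible": (-1, 0.9),
}


def score_chunk(chunk: List[str]) -> float:
    """Score one syntactic chunk, treated here as one sentiment unit.

    Assumed rule: add polarity * intensity for every word found in the
    lexicon; words missing from the lexicon contribute nothing.
    """
    total = 0.0
    for word in chunk:
        polarity, intensity = LEXICON.get(word, (0, 0.0))
        total += polarity * intensity
    return total


def classify(chunks: List[List[str]]) -> str:
    """Label a textual unit by the sign of the summed chunk scores."""
    total = sum(score_chunk(chunk) for chunk in chunks)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Chunks are given by hand; a parser would normally produce them.
    sentence = [["excellent", "service"], ["poor", "parking"]]
    print(classify(sentence))
```

In this sketch a strongly positive chunk outweighs a mildly negative one, which is the kind of distinction a lexicon carrying intensity as well as polarity makes possible.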
The paper includes a detailed state-of-the-art analysis and reports evaluation results for the proposed approach of around 82.5 percent accuracy, based on two experiments.
With a detailed account of the Urdu language, a descriptive survey of the state of the art in sentiment analysis, and a straightforward presentation of the proposed approach, this paper is valuable for linguists, computational linguists, and scholars interested in processing languages with distinctive characteristics, such as rich inflectional systems and peculiar alphabets, as well as in advances in sentiment analysis.