Computing Reviews
Arabic authorship attribution: an extensive study on Twitter posts
Altakrori M., Iqbal F., Fung B., Ding S., Tubaishat A. ACM Transactions on Asian and Low-Resource Language Information Processing 18(1): 1-51, 2019. Type: Article
Date Reviewed: Mar 25 2019

Altakrori et al. present an excellent study on Arabic authorship attribution from Twitter posts. The coverage of the subject is very extensive. The authors review a large body of related literature on authorship attribution in short messages. The non-Arabic short-text studies reviewed cover Internet relay chat (IRC), short message service (SMS), and Twitter posts. Only about ten Arabic authorship studies exist in the literature, and none of them focuses on short texts. The authors’ study is among the first to investigate authorship attribution for short Arabic texts.

In addition to an extensive literature review, the authors include a detailed description of the various models and tools used in the literature, as well as those used in their own experiments. Three categories of features are used in the study: lexical, structural, and syntactic. Lexical features include character-based features such as total character count (M), ratio of letters to M, and ratio of each letter of the Arabic alphabet to M (28 features), among others, and word-based features such as word count and average word length. Structural features include measures such as sentence count and average sentence length. While the lexical and structural features exist in any language, the syntactic features are language dependent. The authors use a set of Arabic-specific features such as diacritics, punctuation, and function words in their study.
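To make the feature categories above concrete, here is a minimal Python sketch of how a few of the named lexical and structural features might be computed for a single tweet. It is an illustration only, not the authors' implementation: the function name `stylometric_features`, the feature keys, the sentence-splitting rule, the choice to measure sentence length in words, and the Unicode range used for diacritics are all assumptions made for this example.

```python
import re

# The 28 base letters of the Arabic alphabet (hamza and ta marbuta variants
# are deliberately left out; this choice is an assumption of the sketch).
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"


def is_diacritic(ch: str) -> bool:
    """True for the common Arabic diacritics (harakat), U+064B to U+0652."""
    return "\u064b" <= ch <= "\u0652"


def stylometric_features(text: str) -> dict:
    """Compute a handful of lexical and structural features for one text."""
    m = len(text)  # total character count, called M in the review
    words = text.split()
    # Assumed rule: a sentence ends at '.', '!', '?' or the Arabic '؟'.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]

    features = {
        # Lexical, character-based
        "char_count": m,
        "letter_ratio": sum(ch.isalpha() for ch in text) / m if m else 0.0,
        "diacritic_count": sum(is_diacritic(ch) for ch in text),
        # Lexical, word-based
        "word_count": len(words),
        "avg_word_length": (
            sum(len(w) for w in words) / len(words) if words else 0.0
        ),
        # Structural
        "sentence_count": len(sentences),
        "avg_sentence_length": (
            sum(len(s.split()) for s in sentences) / len(sentences)
            if sentences
            else 0.0
        ),
    }
    # One ratio per letter of the alphabet: 28 additional features.
    for letter in ARABIC_LETTERS:
        features[f"ratio_{letter}"] = text.count(letter) / m if m else 0.0
    return features


if __name__ == "__main__":
    sample = "مرحبا بالعالم. كيف حالك؟"
    for name, value in stylometric_features(sample).items():
        print(name, value)
```

In a real pipeline, one such feature vector per tweet would be passed to a classifier; the sketch stops at feature extraction because the review does not specify how the models were configured.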

The authors test and evaluate a wide range of models, including naive Bayes, support vector machines (SVMs), decision trees, and random forests. They describe each model in detail, which researchers will find invaluable for similar studies in the future. The dataset contains 115,768 tweets in Arabic from 155 users. The authors ran all four models on the dataset. The results are used to answer four research questions:

(1) “How does the n-gram approach perform compared to state-of-art instance-based classification techniques under varying attribution scenarios?”
(2) “Which n-gram level (character, word, or syntactic) is the most helpful in distinguishing the authors’ writing styles?”
(3) “How important are diacritics to the attribution process when the n-gram approach is used?”
(4) “When using instance-based classification techniques, how important is it to use all three categories of stylometric features?”

The paper provides a comprehensive literature review of authorship attribution in both English and non-English short texts, particularly in tweets. The authors also discuss four models and how to use them. The information presented in the paper is invaluable for those who work in related fields.

Reviewer: Xiannong Meng
Review #: CR146490 (1906-0244)
Content Analysis And Indexing (H.3.1)
Knowledge Acquisition (I.2.6 ...)
Language Acquisition (I.2.6 ...)
Social Networking (H.3.4 ...)
Learning (I.2.6)
 