Computing Reviews

Applying Authorship Analysis to Extremist-Group Web Forum Messages
Abbasi A., Chen H. (ed) IEEE Intelligent Systems & Their Applications 20(5): 67-75, 2005. Type: Article
Date Reviewed: 02/27/06

The question of whether an author leaves an unconscious but statistically discernible “signature” on his or her writing was first visited by Wake at Oxford in 1911. Wake was an eminent classicist, but he was not a statistician, and his sentence-length statistics did not prove useful. In the 1960s, a Church of England minister and New Testament scholar, A.Q. Morton, who was a statistician, developed a statistical authorship test for Greek, and used it successfully on the Pauline Epistles, the Gospel of Luke, and the Acts of the Apostles. He and others later used it on Homer’s Iliad, also with notable success. The test was very simple, but useful for Greek text: he counted the number of times kai was used in each sentence. Kai is a coordinating conjunction in Greek 95 percent of the time (it is an adverb the other five percent), and performs the combined roles of all the coordinating conjunctions in English (and, or, but, and so on). Alvar Ellegard developed a much more sophisticated statistical method [1] for his doctoral dissertation at Uppsala, and used it to prove that Sir Philip Francis, a British civil servant, had written the scathing Junius Letters to the London Public Advertiser criticizing King George III and his war against the American colonies. Junius Brutus killed Julius Caesar, but George III would certainly have hanged this Junius, Philip Francis, for sedition had he known that Francis was the author of the letters.
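
As a rough illustration of the kind of count described above, the following Python sketch tallies how often a chosen function word appears in each sentence and summarizes the result as a histogram. It is not Morton's actual procedure: the function names, the naive sentence splitter, and the transliterated sample string are all assumptions made for illustration only.

```python
import re
from collections import Counter


def function_word_counts(text, word):
    """Return, for each sentence, how often `word` occurs as a whole token."""
    # Assumption: sentences end at ., !, ? or ; (a naive splitter, not a
    # faithful model of Greek punctuation).
    sentences = [s for s in re.split(r"[.!?;]+\s*", text) if s.strip()]
    target = word.lower()
    return [
        sum(1 for token in re.findall(r"\w+", s.lower()) if token == target)
        for s in sentences
    ]


def count_distribution(counts):
    """Map each per-sentence count to the number of sentences showing it."""
    return dict(sorted(Counter(counts).items()))


# Hypothetical sample: placeholder tokens, not real Greek text.
sample = "kai a kai b. c d. kai e."
counts = function_word_counts(sample, "kai")
print(counts)                      # [2, 0, 1]
print(count_distribution(counts))  # {0: 1, 1: 1, 2: 1}
```

Comparing such per-sentence distributions across texts of known and disputed authorship is the general idea; any real study would need a proper tokenizer and a statistical test on top of these counts.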

This fascinating paper takes the unconscious authorship signature problem into new theoretical (but also very practical) realms. The paper presents new methods that go beyond Greek and English literary texts to the analysis of extremist multi-language polemics on Internet Web sites. This extension of the technology opens up new vistas. For example, Internet Web sites are a very new literary genre, and the Arabic language, with its 5,000 roots or stems, is very highly inflected. Arabic has 15 verbal conjugations, compared to Hebrew with only eight, and Indo-European languages with even fewer. The liaison issues in Arabic, which is only written cursively, and which has initial, medial, and final forms for many letters, along with infixes and consonant stacking, add to the morphological, grammatical, and syntactical interface of the language. The authors find that this craggy linguistic interface, while complex, does add some statistical handholds and toeholds. Their methods show significant discriminating power in the application of authorship identification techniques to both English and Arabic messages. Ku Klux Klan (KKK) polemics were used as a sort of English language control in the development of the methods.

This well-presented, well-written paper illustrates an important and very current application of computer-based statistical methods for authorship identification. It is so good, and so relevant to our times, that I am surprised it wasn’t classified by the US National Security Agency (NSA).


[1] Ellegard, A. Who was Junius? Almqvist and Wiksell, Stockholm, 1962.

Reviewer:  P. C. Patton Review #: CR132494 (0611-1170)
